[jira] [Commented] (HIVE-7511) Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.

2014-11-01 Thread Xiaobing Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193589#comment-14193589
 ] 

Xiaobing Zhou commented on HIVE-7511:
-

This can be resolved by passing a Java option such as -Dfile.encoding=UTF-8. 
Setting it through an environment variable (_JAVA_OPTIONS=-Dfile.encoding=UTF-8) 
or passing it as a JVM startup argument both work.
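This workaround suggests that the query text is decoded with the JVM's platform default charset instead of UTF-8. The Python sketch below illustrates that failure mode. It assumes that query.hql is saved as UTF-8, that the default charset on this Windows host is windows-1252 (cp1252), and that table rows are decoded correctly as UTF-8. `like_contains` is a hypothetical, simplified stand-in for `LIKE '%...%'` and is not Hive's implementation.

```python
# Sketch only: models a charset mismatch between query.hql and the table data.
# Assumptions: the script is saved as UTF-8, the JVM default charset is
# windows-1252 (cp1252), and table rows are decoded as UTF-8.

def like_contains(value: str, needle: str) -> bool:
    """Hypothetical, simplified stand-in for Hive's `value LIKE '%needle%'`."""
    return needle in value

# Two rows from data.txt, as decoded from UTF-8 bytes on the data side.
rows = ["丄f齄啊c狛䶴h䶴c狝", "d狜狜㐁geg阿狚ea䶴eead狜e"]

# The bytes of the literal as they sit in the UTF-8 encoded query.hql.
literal_bytes = "丄".encode("utf-8")

# Decoding with the assumed platform default produces mojibake...
needle_default = literal_bytes.decode("cp1252")
# ...while decoding as UTF-8 (the effect of -Dfile.encoding=UTF-8) round-trips.
needle_utf8 = literal_bytes.decode("utf-8")

print(repr(needle_default))                              # not '丄'
print([like_contains(r, needle_default) for r in rows])  # [False, False]
print([like_contains(r, needle_utf8) for r in rows])     # [True, False]
```

Because the mis-decoded needle matches nothing, LIKE returns no rows and NOT LIKE returns every row. That is the pattern reported in the issue.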

 Hive: output is incorrect if there are UTF-8 characters in where clause of a 
 hive select query.
 ---

 Key: HIVE-7511
 URL: https://issues.apache.org/jira/browse/HIVE-7511
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
 Environment: Windows Server 2008 R2
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
Priority: Critical
 Attachments: HIVE-7511.1.patch


 When UTF-8 characters appear in the WHERE clause of a Hive query, the results 
 are empty for where content like '%丄%' and contain all rows for where 
 content not like '%丄%', even though a few rows do contain this character.
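Both symptoms are consistent with a pattern that simply matches no row. For non-NULL values, NOT LIKE is the negation of LIKE, so if `content LIKE p` is false for every row, `content NOT LIKE p` is true for every row. The Python model below is a simplified sketch, not Hive's implementation. It assumes only the `%` and `_` wildcards and omits escape characters and NULL handling. The second pattern stands in for any literal that occurs in no row, such as a mis-decoded one.

```python
import re

def like(value: str, pattern: str) -> bool:
    """Simplified SQL LIKE: '%' matches any run of characters, '_' exactly one.
    Escape sequences and NULL handling are deliberately omitted."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

def not_like(value: str, pattern: str) -> bool:
    # For non-NULL values, NOT LIKE is the plain negation of LIKE.
    return not like(value, pattern)

rows = ["丄f齄啊c狛䶴h䶴c狝", "d狜狜㐁geg阿狚ea䶴eead狜e", "㐁䶴hc﨨b狝㐁﨩䶴狜丄hc齄"]

# The intended literal, and a literal that occurs in no row.
for pattern in ["%丄%", "%ä¸„%"]:
    matched = [r for r in rows if like(r, pattern)]
    rejected = [r for r in rows if not_like(r, pattern)]
    print(pattern, "LIKE:", len(matched), "NOT LIKE:", len(rejected))
```

With the intended literal, LIKE and NOT LIKE split the rows. With a literal that matches nothing, LIKE is empty and NOT LIKE returns everything.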
 Steps to reproduce:
 1. Save a file called data.txt in the root container. The contents of the 
 file are as follows:
 190   丄f齄啊c狛䶴h䶴c狝
 899   d狜狜㐁geg阿狚ea䶴eead狜e
 137   齄鼾h狝ge㐀狛g狚阿
 21    﨩﨩e㐀c狛鼾d䶴﨨
 767   﨩c﨩g狜㐁狜狛齄阿﨩狚齄﨨䶵狝﨨
 281   﨨㐀啊aga啊c狝e鼾鼾
 573   㐁䶴hc﨨b狝㐁﨩䶴狜丄hc齄
 966   䶴丄狜﨨e狝eb狜㐁c㐀鼾﨩丄ga狚丄
 565   䶵㐀﨩㐀bb狛ehd丄ea丄㐀
 778   﨩㐁阿﨨狚bbea丄䶵丄狚鼾狚a䶵
 363   gd齄a鼾a䶴b㐁㐁fg鼾
 822   a阿狜䶵h䶵e狛h﨩gac狜阿㐀啊b
 338   b齄㐁ff阿e狜e㐀ba齄
 2. Execute the following queries to set up the table.
 a. CREATE TABLE hivetable(row INT, content STRING) ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY '\t' LOCATION '/hivetable';
 b. LOAD DATA INPATH 'wasb:///data.txt' OVERWRITE INTO TABLE hivetable;
 3. Create a query file query.hql with the following contents:
 INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
 select * from hivetable where content like '%丄%';
 4. Even though a few rows contain this character, the output is empty.
 5. Change the contents of query.hql to:
 INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
 select * from hivetable where content not like '%丄%';
 6. The output contains all rows, including those containing the given 
 character.
 7. Similar results are observed when using where content = '丄f齄啊c狛䶴h䶴c狝'; 
 8. We get the expected results when using where content like '%a%';
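Step 8 is consistent with the encoding explanation. Bytes in the ASCII range decode identically under UTF-8 and windows-1252, so '%a%' survives a wrong default charset, while the CJK literals in steps 3 and 7 do not. The sketch below also models how the table definition in step 2 splits each line of data.txt on a tab. It assumes cp1252 as the default charset and uses a simplified two-column split that ignores Hive's NULL and type-conversion rules. `parse_line` and `survives_default_charset` are hypothetical helpers.

```python
from typing import Tuple

# Sketch: models steps 1-2 and 7-8 under an assumed windows-1252 default charset.

def parse_line(line: str) -> Tuple[int, str]:
    """Hypothetical two-column split mirroring FIELDS TERMINATED BY '\\t'.
    Ignores Hive's NULL handling and bad-integer rules."""
    row, content = line.rstrip("\n").split("\t", 1)
    return int(row), content

# data.txt as UTF-8 bytes with a tab between the columns (two rows shown).
data = "190\t丄f齄啊c狛䶴h䶴c狝\n899\td狜狜㐁geg阿狚ea䶴eead狜e\n".encode("utf-8")
table = [parse_line(line) for line in data.decode("utf-8").splitlines()]

def survives_default_charset(literal: str) -> bool:
    """True if a literal saved as UTF-8 in query.hql reads back unchanged under cp1252."""
    return literal.encode("utf-8").decode("cp1252", errors="replace") == literal

for literal in ["%a%", "%丄%", "丄f齄啊c狛䶴h䶴c狝"]:
    print(repr(literal), survives_default_charset(literal))
```

Under these assumptions, only the ASCII pattern reads back unchanged. This matches the behavior reported in steps 3 through 8.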



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7511) Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.

2014-07-24 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073908#comment-14073908
 ] 

Navis commented on HIVE-7511:
-

[~xiaobingo] "Closed" means the patch has been applied to trunk, which it has not. 
Just use the Submit Patch button below the summary. As for the patch itself, I think 
we should accept the encoding of the script file from the user or from hive-site.xml 
rather than forcing UTF-8.
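A minimal sketch of the approach suggested here: take the script encoding from the user first, then from a configuration entry, and only then fall back to a default, rather than hard-coding UTF-8 or relying on the platform charset. The function names, the `script.encoding` key, and the UTF-8 fallback are hypothetical. This is not Hive's CLI code, and the key is not a real hive-site.xml property.

```python
from typing import Mapping, Optional

FALLBACK_ENCODING = "utf-8"  # hypothetical default when nothing is configured

def resolve_script_encoding(user_value: Optional[str],
                            config: Mapping[str, str]) -> str:
    """Precedence: explicit user value > config entry > fallback."""
    # 'script.encoding' is a hypothetical key, not a real Hive property.
    return user_value or config.get("script.encoding") or FALLBACK_ENCODING

def decode_script(raw: bytes, user_value: Optional[str],
                  config: Mapping[str, str]) -> str:
    """Decode the bytes of a query script with the resolved encoding."""
    return raw.decode(resolve_script_encoding(user_value, config))

query = "select * from hivetable where content like '%丄%';"
print(decode_script(query.encode("utf-8"), None, {}) == query)
print(decode_script(query.encode("gbk"), None, {"script.encoding": "gbk"}) == query)
```

A user-supplied value overrides the configured one. For example, a script saved as cp1252 can still be read when the config says gbk.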

 Hive: output is incorrect if there are UTF-8 characters in where clause of a 
 hive select query.
 ---

 Key: HIVE-7511
 URL: https://issues.apache.org/jira/browse/HIVE-7511
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
 Environment: Windows Server 2008 R2
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
Priority: Critical
 Fix For: 0.14.0

 Attachments: HIVE-7511.1.patch




