[jira] [Commented] (HIVE-7511) Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.
[ https://issues.apache.org/jira/browse/HIVE-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193589#comment-14193589 ]

Xiaobing Zhou commented on HIVE-7511:
-------------------------------------

This can be resolved by setting the JVM default encoding through a Java option such as -Dfile.encoding=UTF-8. Setting it as an environment variable (_JAVA_OPTIONS=-Dfile.encoding=UTF-8) and passing it as a Java startup argument both work fine.

Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.
-----------------------------------------------------------------------------------------------

Key: HIVE-7511
URL: https://issues.apache.org/jira/browse/HIVE-7511
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.0
Environment: Windows Server 2008 R2
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
Priority: Critical
Attachments: HIVE-7511.1.patch

When we put UTF-8 characters in the where clause of a Hive query, the results are empty for where content like '%丄%', and the results contain all rows for where content not like '%丄%', even though a few rows contain this character.

Steps to reproduce:

1. Save a file called data.txt in the root container. The contents of the file are as follows.

190 丄f齄啊c狛䶴h䶴c狝
899 d狜狜㐁geg阿狚ea䶴eead狜e
137 齄鼾h狝ge㐀狛g狚阿
21﨩﨩e㐀c狛鼾d䶴﨨
767 﨩c﨩g狜㐁狜狛齄阿﨩狚齄﨨䶵狝﨨
281 﨨㐀啊aga啊c狝e鼾鼾
573 㐁䶴hc﨨b狝㐁﨩䶴狜丄hc齄
966 䶴丄狜﨨e狝eb狜㐁c㐀鼾﨩丄ga狚丄
565 䶵㐀﨩㐀bb狛ehd丄ea丄㐀
778 﨩㐁阿﨨狚bbea丄䶵丄狚鼾狚a䶵
363 gd齄a鼾a䶴b㐁㐁fg鼾
822 a阿狜䶵h䶵e狛h﨩gac狜阿㐀啊b
338 b齄㐁ff阿e狜e㐀ba齄

2. Execute the following queries to set up the table.
   a. CREATE TABLE hivetable(row INT, content STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION '/hivetable';
   b. LOAD DATA INPATH 'wasb:///data.txt' OVERWRITE INTO TABLE hivetable;
3. Create a query file query.hql with the following contents:
   INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput' select * from hivetable where content like '%丄%';
4. Even though a few rows contain this character, the output is empty.
5. Change the contents of query.hql to:
   INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput' select * from hivetable where content not like '%丄%';
6. The output contains all rows, including those containing the given character.
7. Similar results are observed when using where content = '丄f齄啊c狛䶴h䶴c狝';
8. We get the expected results when using where content like '%a%';

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
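The -Dfile.encoding workaround in the comment above suggests that the query text is decoded with the JVM default charset. The following is a minimal sketch of that failure mode, not Hive's actual code path: the class and method names are illustrative, and ISO-8859-1 is only a stand-in for whatever single-byte default the Windows host used. Unicode escapes are used so the example does not depend on the source file's own encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: shows how UTF-8 bytes read with the wrong charset
// produce a pattern that can no longer match correctly decoded data.
final class EncodingMismatchDemo {

    // Encode text as UTF-8, then decode the bytes with a different charset,
    // as a reader relying on a non-UTF-8 default would.
    static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, wrongCharset);
    }

    public static void main(String[] args) {
        String pattern = "\u4E04";   // the character from the report (U+4E04)
        String row = "\u4E04f";      // start of the first row's content

        // Assumption: the legacy default behaves like a single-byte charset.
        String garbled = misdecode(pattern, StandardCharsets.ISO_8859_1);

        System.out.println(row.contains(pattern));  // true
        System.out.println(row.contains(garbled));  // false
    }
}
```

With a garbled pattern, a LIKE predicate matches no rows and NOT LIKE matches every row, which is consistent with steps 4 and 6 of the report.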
[jira] [Commented] (HIVE-7511) Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.
[ https://issues.apache.org/jira/browse/HIVE-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073908#comment-14073908 ]

Navis commented on HIVE-7511:
-----------------------------

[~xiaobingo] Closed means the patch has been applied to trunk, which it has not. Just use the Submit Patch button below the summary. As for the patch, I think we should accept the encoding type for the script file from the user or from hive-site.xml, rather than forcing UTF-8.

Hive: output is incorrect if there are UTF-8 characters in where clause of a hive select query.
-----------------------------------------------------------------------------------------------

Key: HIVE-7511
URL: https://issues.apache.org/jira/browse/HIVE-7511
Project: Hive
Issue Type: Bug
Affects Versions: 0.13.0
Environment: Windows Server 2008 R2
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
Priority: Critical
Fix For: 0.14.0
Attachments: HIVE-7511.1.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)
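Navis's suggestion, taking the script file's encoding from the user or from configuration instead of hard-coding UTF-8, could look roughly like the sketch below. Everything here is hypothetical: the class, the method names, and the idea of passing the configured value as a plain string are assumptions for illustration, and no actual Hive configuration property is implied.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: decodes a query script using a user-supplied charset
// name, falling back to the JVM default when none is configured.
final class ScriptReader {

    // An unset or blank value means "use the JVM default charset".
    // Charset.forName throws an unchecked exception for unknown names.
    static Charset resolveCharset(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return Charset.defaultCharset();
        }
        return Charset.forName(configured.trim());
    }

    // Decode raw script bytes with the resolved charset.
    static String decodeScript(byte[] bytes, String configuredCharset) {
        return new String(bytes, resolveCharset(configuredCharset));
    }

    // Read the whole script file and decode it.
    static String readScript(Path file, String configuredCharset) throws IOException {
        return decodeScript(Files.readAllBytes(file), configuredCharset);
    }

    public static void main(String[] args) throws IOException {
        String query = "select * from hivetable where content like '%\u4E04%';";
        Path tmp = Files.createTempFile("query", ".hql");
        try {
            Files.write(tmp, query.getBytes(StandardCharsets.UTF_8));
            // With the matching charset the query text survives the round trip.
            System.out.println(readScript(tmp, "UTF-8").equals(query));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Falling back to the default charset keeps existing behavior for users who set nothing, while users with UTF-8 scripts on a non-UTF-8 host could opt in explicitly.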