[ https://issues.apache.org/jira/browse/HIVE-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193589#comment-14193589 ]
Xiaobing Zhou commented on HIVE-7511:
-------------------------------------

This can be resolved by setting a Java option, namely -Dfile.encoding=UTF-8. Setting it either as an environment variable (_JAVA_OPTIONS=-Dfile.encoding=UTF-8) or as a Java start-up argument works fine.

> Hive: output is incorrect if there are UTF-8 characters in where clause of a
> hive select query.
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7511
>                 URL: https://issues.apache.org/jira/browse/HIVE-7511
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>         Environment: Windows Server 2008 R2
>            Reporter: Xiaobing Zhou
>            Assignee: Xiaobing Zhou
>            Priority: Critical
>         Attachments: HIVE-7511.1.patch
>
>
> When we put UTF-8 characters in the where clause of a Hive query, the results
> are empty for "where content like '%丄%'" and contain all rows for "where
> content not like '%丄%'", even though a few rows contain this character.
> Steps to reproduce:
> 1. Save a file called data.txt in the root container. The contents of the
> file are as follows:
> 190 丄f齄啊c狛䶴h䶴c狝
> 899 d狜狜㐁geg阿狚ea䶴eead狜e
> 137 齄鼾h狝ge㐀狛g狚阿
> 21 﨩﨩e㐀c狛鼾d䶴﨨
> 767 﨩c﨩g狜㐁狜狛齄阿﨩狚齄﨨䶵狝﨨
> 281 﨨㐀啊aga啊c狝e鼾鼾
> 573 㐁䶴hc﨨b狝㐁﨩䶴狜丄hc齄
> 966 䶴丄狜﨨e狝eb狜㐁c㐀鼾﨩丄ga狚丄
> 565 䶵㐀﨩㐀bb狛ehd丄ea丄㐀
> 778 﨩㐁阿﨨狚bbea丄䶵丄狚鼾狚a䶵
> 363 gd齄a鼾a䶴b㐁㐁fg鼾
> 822 a阿狜䶵h䶵e狛h﨩gac狜阿㐀啊b
> 338 b齄㐁ff阿e狜e㐀ba齄
> 2. Execute the following queries to set up the table.
> a. CREATE TABLE hivetable(row INT, content STRING) ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t' LOCATION '/hivetable';
> b. LOAD DATA INPATH 'wasb:///data.txt' OVERWRITE INTO TABLE hivetable;
> 3. Create a query file query.hql with the following contents:
> INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
> select * from hivetable where content like '%丄%';
> 4. Even though a few rows contain this character, the output is empty.
> 5. Change the contents of query.hql to:
> INSERT OVERWRITE DIRECTORY 'wasb:///hiveoutput'
> select * from hivetable where content not like '%丄%';
> 6. The output contains all rows, including those containing the given
> character.
> 7. Similar results are observed when using "where content = '丄f齄啊c狛䶴h䶴c狝';"
> 8. We get the expected results when using "where content like '%a%';"

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
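The following is a minimal Python simulation of one plausible mechanism behind these symptoms. It is not Hive code. It rests on an assumption: the query text in query.hql is saved as UTF-8 but read back with the JVM default charset (cp1252 on a US-English Windows host when -Dfile.encoding is unset), while table data is always decoded as UTF-8. The helpers like() and matching_rows() are hypothetical and exist only for illustration. The rows are those of data.txt from the report.

```python
import re

# Rows of data.txt from the report, already decoded as UTF-8
# (assumption: Hive always decodes table data as UTF-8).
ROWS = {
    190: "丄f齄啊c狛䶴h䶴c狝",
    899: "d狜狜㐁geg阿狚ea䶴eead狜e",
    137: "齄鼾h狝ge㐀狛g狚阿",
    21: "﨩﨩e㐀c狛鼾d䶴﨨",
    767: "﨩c﨩g狜㐁狜狛齄阿﨩狚齄﨨䶵狝﨨",
    281: "﨨㐀啊aga啊c狝e鼾鼾",
    573: "㐁䶴hc﨨b狝㐁﨩䶴狜丄hc齄",
    966: "䶴丄狜﨨e狝eb狜㐁c㐀鼾﨩丄ga狚丄",
    565: "䶵㐀﨩㐀bb狛ehd丄ea丄㐀",
    778: "﨩㐁阿﨨狚bbea丄䶵丄狚鼾狚a䶵",
    363: "gd齄a鼾a䶴b㐁㐁fg鼾",
    822: "a阿狜䶵h䶵e狛h﨩gac狜阿㐀啊b",
    338: "b齄㐁ff阿e狜e㐀ba齄",
}


def like(value, pattern):
    """Hypothetical SQL LIKE: '%' matches any run of characters, '_' exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, re.DOTALL) is not None


def matching_rows(pattern, query_charset, negate=False):
    """Return sorted row keys for `content [not] like pattern`.

    The pattern is written to the query file as UTF-8 bytes, then decoded
    with `query_charset` (assumed to be the JVM default charset).
    """
    decoded = pattern.encode("utf-8").decode(query_charset, errors="replace")
    return sorted(k for k, v in ROWS.items() if like(v, decoded) != negate)


if __name__ == "__main__":
    print(matching_rows("%丄%", "utf-8"))   # [190, 565, 573, 778, 966]
    print(matching_rows("%丄%", "cp1252"))  # [] ('丄' is misread as 'ä¸„')
    print(len(matching_rows("%丄%", "cp1252", negate=True)))  # 13, i.e. all rows
```

Under this assumption, setting -Dfile.encoding=UTF-8 corresponds to passing "utf-8" as the query charset, which restores the expected rows. An ASCII-only pattern such as '%a%' decodes identically in both charsets, which is consistent with step 8 of the report.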