[ https://issues.apache.org/jira/browse/HIVE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841965#comment-13841965 ]
Szehon Ho commented on HIVE-3245:
---------------------------------

I created the table as described in the JIRA and ran select * both from Beeline and from my own Java program embedding the JDBC driver. In both cases, the Japanese characters displayed correctly:

    0: jdbc:hive2://localhost:10000> select * from japan_j;
    +-------+------------------------------------------------+------+
    | rnum | c1 | ord |
    +-------+------------------------------------------------+------+
    | 11 | (1)インデックス | 36 |
    | 12 | <5>Switches | 37 |
    | 10 | 400ranku | 39 |
    | 9 | 666Sink | 40 |
    | 14 | P-Cabels | 35 |
    | 13 | R-Bench | 38 |
    | 27 | エコー | 34 |
    | 26 | エチャント | 24 |
    | 25 | ガード | 4 |
    | 28 | コート | 3 |
    | 29 | ゴム | 1 |
    | 41 | ざぶと | 2 |
    | 40 | さんしょう | 6 |
    | 31 | ズボン | 5 |
    | 30 | スワップ | 41 |
    | 37 | せっけい | 42 |
    | 36 | せんたくざい | 46 |
    | 32 | ダイエル | 45 |
    | 39 | はっぽ | 43 |
    | 38 | はつ剤 | 44 |
    | 34 | ファイル | 48 |
    | 33 | フィルター | 50 |
    | 35 | フッコク | 49 |
    | 8 | 「2」計画 | 47 |
    | 46 | 暗視 | 9 |
    | 45 | 音楽 | 8 |
    | 47 | 音声認識 | 7 |
    | 44 | 記載 | 10 |
    | 43 | 記録機 | 11 |
    | 42 | 高機能 | 15 |
    | 50 | 国家利益 | 14 |
    | 48 | 国立公園 | 18 |
    | 49 | 国立大学 | 22 |
    | 7 | ⑤号線路 | 21 |
    | 5 | (Ⅰ)番号列 | 23 |
    | 1 | 356CAL | 17 |
    | 2 | 980Series | 16 |
    | 6 | <ⅸ>Pattern | 20 |
    | 3 | PVDF | 19 |
    | 4 | ROMAN-8 | 13 |
    | 15 | アンカー | 12 |
    | 16 | エンジン | 30 |
    | 19 | カットマシン | 29 |
    | 20 | カード | 28 |
    | 18 | コーラ | 26 |
    | 17 | ゴールド | 25 |
    | 24 | サイフ | 27 |
    | 21 | ツーウィング | 32 |
    | 23 | フォルダー | 33 |
    | 22 | マンボ | 31 |
    +-------+------------------------------------------------+------+

I tested with the new JDBC driver (org.apache.hive.jdbc.HiveDriver) against HiveServer2. The platform running Beeline should use a UTF-8 locale (check with "echo $LANG"). Likewise, any other Java application using the JDBC driver should be started with the UTF-8 JVM argument ("java -Dfile.encoding=UTF-8"). That should already be a requirement for clients wishing to display UTF-8 characters.
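A minimal sketch of the client-side half of this requirement, assuming a plain Java client; the class `Utf8Output` and its helper methods are hypothetical and not part of Hive. The driver hands back Java `String`s (e.g. from `ResultSet.getString`), so whether they print correctly depends on how the client encodes them for the console. Wrapping the output stream in a `PrintStream` with an explicit UTF-8 encoding removes the dependence on the JVM default charset, although the terminal itself must still interpret UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

final class Utf8Output {
    // Wraps any OutputStream so that text is always encoded as UTF-8,
    // independent of the JVM default charset (file.encoding).
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this is not expected.
            throw new UncheckedIOException(e);
        }
    }

    // Encodes a value through a UTF-8 PrintStream and returns the raw bytes written.
    static byte[] encodeForConsole(String value) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream ps = utf8(buf);
        ps.print(value);
        ps.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // Diagnostics go to stderr; they show which charset the JVM picked up.
        System.err.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.err.println("default charset = " + Charset.defaultCharset());

        // Hypothetical stand-in for a value read with ResultSet.getString("c1").
        String c1 = "\u30A4\u30F3\u30C7\u30C3\u30AF\u30B9"; // katakana, written as escapes
        System.err.println("UTF-8 bytes     = " + encodeForConsole(c1).length);

        // Print the value itself as UTF-8; the terminal must also expect UTF-8 (see $LANG).
        PrintStream out = utf8(System.out);
        out.println(c1);
    }
}
```

The string literal uses Unicode escapes so the example compiles the same way regardless of the platform encoding javac uses to read the source file. The stderr lines make it easy to confirm which `file.encoding` the JVM actually started with.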
The code that Mark Grover mentioned no longer applies: the new JDBC driver receives results from HiveServer2 directly as Thrift string fields and does not perform another round of serialization/deserialization on the client side, which is where the error was said to occur. So, in my opinion, the issue can be closed for the Hive driver.

> UTF encoded data not displayed correctly by Hive driver
> -------------------------------------------------------
>
>                 Key: HIVE-3245
>                 URL: https://issues.apache.org/jira/browse/HIVE-3245
>             Project: Hive
>          Issue Type: Bug
>          Components: JDBC
>    Affects Versions: 0.8.0
>            Reporter: N Campbell
>            Assignee: Szehon Ho
>         Attachments: ASF.LICENSE.NOT.GRANTED--screenshot-1.jpg, CERT.TLJA.txt
>
>
> Various foreign-language data (e.g. Japanese, Thai) is loaded into string
> columns via tab-delimited text files. A simple projection of the columns in
> the table does not display the correct data. Exporting the data from Hive
> and looking at the files implies the data is loaded properly. It appears to
> be an encoding issue in the driver, but I am unaware of any URL connection
> properties regarding encoding that Hive JDBC requires.
>
> create table if not exists CERT.TLJA_JP_E ( RNUM int , C1 string, ORD int)
> row format delimited
> fields terminated by '\t'
> stored as textfile;
>
> create table if not exists CERT.TLJA_JP ( RNUM int , C1 string, ORD int)
> stored as sequencefile;
>
> load data local inpath '/home/hadoopadmin/jdbc-cert/CERT/CERT.TLJA_JP.txt'
> overwrite into table CERT.TLJA_JP_E;
>
> insert overwrite table CERT.TLJA_JP select * from CERT.TLJA_JP_E;

--
This message was sent by Atlassian JIRA (v6.1#6144)
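The comment attributes the original bug to an extra client-side serialization/deserialization round trip in the old driver. The sketch below does not reproduce that driver code; it is a hypothetical illustration (the class `CharsetRoundTrip` and its methods are made up, and ISO-8859-1 is chosen arbitrarily as the "wrong" charset) of how decoding correct UTF-8 bytes with the wrong charset garbles text even though the stored data is intact. That matches the report that the exported files looked correct while the driver output did not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetRoundTrip {
    // Simulates a client that receives correctly encoded UTF-8 bytes
    // but turns them back into text using the charset decodeAs.
    static String decodeWith(String original, Charset decodeAs) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8); // bytes as stored and sent
        return new String(wire, decodeAs);
    }

    // Undoes an ISO-8859-1 misdecode: that charset maps each byte to exactly
    // one char (U+0000 to U+00FF), so re-encoding recovers the original bytes.
    static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The c1 value for rnum 25 in the Beeline output, written as escapes.
        String c1 = "\u30AC\u30FC\u30C9";

        String ok = decodeWith(c1, StandardCharsets.UTF_8);
        String garbled = decodeWith(c1, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 decode matches:      " + ok.equals(c1));              // true
        System.out.println("ISO-8859-1 decode matches: " + garbled.equals(c1));         // false
        System.out.println("garbled length:            " + garbled.length());          // 9 (one char per byte)
        System.out.println("repaired matches:          " + repair(garbled).equals(c1)); // true
    }
}
```

ISO-8859-1 makes the damage reversible in this example only because it maps every byte to a character. With a charset such as US-ASCII, bytes it cannot map are replaced with U+FFFD during decoding, and the original text is lost.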