[
https://issues.apache.org/jira/browse/HIVE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841965#comment-13841965
]
Szehon Ho commented on HIVE-3245:
---------------------------------
I created the table as described in the JIRA and ran "select *" from both
Beeline and my own Java program embedding the JDBC driver. In both cases, the
Japanese characters displayed correctly:
0: jdbc:hive2://localhost:10000> select * from japan_j;
+-------+------------------------------------------------+------+
| rnum | c1 | ord |
+-------+------------------------------------------------+------+
| 11 | (1)インデックス | 36 |
| 12 | <5>Switches | 37 |
| 10 | 400ranku | 39 |
| 9 | 666Sink | 40 |
| 14 | P-Cabels | 35 |
| 13 | R-Bench | 38 |
| 27 | エコー | 34 |
| 26 | エチャント | 24 |
| 25 | ガード | 4 |
| 28 | コート | 3 |
| 29 | ゴム | 1 |
| 41 | ざぶと | 2 |
| 40 | さんしょう | 6 |
| 31 | ズボン | 5 |
| 30 | スワップ | 41 |
| 37 | せっけい | 42 |
| 36 | せんたくざい | 46 |
| 32 | ダイエル | 45 |
| 39 | はっぽ | 43 |
| 38 | はつ剤 | 44 |
| 34 | ファイル | 48 |
| 33 | フィルター | 50 |
| 35 | フッコク | 49 |
| 8 | 「2」計画 | 47 |
| 46 | 暗視 | 9 |
| 45 | 音楽 | 8 |
| 47 | 音声認識 | 7 |
| 44 | 記載 | 10 |
| 43 | 記録機 | 11 |
| 42 | 高機能 | 15 |
| 50 | 国家利益 | 14 |
| 48 | 国立公園 | 18 |
| 49 | 国立大学 | 22 |
| 7 | ⑤号線路 | 21 |
| 5 | (Ⅰ)番号列 | 23 |
| 1 | 356CAL | 17 |
| 2 | 980Series | 16 |
| 6 | <ⅸ>Pattern | 20 |
| 3 | PVDF | 19 |
| 4 | ROMAN-8 | 13 |
| 15 | アンカー | 12 |
| 16 | エンジン | 30 |
| 19 | カットマシン | 29 |
| 20 | カード | 28 |
| 18 | コーラ | 26 |
| 17 | ゴールド | 25 |
| 24 | サイフ | 27 |
| 21 | ツーウィング | 32 |
| 23 | フォルダー | 33 |
| 22 | マンボ | 31 |
+-------+------------------------------------------------+------+
I tested with the new JDBCDriver (org.apache.hive.jdbc.HiveDriver) against
HiveServer2.
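For reference, a minimal sketch of the kind of embedded JDBC client described
above. It is not the exact program used for the test. The host, port, and
table name are taken from the Beeline session shown earlier; the class name,
the jdbcUrl helper, and the empty user name and password are assumptions for
illustration, and the Hive JDBC driver jar is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcSketch {
    // Hypothetical helper: builds a HiveServer2 JDBC URL such as
    // jdbc:hive2://localhost:10000
    static String jdbcUrl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port;
    }

    public static void main(String[] args) throws SQLException, ClassNotFoundException {
        // Load the HiveServer2 driver named in this comment.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Assumption: an unsecured local HiveServer2 that accepts empty credentials.
        try (Connection conn = DriverManager.getConnection(jdbcUrl("localhost", 10000), "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select * from japan_j")) {
            while (rs.next()) {
                // Columns are rnum (int), c1 (string), ord (int).
                System.out.println(rs.getInt(1) + "\t" + rs.getString(2) + "\t" + rs.getInt(3));
            }
        }
    }
}
```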
The platform running Beeline should use a UTF-8 locale (check with
"echo $LANG"), and any other Java application using the JDBC driver should be
started with a UTF-8 JVM argument ("java -Dfile.encoding=UTF-8"). That should
already be a requirement for clients wishing to display UTF-8 characters.
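A quick way to check the client side is to see whether a sample string
survives a round trip through the JVM's default charset. The following is a
rough sketch only; the class and method names are made up for illustration.
Note that on recent JDKs (18 and later) the default charset is UTF-8 unless
overridden, so the check mainly matters on older JVMs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Returns true if the text is unchanged after being encoded to bytes and
    // decoded again with the given charset. Characters the charset cannot
    // represent are replaced during encoding, so the comparison fails.
    static boolean roundTrips(String text, Charset cs) {
        return new String(text.getBytes(cs), cs).equals(text);
    }

    public static void main(String[] args) {
        String sample = "インデックス"; // one of the values from the table above
        Charset def = Charset.defaultCharset();
        System.out.println("default charset: " + def);
        System.out.println("sample survives default charset: " + roundTrips(sample, def));
        System.out.println("sample survives UTF-8: " + roundTrips(sample, StandardCharsets.UTF_8));
    }
}
```

If the second line reports false, starting the JVM with
"java -Dfile.encoding=UTF-8" as described above is the first thing to try.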
The code that Mark Grover mentioned no longer applies: the new JDBC driver
gets results from HiveServer directly via a Thrift string field and does not
do another round of serialization/deserialization on the client side, which is
where the error was said to occur. So in my opinion, the issue can be closed
for the Hive driver.
> UTF encoded data not displayed correctly by Hive driver
> -------------------------------------------------------
>
> Key: HIVE-3245
> URL: https://issues.apache.org/jira/browse/HIVE-3245
> Project: Hive
> Issue Type: Bug
> Components: JDBC
> Affects Versions: 0.8.0
> Reporter: N Campbell
> Assignee: Szehon Ho
> Attachments: ASF.LICENSE.NOT.GRANTED--screenshot-1.jpg, CERT.TLJA.txt
>
>
> various foreign language data (i.e. japanese, thai etc) is loaded into string
> columns via tab delimited text files. A simple projection of the columns in
> the table is not displaying the correct data. Exporting the data from Hive
> and looking at the files implies the data is loaded properly. it appears to
> be an encoding issue at the driver but unaware of any required URL connection
> properties re encoding that Hive JDBC requires.
> create table if not exists CERT.TLJA_JP_E ( RNUM int , C1 string, ORD int)
> row format delimited
> fields terminated by '\t'
> stored as textfile;
> create table if not exists CERT.TLJA_JP ( RNUM int , C1 string, ORD int)
> stored as sequencefile;
> load data local inpath '/home/hadoopadmin/jdbc-cert/CERT/CERT.TLJA_JP.txt'
> overwrite into table CERT.TLJA_JP_E;
> insert overwrite table CERT.TLJA_JP select * from CERT.TLJA_JP_E;