[ https://issues.apache.org/jira/browse/HIVE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445297#comment-13445297 ]
Mark Grover commented on HIVE-3245:
-----------------------------------
I ran into trouble with the JDBC driver on Hive 0.7.1 as well. I poked around
a bit but couldn't spend much time on it. While doing so, I came across the
org.apache.hadoop.hive.jdbc.HiveQueryResultSet class. Inside next(), the code
has:
{code:title=HiveQueryResultSet.java|borderStyle=solid}
Object data = serde.deserialize(new BytesWritable(rowStr.getBytes()));
{code}
Now, getBytes() comes in two variants: one that takes no parameters and uses
the platform's default encoding (as in the line above), and one that explicitly
takes the encoding as a parameter. I have a hunch that this could be the problem
and that the encoding should be passed explicitly. However, I haven't had the
chance to verify or refute that hunch.
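To make the hunch concrete, here is a minimal standalone sketch. It is not the
Hive code: the class GetBytesEncodingDemo and the method roundTripAsUtf8 are
made up for illustration, and it assumes the consumer of the bytes decodes them
as UTF-8 and that Java 7+ is available for StandardCharsets. It only shows how
the charset used by getBytes() can decide whether non-ASCII text survives.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GetBytesEncodingDemo {

    // Hypothetical helper: encode the row with the given charset, then
    // decode the resulting bytes as UTF-8 (assumed to be what the reader of
    // the bytes expects).
    public static String roundTripAsUtf8(String row, Charset encodeWith) {
        byte[] bytes = row.getBytes(encodeWith);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A tab-delimited row with a Japanese string column, written with
        // Unicode escapes so the source file encoding does not matter.
        String row = "1\t\u65e5\u672c\u8a9e\t2";

        // rowStr.getBytes() with no argument behaves like
        // getBytes(Charset.defaultCharset()), which depends on the JVM setup.
        Charset platformDefault = Charset.defaultCharset();
        System.out.println("default charset: " + platformDefault);
        System.out.println("intact with default charset: "
                + roundTripAsUtf8(row, platformDefault).equals(row));

        // Passing the charset explicitly removes the platform dependency.
        System.out.println("intact with explicit UTF-8: "
                + roundTripAsUtf8(row, StandardCharsets.UTF_8).equals(row));

        // A single-byte charset cannot represent the Japanese characters,
        // so they are lost during encoding.
        System.out.println("intact with ISO-8859-1: "
                + roundTripAsUtf8(row, StandardCharsets.ISO_8859_1).equals(row));
    }
}
```

If the first check prints false on the machine running the driver, that would
support the hunch. Whether passing an explicit charset in HiveQueryResultSet is
the right fix still needs to be verified against the actual code.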
> UTF encoded data not displayed correctly by Hive driver
> -------------------------------------------------------
>
> Key: HIVE-3245
> URL: https://issues.apache.org/jira/browse/HIVE-3245
> Project: Hive
> Issue Type: Bug
> Components: JDBC
> Affects Versions: 0.8.0
> Reporter: N Campbell
> Attachments: ASF.LICENSE.NOT.GRANTED--screenshot-1.jpg, CERT.TLJA.txt
>
>
> Various foreign language data (e.g. Japanese, Thai) is loaded into string
> columns via tab-delimited text files. A simple projection of the columns in
> the table does not display the correct data. Exporting the data from Hive
> and looking at the files implies the data is loaded properly. It appears to
> be an encoding issue in the driver, but I am unaware of any URL connection
> properties regarding encoding that Hive JDBC requires.
> create table if not exists CERT.TLJA_JP_E ( RNUM int , C1 string, ORD int)
> row format delimited
> fields terminated by '\t'
> stored as textfile;
> create table if not exists CERT.TLJA_JP ( RNUM int , C1 string, ORD int)
> stored as sequencefile;
> load data local inpath '/home/hadoopadmin/jdbc-cert/CERT/CERT.TLJA_JP.txt'
> overwrite into table CERT.TLJA_JP_E;
> insert overwrite table CERT.TLJA_JP select * from CERT.TLJA_JP_E;