[ 
https://issues.apache.org/jira/browse/HIVE-3363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731825#comment-13731825
 ] 

Kousuke Saruta commented on HIVE-3363:
--------------------------------------

I think this problem may be similar to HIVE-2137.
In SQLOperation.java, getNextRowSet() contains the following code:
{code}

      for (String rowString : rows) {
        rowObj = serde.deserialize(new BytesWritable(rowString.getBytes()));
        for (int i = 0; i < fieldRefs.size(); i++) {
          StructField fieldRef = fieldRefs.get(i);
          fieldOI = fieldRef.getFieldObjectInspector();
          deserializedFields[i] =
              convertLazyToJava(soi.getStructFieldData(rowObj, fieldRef), fieldOI);
        }
        rowSet.addRow(resultSchema, deserializedFields);
      }
{code}

The code above calls getBytes() without specifying an encoding, so it uses the
system default encoding.
If the Hive front end runs on Windows, an encoding mismatch occurs because
Hive (Hadoop) expects UTF-8 as its character encoding, but Windows uses a
different default (Shift_JIS on a Japanese system).
So I think the code above should be changed as follows:

{code}

      for (String rowString : rows) {
        rowObj = serde.deserialize(
            new BytesWritable(rowString.getBytes("UTF-8")));
        for (int i = 0; i < fieldRefs.size(); i++) {
          StructField fieldRef = fieldRefs.get(i);
          fieldOI = fieldRef.getFieldObjectInspector();
          deserializedFields[i] =
              convertLazyToJava(soi.getStructFieldData(rowObj, fieldRef), fieldOI);
        }
        rowSet.addRow(resultSchema, deserializedFields);
      }
{code}
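
To illustrate the encoding point, here is a minimal standalone sketch. It is not Hive code: the class EncodingMismatchDemo and the method roundTrip are hypothetical names made up for this example, and US-ASCII stands in for "a default charset that cannot represent 'é'" only because every JVM is guaranteed to ship it. It encodes a string with a given charset and then decodes the bytes as UTF-8, which is roughly what happens when getBytes() picks a non-UTF-8 default and the consumer assumes UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Hive.
class EncodingMismatchDemo {

    // Encode with the given charset, then decode as UTF-8,
    // the way a UTF-8-only consumer would read the bytes.
    static String roundTrip(String text, Charset encodeWith) {
        // getBytes(Charset) replaces unmappable characters with the
        // charset's replacement byte ('?' for US-ASCII).
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String row = "caf\u00e9"; // "café"

        // Explicit UTF-8 on both sides: the text survives.
        System.out.println(roundTrip(row, StandardCharsets.UTF_8).equals(row));

        // A charset that cannot represent 'é': the character is lost.
        System.out.println(roundTrip(row, StandardCharsets.US_ASCII));
    }
}
```

With an explicit UTF-8 encoding the row comes back unchanged, while the second call loses the 'é', which matches the symptom in the issue description. Which charset getBytes() actually picks on a given machine depends on the JVM and the OS locale, so passing the encoding explicitly removes that dependency.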
                
> Special characters (such as 'é') displayed as '?' in Hive
> ---------------------------------------------------------
>
>                 Key: HIVE-3363
>                 URL: https://issues.apache.org/jira/browse/HIVE-3363
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Anand Balaraman
>
> I am facing an issue while viewing special characters (such as é) using Hive.
> If I view the file in HDFS (using hadoop fs -cat command), it is displayed 
> correctly as ’é’, but when I select the data using Hive, this character alone 
> gets replaced by a question mark.
