[jira] [Created] (HIVE-3677) Encoding Issue - ISO-8859-1

Sergio Kameoka (JIRA) Tue, 06 Nov 2012 10:28:14 -0800

Sergio Kameoka created HIVE-3677:
------------------------------------

             Summary: Encoding Issue - ISO-8859-1
                 Key: HIVE-3677
                 URL: https://issues.apache.org/jira/browse/HIVE-3677
             Project: Hive
          Issue Type: Bug
          Components: Configuration, Import/Export
    Affects Versions: 0.8.1
         Environment: Amazon EMR with Hive (Hive 0.8.1 and Hadoop 1.0.3)
            Reporter: Sergio Kameoka
             Fix For: 0.8.1



We’ve created a very simple example using Amazon EMR with Hive: it creates a 
single table and loads some data into it. Below is the code that we used:

//CREATE TABLE CODE

CREATE TABLE sampletable (
valorstring STRING, valordecimal DOUBLE)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe'
WITH SERDEPROPERTIES (
'serialization.format'='org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol',
'quote.delim'='("|\\[|\\])',
'field.delim'=' ',
'serialization.null.format'='-')
STORED AS TEXTFILE;

 
//LOAD DATA CODE
LOAD DATA LOCAL INPATH '/tmp/sampletable.txt' OVERWRITE INTO TABLE sampletable;

Here is the text file content that we are using to load the data:

/tmp/sampletable.txt
"Exemplo de texto com acentuação" 90,15
"Exemplo de texto com acentuação" 80.15

The problem that we are facing seems to be with the encoding used in the Hive 
configuration. It seems that Hive is using UTF-8, but for the Brazilian format 
we need to use ISO-8859-1.
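
To illustrate the suspected mismatch outside of Hive, here is a minimal Python sketch. It assumes the text file was saved as ISO-8859-1 and that the reader decodes it as UTF-8; neither has been confirmed for the EMR setup above, and the variable names are only for illustration.

```python
# The sample text from /tmp/sampletable.txt, as a Python string.
text = "Exemplo de texto com acentuação"

# Assumption: the file on disk is ISO-8859-1, one byte per accented character.
latin1_bytes = text.encode("iso-8859-1")

# Decoding those bytes as UTF-8 (what we suspect happens) damages the accents:
# the bytes for "ç" and "ã" are not valid UTF-8 and become replacement characters.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the charset the file was written in restores the original text.
as_latin1 = latin1_bytes.decode("iso-8859-1")

print(as_utf8)
print(as_latin1)
```

If this is what is happening, the ASCII part of each line would survive and only the accented characters would come back wrong.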

In the example above, when the data is loaded into the table and we run a 
simple select (SELECT * FROM sampletable), the accented text is returned 
garbled and the double value with a comma is returned as NULL.
We’ve already tried changing the LANG environment variable and setting Hive 
variables with SET, but neither has worked so far.
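
As a possible stopgap while the encoding question is open, the file could be normalized before the LOAD DATA step. The Python sketch below is only an assumption about what might help, not a confirmed fix: `parse_decimal` mimics a strict parser that accepts only "." as the decimal separator (which would explain the NULL), and `normalize_line` is a hypothetical helper that decodes an ISO-8859-1 line and rewrites a decimal comma in the last field.

```python
def parse_decimal(value):
    """Parse a number strictly: only '.' is accepted as the decimal separator."""
    try:
        return float(value)
    except ValueError:
        # Comparable to the NULL we get back for "90,15".
        return None


def normalize_line(raw_line):
    """Hypothetical helper: decode an ISO-8859-1 line and fix the decimal comma.

    Assumes the number is the last space-separated field, as in sampletable.txt.
    """
    line = raw_line.decode("iso-8859-1")
    text, _, number = line.rpartition(" ")
    return text + " " + number.replace(",", ".")


sample = '"Exemplo de texto com acentuação" 90,15'.encode("iso-8859-1")
fixed = normalize_line(sample)

print(parse_decimal("90,15"))
print(parse_decimal("80.15"))
print(fixed)
```

The normalized lines could then be written back out as UTF-8 and loaded as before. This changes the input data rather than the Hive configuration, so it does not answer whether Hive itself can be told to read ISO-8859-1.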

Thank you in advance!!!



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
