[ 
https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200053#comment-14200053
 ] 

Alon Goldshuv commented on HIVE-7777:
-------------------------------------

The serde works, but it has an issue that is quite serious IMO: it forces every 
column type to string. As a result, a query on data that is not all strings can 
return wrong results. The unit tests contain a single example table, which uses 
only string columns. The tests linked here include many tables with non-string 
types, but all of their queries seem to be simple COUNT(*), which won't catch 
the problem.

Consider the following example:

{noformat}
CREATE EXTERNAL TABLE test (totalprice DECIMAL(38,10)) 
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde' with 
serdeproperties ("separatorChar" = ",","quoteChar"= "'","escapeChar"= "\\") 
STORED AS TEXTFILE 
LOCATION '<some location>' 
tblproperties ("skip.header.line.count"="1");
{noformat}

Now consider this query:

hive> select min(totalprice) from test;

Given my data, the result should have been 874.89, but the actual result was 
100001.57 (it sorts first under the byte ordering of a string type). This is a 
wrong result.
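
To illustrate why the string ordering picks 100001.57, here is a minimal Python 
sketch. It is only an analogy for the comparison behaviour, not Hive code. The 
two values 874.89 and 100001.57 come from the example above; the third value 
(5000.00) is hypothetical and added just for illustration.

```python
from decimal import Decimal

# 874.89 and 100001.57 are the values reported above;
# 5000.00 is a hypothetical extra value for illustration.
raw_values = ["874.89", "100001.57", "5000.00"]

# Compared as strings, ordering is character by character,
# so the value starting with "1" sorts before "5" and "8".
string_min = min(raw_values)

# Compared as decimals, ordering is numeric.
decimal_min = min(Decimal(v) for v in raw_values)

print("string min: ", string_min)
print("decimal min:", decimal_min)
```

The string minimum is decided by the first character alone ('1' sorts before 
'5' and '8'), which is why a MIN over a column that has been silently turned 
into a string can disagree with the numeric minimum.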

The table description reports the column as string rather than decimal:

hive> desc extended test;
OK
o_totalprice            string                  from deserializer
...

I apologize if this is a false alarm and I'm misusing the DDL somehow. Otherwise, 
this is a concern, since wrong query results are a bad thing.


> Add CSV Serde based on OpenCSV
> ------------------------------
>
>                 Key: HIVE-7777
>                 URL: https://issues.apache.org/jira/browse/HIVE-7777
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Ferdinand Xu
>            Assignee: Ferdinand Xu
>              Labels: TODOC14
>             Fix For: 0.14.0
>
>         Attachments: HIVE-7777.1.patch, HIVE-7777.2.patch, HIVE-7777.3.patch, 
> HIVE-7777.patch, csv-serde-master.zip
>
>
> There is no official support for csvSerde for hive while there is an open 
> source project in github(https://github.com/ogrodnek/csv-serde). CSV is of 
> high frequency in use as a data format.


