[ https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200053#comment-14200053 ]
Alon Goldshuv commented on HIVE-7777:
-------------------------------------

While the serde works fine, it has an issue that is quite serious IMO: it forces all the column types to string. This means that running a query on data that isn't all string type can return wrong query results.

In the unit tests I see a single example of a table using all string columns. In the tests linked here there are many tables with non-string types, but all the queries seem to be simple COUNT(*), which won't catch the problem.

Consider the following example:

{noformat}
CREATE EXTERNAL TABLE test (totalprice DECIMAL(38,10))
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties ("separatorChar" = ",","quoteChar"= "'","escapeChar"= "\\")
STORED AS TEXTFILE
LOCATION '<some location>'
tblproperties ("skip.header.line.count"="1");
{noformat}

Now consider this SQL:

hive> select min(totalprice) from test;

Given my data, the result should have been 874.89, but the actual result was 100001.57, because that value sorts first under the byte ordering of a string type. This is a wrong result. The table description confirms that the declared DECIMAL type was replaced by string:

hive> desc extended test;
OK
o_totalprice string from deserializer
...

I apologize if it's a false alarm and I'm misusing the DDL somehow. Otherwise, this is a concern, as wrong query results are a bad thing...

> Add CSV Serde based on OpenCSV
> ------------------------------
>
> Key: HIVE-7777
> URL: https://issues.apache.org/jira/browse/HIVE-7777
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Ferdinand Xu
> Assignee: Ferdinand Xu
> Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-7777.1.patch, HIVE-7777.2.patch, HIVE-7777.3.patch,
> HIVE-7777.patch, csv-serde-master.zip
>
>
> There is no official support for csvSerde for hive while there is an open
> source project in github(https://github.com/ogrodnek/csv-serde). CSV is of
> high frequency in use as a data format.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
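
For readers who want to see the ordering problem from the comment in isolation, here is a minimal sketch in Python rather than Hive. It only illustrates how a minimum computed over strings differs from one computed over decimals. The values 874.89 and 100001.57 come from the comment; the third value and the function names are assumptions made up for illustration, and the code does not reproduce the serde itself.

```python
from decimal import Decimal

# Raw field values as they would sit in a CSV file.
# 874.89 and 100001.57 are taken from the comment; 2500.00 is a
# hypothetical extra row added only to make the list less trivial.
raw_values = ["874.89", "100001.57", "2500.00"]


def min_as_string(values):
    """Minimum under lexicographic (character-by-character) ordering,
    which is how values compare when the column type is string."""
    return min(values)


def min_as_decimal(values):
    """Minimum under numeric ordering, which is what a DECIMAL
    column is expected to use."""
    return min(Decimal(v) for v in values)


string_min = min_as_string(raw_values)
decimal_min = min_as_decimal(raw_values)

print("string ordering :", string_min)
print("decimal ordering:", decimal_min)
```

Under string ordering "100001.57" wins because the character '1' sorts before '2' and '8', regardless of magnitude. On the Hive side, an explicit cast in the query (for example CAST(totalprice AS DECIMAL(38,10)) inside min()) would presumably sidestep the problem, but that is a workaround by the query author and not something the serde does.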