[ https://issues.apache.org/jira/browse/HIVE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227096#comment-14227096 ]
Ferdinand Xu commented on HIVE-8763:
------------------------------------
Hi [~rstokes], can you please create a review board entry for your patch?
> Support for use of enclosed quotes in LazySimpleSerde
> -----------------------------------------------------
>
> Key: HIVE-8763
> URL: https://issues.apache.org/jira/browse/HIVE-8763
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Affects Versions: 0.11.0, 0.12.0, 0.13.0, 0.13.1
> Environment: many - verified on CentOS / Red Hat with CDH
> Reporter: ronan stokes
> Attachments: HIVE-8763.1.patch
>
>
> Currently the LazySimpleSerde does not support quoting of delimited
> fields, so a separator cannot appear inside a quoted field. This means
> having to use alternatives for many common use cases for CSV-style data.
> Key scenarios that do not work include:
> (3 column row for int, string, float delimited by ',')
> 100,"3.5 inch hard drive, quantity 10",2650.30
> 100,"3.5 \" hard drive, quantity 10",2650.30
> 100, "3.5 "" hard drive, quantity 10", 2650.30
> 100,"3.5 "" hard drive, quantity 10",2650.30
> I have implemented a number of fixes in the deserialization stage of a
> copy of the LazySimpleSerde to address this.
> For serialization, the code is unchanged: the relevant embedded
> characters are escaped.
> Assuming a row with 3 fields - SKU ID, description, price, delimited by ','
> 1) allow use of enclosed quotes around a string field
> For example
> 100,"3.5 inch hard drive, quantity 10",2650.30
> 2) support escaping of quotes within a field to allow use of embedded quotes
> 100,"3.5 \" hard drive, quantity 10",2650.30
> 3) support for old style CSV embedded quotes
> for example
> 100,"3.5 "" hard drive, quantity 10",2650.30
> 4) support for skipping of leading spaces in field
> For example (note space between first ',' and opening quote)
> 100, "3.5 "" hard drive, quantity 10", 2650.30
> In each case, with the changes, the row is evaluated as though the
> delimiters and embedded quotes were escaped,
> e.g.
> 100, 3.5 \" hard drive\, quantity 10, 2650.30
> All of these are enabled or disabled using serde properties: the quote
> character, whether enclosed quotes are supported, and whether doubled
> embedded quotes are treated as a single quote (of the same char type).
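
For readers following the thread, the four deserialization behaviours listed in the description can be sketched as a small field splitter. This is only an illustration under stated assumptions, not the code in HIVE-8763.1.patch: the class name QuotedFieldSplitter, its constructor parameters, and the choice to skip leading spaces for unquoted fields as well are all hypothetical, and it does not use any Hive API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive: splits one delimited row into fields.
final class QuotedFieldSplitter {
    private final char separator;
    private final char quoteChar;
    private final char escapeChar;
    private final boolean doubledQuoteIsLiteral; // treat "" inside quotes as one quote

    QuotedFieldSplitter(char separator, char quoteChar, char escapeChar,
                        boolean doubledQuoteIsLiteral) {
        this.separator = separator;
        this.quoteChar = quoteChar;
        this.escapeChar = escapeChar;
        this.doubledQuoteIsLiteral = doubledQuoteIsLiteral;
    }

    List<String> split(String row) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean fieldStart = true; // true until the first significant char of a field
        int i = 0;
        int n = row.length();
        while (i < n) {
            char c = row.charAt(i);
            if (fieldStart && c == ' ') {          // 4) skip leading spaces
                i++;
                continue;
            }
            if (fieldStart && c == quoteChar) {    // 1) opening enclosing quote
                inQuotes = true;
                fieldStart = false;
                i++;
                continue;
            }
            fieldStart = false;
            if (c == escapeChar && i + 1 < n) {    // 2) escaped char is taken literally
                current.append(row.charAt(i + 1));
                i += 2;
                continue;
            }
            if (inQuotes && c == quoteChar) {
                if (doubledQuoteIsLiteral && i + 1 < n
                        && row.charAt(i + 1) == quoteChar) {
                    current.append(quoteChar);     // 3) old-style doubled quote
                    i += 2;
                    continue;
                }
                inQuotes = false;                  // closing enclosing quote
                i++;
                continue;
            }
            if (!inQuotes && c == separator) {     // separator outside quotes ends the field
                fields.add(current.toString());
                current.setLength(0);
                fieldStart = true;
                i++;
                continue;
            }
            current.append(c);
            i++;
        }
        fields.add(current.toString());            // last field has no trailing separator
        return fields;
    }
}
```

With new QuotedFieldSplitter(',', '"', '\\', true), each of the four example rows in the description splits into three fields, with the separator and any embedded quote kept inside the description field. The boolean flag stands in for the "double embedded quotes" serde property mentioned above; the actual property names are defined by the patch and are not reproduced here.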