[ https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16778846#comment-16778846 ]

BELUGA BEHR commented on HIVE-21240:
------------------------------------

All unit tests are passing [~bslim] [~kgyrtkirk]. Please consider this patch
for inclusion in the project. I understand there is some hesitation regarding
the change in return type: previously the SerDe returned a native array, and
now it returns a Collection (List). I think it's better to work with Java
Collections than with native arrays, and if we're going to change the return
value at all, a major (4.0) release is the appropriate time to introduce such
a change.
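For callers worried about the return-type change, a minimal sketch of how code written against one shape can adapt to the other. The class and method names (`RowReturnTypeSketch`, `deserializeAsArray`, `deserializeAsList`) are hypothetical stand-ins, not Hive's actual SerDe API; the point is only that `List.toArray()` and `Arrays.asList(...)` bridge the two representations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowReturnTypeSketch {

    // Hypothetical old style: one deserialized row as a native array.
    static Object[] deserializeAsArray(String a, Integer b) {
        return new Object[] {a, b};
    }

    // Hypothetical new style: the same row as a java.util.List.
    static List<Object> deserializeAsList(String a, Integer b) {
        List<Object> row = new ArrayList<>(2);
        row.add(a);
        row.add(b);
        return row;
    }

    public static void main(String[] args) {
        List<Object> row = deserializeAsList("x", 1);

        // A caller that still needs an array can copy the List into one.
        Object[] asArray = row.toArray();

        // A caller holding an old-style array can view it as a List without copying.
        List<Object> view = Arrays.asList(deserializeAsArray("x", 1));

        System.out.println(row.equals(view));                                   // true
        System.out.println(Arrays.equals(asArray, deserializeAsArray("x", 1))); // true
    }
}
```

`List.equals` compares element by element regardless of the concrete List implementation, which is why the `ArrayList` and the `Arrays.asList` view compare equal.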

> JSON SerDe Re-Write
> -------------------
>
>                 Key: HIVE-21240
>                 URL: https://issues.apache.org/jira/browse/HIVE-21240
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 4.0.0, 3.1.1
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.10.patch, HIVE-21240.11.patch, HIVE-21240.11.patch, 
> HIVE-21240.11.patch, HIVE-21240.11.patch, HIVE-21240.2.patch, 
> HIVE-21240.3.patch, HIVE-21240.4.patch, HIVE-21240.5.patch, 
> HIVE-21240.6.patch, HIVE-21240.7.patch, HIVE-21240.9.patch, 
> HIVE-24240.8.patch, kafka_storage_handler.diff
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use Jackson Tree parser instead of manually parsing
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * Current JSON parser accepts, but does not apply, custom timestamp formats 
> in most cases
> * Added some unit tests
> * Added a cache for column-name to column-index lookups, which are currently
> O(n) for each column in each row processed
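The column-index cache item can be illustrated with a minimal sketch: build a `HashMap` from column name to position once per schema, so each lookup is O(1) on average instead of a linear scan repeated for every field of every row. The class name `ColumnIndexCache` is hypothetical, and the case-insensitive matching (lowercasing with `Locale.ROOT`) is an assumption made for the example, not a statement about the patch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ColumnIndexCache {

    private final List<String> columnNames;
    private final Map<String, Integer> indexByName = new HashMap<>();

    ColumnIndexCache(List<String> columnNames) {
        this.columnNames = columnNames;
        // Built once per schema, not once per row.
        // Assumption: names are matched case-insensitively.
        for (int i = 0; i < columnNames.size(); i++) {
            indexByName.put(columnNames.get(i).toLowerCase(Locale.ROOT), i);
        }
    }

    // Uncached approach: O(n) scan, repeated for each field of each row.
    int indexOfLinear(String name) {
        for (int i = 0; i < columnNames.size(); i++) {
            if (columnNames.get(i).equalsIgnoreCase(name)) {
                return i;
            }
        }
        return -1;
    }

    // Cached approach: O(1) on average; -1 if the JSON key is not a column.
    int indexOf(String name) {
        Integer idx = indexByName.get(name.toLowerCase(Locale.ROOT));
        return idx == null ? -1 : idx;
    }

    public static void main(String[] args) {
        ColumnIndexCache cache = new ColumnIndexCache(Arrays.asList("id", "name", "ts"));
        System.out.println(cache.indexOf("NAME"));       // 1
        System.out.println(cache.indexOfLinear("name")); // 1
        System.out.println(cache.indexOf("missing"));    // -1
    }
}
```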
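Two of the listed items, blank-line handling and base-64 data, can be sketched with the JDK alone. This assumes that a blank (empty or whitespace-only) line maps to a row of all-null columns, as the description states, and that binary column values arrive as base-64 text decoded with `java.util.Base64`. The class and method names are hypothetical, and the real JSON parsing step is deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collections;
import java.util.List;

public class JsonLineHandling {

    // A line is "blank" if it is null, empty, or only whitespace/control chars.
    static boolean isBlank(String line) {
        return line == null || line.trim().isEmpty();
    }

    // For a blank line, return a row with every column set to null
    // instead of failing the parse.
    static List<Object> nullRow(int columnCount) {
        return new ArrayList<>(Collections.nCopies(columnCount, null));
    }

    // Assumption: binary values are carried in JSON as base-64 strings.
    static byte[] decodeBinary(String base64Text) {
        return Base64.getDecoder().decode(base64Text);
    }

    public static void main(String[] args) {
        if (isBlank("   ")) {
            System.out.println(nullRow(3)); // [null, null, null]
        }
        byte[] bytes = decodeBinary("aGl2ZQ==");
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // hive
    }
}
```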
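The timestamp item says custom formats are accepted but usually not applied. A minimal sketch of actually applying them: try each user-supplied pattern in order with `java.time.format.DateTimeFormatter`, then fall back to ISO-8601. The class name `TimestampFormats` and the example pattern are hypothetical; how Hive configures the pattern list is not shown here.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TimestampFormats {

    private final List<DateTimeFormatter> formatters = new ArrayList<>();

    // Hypothetical user-configured patterns, tried in the order given.
    TimestampFormats(List<String> patterns) {
        for (String p : patterns) {
            formatters.add(DateTimeFormatter.ofPattern(p));
        }
    }

    LocalDateTime parse(String text) {
        for (DateTimeFormatter f : formatters) {
            try {
                return LocalDateTime.parse(text, f);
            } catch (DateTimeParseException e) {
                // Not this format; try the next configured one.
            }
        }
        // Fall back to ISO-8601 (e.g. 2019-02-26T10:15:30);
        // throws DateTimeParseException if that fails too.
        return LocalDateTime.parse(text);
    }

    public static void main(String[] args) {
        TimestampFormats tf = new TimestampFormats(Arrays.asList("dd/MM/yyyy HH:mm"));
        System.out.println(tf.parse("26/02/2019 10:15"));    // 2019-02-26T10:15
        System.out.println(tf.parse("2019-02-26T10:15:30")); // 2019-02-26T10:15:30
    }
}
```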



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
