[jira] [Commented] (HIVE-21240) JSON SerDe Re-Write

BELUGA BEHR (JIRA) Wed, 27 Feb 2019 08:42:39 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779499#comment-16779499
 ]


BELUGA BEHR commented on HIVE-21240:
------------------------------------

[~bslim]  With a large project like Hive, maintained by many different 
supporters and countless additional troubleshooters who dig through the code 
to resolve issues, it is all the more important to adhere to best practices.  
With few exceptions, everything should be a Java Collection.  Making smart 
choices about the actual data structures used (Set, Map, List, etc.) is going 
to yield much more benefit than trying to manipulate primitive arrays.  I've 
never had a Hive user complain that they wished it were 2% faster, but I hear 
all the time about how complicated the product is and how difficult it is to 
troubleshoot.

There are a few books on the topic, which I won't regurgitate here, but I 
think this sums it up well:

https://stackoverflow.com/questions/6100148/collection-interface-vs-arrays
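
To make the Collections-versus-arrays point concrete, here is a minimal sketch. 
The class and method names (KnownColumns, containsInArray, containsInSet) are 
hypothetical and are not taken from the Hive code base; the sketch only 
contrasts a hand-written array scan with the equivalent Set lookup.

```java
import java.util.Set;

// Hypothetical illustration; not taken from Hive.
final class KnownColumns {

  // Array version: every caller has to write and maintain the scan by hand,
  // and the String[] type says nothing about uniqueness or intent.
  static boolean containsInArray(String[] names, String name) {
    for (String candidate : names) {
      if (candidate.equals(name)) {
        return true;
      }
    }
    return false;
  }

  // Collection version: the Set type documents the intent (membership test,
  // no duplicates) and the lookup is a single, well-known library call.
  static boolean containsInSet(Set<String> names, String name) {
    return names.contains(name);
  }
}
```

Both methods give the same answer; the second is simply easier to read and to 
troubleshoot, which is the trade-off argued for above.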



> JSON SerDe Re-Write
> -------------------
>
>                 Key: HIVE-21240
>                 URL: https://issues.apache.org/jira/browse/HIVE-21240
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 4.0.0, 3.1.1
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.10.patch, HIVE-21240.11.patch, HIVE-21240.11.patch, 
> HIVE-21240.11.patch, HIVE-21240.11.patch, HIVE-21240.2.patch, 
> HIVE-21240.3.patch, HIVE-21240.4.patch, HIVE-21240.5.patch, 
> HIVE-21240.6.patch, HIVE-21240.7.patch, HIVE-21240.9.patch, 
> HIVE-24240.8.patch, kafka_storage_handler.diff
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use Jackson Tree parser instead of manually parsing
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * Current JSON parser accepts, but does not apply, custom timestamp formats 
> in most cases
> * Added some unit tests
> * Added cache for column-name to column-index searches, currently O(n) for 
> each row processed, for each column in the row
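
The column-name to column-index cache in the last item can be sketched as 
follows. This is an illustration only, not the code from the attached patches: 
the class name ColumnIndexCache and its methods are hypothetical, and the 
case-insensitive matching is an assumption. The idea is to build a Map once 
from the table's column names so that each per-field lookup no longer scans 
the whole column list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; not taken from the Hive patch.
final class ColumnIndexCache {

  private final Map<String, Integer> indexByName = new HashMap<>();

  // Built once per table schema, in O(n) over the column names.
  ColumnIndexCache(List<String> columnNames) {
    for (int i = 0; i < columnNames.size(); i++) {
      // Assumption: names are matched case-insensitively.
      // putIfAbsent keeps the first occurrence, which is what a
      // left-to-right linear search would have returned.
      indexByName.putIfAbsent(columnNames.get(i).toLowerCase(Locale.ROOT), i);
    }
  }

  // Expected constant-time lookup; returns -1 for an unknown field name.
  int indexOf(String fieldName) {
    return indexByName.getOrDefault(fieldName.toLowerCase(Locale.ROOT), -1);
  }
}
```

A deserializer would create one such cache when it is initialized and call 
indexOf for each field name encountered in a JSON row, instead of searching 
the column list for every field of every row.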



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
