[ https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777102#comment-16777102 ]
BELUGA BEHR commented on HIVE-21240:
------------------------------------

[~kgyrtkirk] Thanks!

#1 I'm not sure I understand the first request. Are you talking specifically about the HCat code? Are there missing unit tests here? Is that why it passes even though the data types have changed? As I see it, the native arrays are all transformed into Java Collections:

{code:java|title=HCat JsonSerDe}
List fatRow = fatLand((Object[]) row);
return new DefaultHCatRecord(fatRow);
...
return Arrays.asList(ArrayUtils.toObject((int[]) arr));
{code}

So the JSON SerDe should just create Java Collections from the start instead of transforming them later.

#2 The Kafka_Handler Q-Test also fails locally on trunk. I searched JIRA and found that this test fails in many other places too. I can keep looking into it, though.

#3 I don't think there's much value in going back, changing the code, and re-testing it. These proposed changes are not about making the SerDe faster; I just want to point out that there isn't a huge regression. If it's a bit quicker, then that's an added bonus.

> JSON SerDe Re-Write
> -------------------
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
> Issue Type: Improvement
> Components: Serializers/Deserializers
> Affects Versions: 4.0.0, 3.1.1
> Reporter: BELUGA BEHR
> Assignee: BELUGA BEHR
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, HIVE-21240.10.patch, HIVE-21240.2.patch, HIVE-21240.3.patch, HIVE-21240.4.patch, HIVE-21240.5.patch, HIVE-21240.6.patch, HIVE-21240.7.patch, HIVE-21240.9.patch, HIVE-24240.8.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use Jackson Tree parser instead of manually parsing
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * Current JSON parser accepts, but does not apply, custom timestamp formats in most cases
> * Added some unit tests
> * Added cache for column-name to column-index searches, currently O(n) for each row processed, for each column in the row

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
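The last item in the issue description (caching column-name to column-index searches) replaces a linear search over the column list with a hash lookup. A minimal sketch of that idea, assuming the column names come from the table schema and a JSON field name must match a column name exactly; `ColumnIndexCache` and its methods are hypothetical names, not the classes in the patch:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: maps a JSON field name to its column position.
 * Intended to be built once from the schema's column names, so each lookup
 * while reading rows is an expected O(1) hash lookup instead of an O(n) scan
 * of the column list for every field of every row.
 */
final class ColumnIndexCache {
  private final Map<String, Integer> indexByName;

  ColumnIndexCache(List<String> columnNames) {
    Map<String, Integer> m = new HashMap<>();
    for (int i = 0; i < columnNames.size(); i++) {
      // Keep the first position if a name repeats (same result as List.indexOf).
      m.putIfAbsent(columnNames.get(i), i);
    }
    this.indexByName = Collections.unmodifiableMap(m);
  }

  /** Returns the column index for the field, or -1 if it is not in the schema. */
  int indexOf(String fieldName) {
    Integer idx = indexByName.get(fieldName);
    return idx == null ? -1 : idx;
  }

  /** The O(n)-per-lookup approach the cache replaces, kept for comparison. */
  static int linearIndexOf(List<String> columnNames, String fieldName) {
    return columnNames.indexOf(fieldName);
  }

  public static void main(String[] args) {
    ColumnIndexCache cache = new ColumnIndexCache(Arrays.asList("id", "name", "ts"));
    System.out.println(cache.indexOf("ts"));      // 2
    System.out.println(cache.indexOf("missing")); // -1
  }
}
```

Because the map is built once, the per-field cost while deserializing rows no longer grows with the number of columns in the table.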
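Point #1 in the comment argues for building Java Collections while parsing instead of filling native arrays and converting them afterwards. A rough sketch of the two shapes for an `array<int>` column, assuming the element tokens have already been read from the JSON array as strings; `ArrayColumnSketch` is hypothetical, and the JDK's `Arrays.stream(...).boxed()` stands in for the Commons Lang `ArrayUtils.toObject` call quoted from HCat:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class ArrayColumnSketch {
  // Two-step shape: fill a primitive array, then box it into a List later.
  // Arrays.stream(...).boxed() stands in for Arrays.asList(ArrayUtils.toObject(...)).
  static List<Integer> viaNativeArray(List<String> tokens) {
    int[] arr = new int[tokens.size()];
    for (int i = 0; i < arr.length; i++) {
      arr[i] = Integer.parseInt(tokens.get(i));
    }
    return Arrays.stream(arr).boxed().collect(Collectors.toList());
  }

  // One-step shape: create the Java Collection directly while parsing.
  static List<Integer> direct(List<String> tokens) {
    List<Integer> out = new ArrayList<>(tokens.size());
    for (String t : tokens) {
      out.add(Integer.parseInt(t));
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("1", "2", "3");
    System.out.println(direct(tokens));                                // [1, 2, 3]
    System.out.println(direct(tokens).equals(viaNativeArray(tokens))); // true
  }
}
```

Both methods return equal lists; the direct one skips the intermediate `int[]` and the second copying pass.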
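Two of the behaviors listed in the issue description, blank-line handling and base-64 data, can be illustrated with the JDK alone. A hedged sketch, assuming base-64 applies to `BINARY` column values stored as JSON strings and that a blank line maps to a row of nulls as the description states; `JsonRowSketch` and its methods are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

final class JsonRowSketch {
  /** True for null, empty, or whitespace-only input lines. */
  static boolean isBlank(String line) {
    return line == null || line.trim().isEmpty();
  }

  /** A blank input line yields a row whose every column is null. */
  static List<Object> blankRow(int columnCount) {
    return Arrays.asList(new Object[columnCount]);
  }

  /** Decodes a binary column value that was written as a base-64 JSON string. */
  static byte[] decodeBinary(String jsonText) {
    return Base64.getDecoder().decode(jsonText);
  }

  public static void main(String[] args) {
    System.out.println(isBlank("   "));                            // true
    System.out.println(blankRow(3));                               // [null, null, null]
    byte[] b = decodeBinary("aGl2ZQ==");
    System.out.println(new String(b, StandardCharsets.UTF_8));     // hive
  }
}
```

`Base64.getDecoder()` uses the basic RFC 4648 alphabet and throws `IllegalArgumentException` on malformed input; which decoder variant the patch itself uses is not stated here.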