[ 
https://issues.apache.org/jira/browse/HIVE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774610#comment-13774610
 ] 

Chaoyu Tang commented on HIVE-5320:
-----------------------------------

Agree that this is mainly caused by the improper implementation from 
json-serde, but is there anything Hive can do to better cope with this kind of 
unexpected behavior or prevent it from happening again? To provide clear 
documentations for its related SerDe APIs like that in ListObjectInspector?
                
> Querying a table with nested struct type over JSON data results in errors
> -------------------------------------------------------------------------
>
>                 Key: HIVE-5320
>                 URL: https://issues.apache.org/jira/browse/HIVE-5320
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.9.0
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-5320.patch
>
>
> Querying a table with nested_struct datatype like
> ==
> create table nest_struct_tbl (col1 string, col2 array<struct<a1:string, 
> a2:array<struct<b1:int, b2:string, b3:string>>>>) ROW FORMAT SERDE 
> 'org.openx.data.jsonserde.JsonSerDe'; 
> ==
> over JSON data cause errors including java.lang.IndexOutOfBoundsException or 
> corrupted data. 
> The JsonSerDe used is 
> json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar.
> The cause is that the method:
> public List<Object> getStructFieldsDataAsList(Object o) 
> in JsonStructObjectInspector.java 
> returns a list referencing to a static arraylist "values"
> So the local variable 'list' in method serialize of Hive LazySimpleSerDe 
> class is returned with same reference in its recursive calls and its element 
> values are kept on being overwritten in the case STRUCT.
> Solutions:
> 1. Fix in JsonSerDe, and change the field 'values' in 
> java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java
> to instance scope.
> Filed a ticket to JSonSerDe 
> (https://github.com/rcongiu/Hive-JSON-Serde/issues/31)
> 2. Ideally, in the method serialize of class LazySimpleSerDe, we should 
> defensively save a copy of a list resulted from list = 
> soi.getStructFieldsDataAsList(obj) in which case the soi is the instance of 
> JsonStructObjectInspector, so that the recursive calls of serialize can work 
> properly regardless of the extended SerDe implementation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to