Chaoyu Tang created HIVE-5320:
---------------------------------
Summary: Querying a table with nested struct type over JSON data
results in errors
Key: HIVE-5320
URL: https://issues.apache.org/jira/browse/HIVE-5320
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers
Affects Versions: 0.9.0
Reporter: Chaoyu Tang
Querying a table with a nested struct data type, such as
==
create table nest_struct_tbl (col1 string, col2 array<struct<a1:string,
a2:array<struct<b1:int, b2:string, b3:string>>>>) ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe';
==
over JSON data causes errors such as java.lang.IndexOutOfBoundsException, or
returns corrupted data.
The JsonSerDe used is
json-serde-1.1.4.jar/json-serde-1.1.4-jar-dependencies.jar.
The cause is that the method
public List<Object> getStructFieldsDataAsList(Object o)
in JsonStructObjectInspector.java
returns a reference to a static ArrayList, "values", which is shared by every
instance of the class. As a result, the local variable 'list' in the serialize
method of Hive's LazySimpleSerDe class refers to the same object in all of its
recursive calls, and in the STRUCT case its element values keep being
overwritten by the nested calls.
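To make the failure mode concrete, here is a minimal, self-contained sketch in
Java. SharedListStructInspector and StaticListBugDemo are hypothetical,
simplified stand-ins, not the actual JsonSerDe or Hive classes: a struct is
modelled as a List<Object> of its field values, and the serializer only mimics
the recursion in the STRUCT case of LazySimpleSerDe.serialize (the real method
also writes separators to an output stream and handles other type categories).
Because the inspector hands back the same static list at every level, a nested
struct overwrites its parent's field values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for JsonStructObjectInspector (not the real
// JsonSerDe source). The buffer returned to callers is one static ArrayList,
// so every inspector instance, and every recursive call, shares it.
class SharedListStructInspector {
    private static final List<Object> values = new ArrayList<>();

    // A struct value is modelled as a List<Object> of its field values.
    List<Object> getStructFieldsDataAsList(Object o) {
        values.clear();
        values.addAll((List<?>) o);
        return values; // returns the shared buffer itself, not a copy
    }
}

// Recursive serializer loosely modelled on the STRUCT case of
// LazySimpleSerDe.serialize: fetch the field list, then recurse into each field.
class StaticListBugDemo {
    static void serialize(StringBuilder out, Object obj) {
        if (!(obj instanceof List)) {
            out.append(obj); // primitive field: write it as-is
            return;
        }
        // Each level creates its own inspector, but the static buffer is still shared.
        List<Object> list = new SharedListStructInspector().getStructFieldsDataAsList(obj);
        int fieldCount = list.size(); // number of fields in this struct
        out.append('{');
        for (int i = 0; i < fieldCount; i++) {
            if (i > 0) {
                out.append(',');
            }
            // The nested call clears and refills 'list', so later iterations read
            // the inner struct's fields (or run past the end of the buffer).
            serialize(out, list.get(i));
        }
        out.append('}');
    }

    static String render(Object obj) {
        StringBuilder out = new StringBuilder();
        serialize(out, obj);
        return out.toString();
    }

    public static void main(String[] args) {
        // Field "c" is replaced by the inner struct's last field.
        System.out.println(render(Arrays.asList(Arrays.asList("b1", "b2"), "c")));
        try {
            // The inner struct shrinks the shared buffer to one element.
            render(Arrays.asList("a", Arrays.asList("b1"), "c"));
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```

Running main prints {{b1,b2},b2} (field c is lost), and the second call fails
with an IndexOutOfBoundsException, matching the two symptoms reported above.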
Solutions:
1. Fix JsonSerDe by changing the field 'values' in
java.org.openx.data.jsonserde.objectinspector.JsonStructObjectInspector.java
to instance scope. A ticket has been filed against JsonSerDe
(https://github.com/rcongiu/Hive-JSON-Serde/issues/31).
2. Ideally, the serialize method of LazySimpleSerDe should also defensively
copy the list returned by list = soi.getStructFieldsDataAsList(obj), where soi
is here an instance of JsonStructObjectInspector, so that recursive calls to
serialize work correctly regardless of the external SerDe implementation in use.
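Solution 1 above can be sketched as follows, again with hypothetical,
simplified classes rather than the real JsonSerDe code. Making 'values' an
instance field means each inspector owns its buffer. Hive creates an object
inspector per struct type; this sketch approximates that by keeping one
inspector per nesting depth (an assumption made only for the demo), which is
enough to ensure a nested struct never writes into its parent's buffer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for JsonStructObjectInspector after
// solution 1: 'values' is an instance field, so each inspector owns its buffer.
class InstanceListStructInspector {
    private final List<Object> values = new ArrayList<>();

    List<Object> getStructFieldsDataAsList(Object o) {
        values.clear();
        values.addAll((List<?>) o);
        return values; // reused per instance, but no longer shared across instances
    }
}

class InstanceScopeFixDemo {
    // Assumption for this sketch: one inspector per nesting depth stands in for
    // Hive's one inspector per struct type, so parent and child never share one.
    private final Map<Integer, InstanceListStructInspector> inspectors = new HashMap<>();

    private void serialize(StringBuilder out, Object obj, int depth) {
        if (!(obj instanceof List)) {
            out.append(obj);
            return;
        }
        InstanceListStructInspector soi =
                inspectors.computeIfAbsent(depth, d -> new InstanceListStructInspector());
        List<Object> list = soi.getStructFieldsDataAsList(obj);
        int fieldCount = list.size();
        out.append('{');
        for (int i = 0; i < fieldCount; i++) {
            if (i > 0) {
                out.append(',');
            }
            serialize(out, list.get(i), depth + 1); // inner level uses a different buffer
        }
        out.append('}');
    }

    static String render(Object obj) {
        StringBuilder out = new StringBuilder();
        new InstanceScopeFixDemo().serialize(out, obj, 0);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(Arrays.asList(Arrays.asList("b1", "b2"), "c")));
        System.out.println(render(Arrays.asList("a", Arrays.asList("b1"), "c")));
    }
}
```

Here main prints {{b1,b2},c} and {a,{b1},c}. Each inspector still reuses its
own buffer across calls; returning a freshly allocated list from every call
would avoid that reuse as well, at the cost of one allocation per struct value.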
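Solution 2 above, the defensive copy in the serializer, can be sketched like
this. StaticBufferStructInspector deliberately keeps the unfixed static buffer,
and DefensiveCopyDemo is a hypothetical, simplified stand-in for the STRUCT
case of LazySimpleSerDe.serialize, not a patch against the real class. In this
model a shallow copy taken right after getStructFieldsDataAsList is enough,
because only the list container is shared, not the field objects it holds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the unfixed JsonStructObjectInspector: it still
// returns one static buffer shared by all instances.
class StaticBufferStructInspector {
    private static final List<Object> values = new ArrayList<>();

    List<Object> getStructFieldsDataAsList(Object o) {
        values.clear();
        values.addAll((List<?>) o);
        return values;
    }
}

// Sketch of solution 2: the serializer copies the returned list before
// recursing, so it no longer depends on how the inspector manages its buffer.
class DefensiveCopyDemo {
    static void serialize(StringBuilder out, Object obj) {
        if (!(obj instanceof List)) {
            out.append(obj);
            return;
        }
        StaticBufferStructInspector soi = new StaticBufferStructInspector();
        // Defensive (shallow) copy: nested calls may overwrite the inspector's
        // buffer, but not this snapshot of the current struct's fields.
        List<Object> list = new ArrayList<>(soi.getStructFieldsDataAsList(obj));
        out.append('{');
        for (int i = 0; i < list.size(); i++) {
            if (i > 0) {
                out.append(',');
            }
            serialize(out, list.get(i));
        }
        out.append('}');
    }

    static String render(Object obj) {
        StringBuilder out = new StringBuilder();
        serialize(out, obj);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(Arrays.asList(Arrays.asList("b1", "b2"), "c")));
        System.out.println(render(Arrays.asList("a", Arrays.asList("b1"), "c")));
    }
}
```

Here main prints {{b1,b2},c} and {a,{b1},c} even though the inspector is still
buggy. The trade-off is one extra shallow ArrayList copy per struct value
serialized.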
--
This message is automatically generated by JIRA.