[ https://issues.apache.org/jira/browse/HIVE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215985#comment-14215985 ]
Mickael Lacour commented on HIVE-8359:
--------------------------------------

[~brocknoland], I originally picked up the patch that [~rdblue] pointed me to (the review on Review Board), but maybe not its latest version. [~rdblue] asked me to update that patch to also handle HIVE-6994, instead of having two patches with the same behavior/code. And I like the way [~spena] wrote the solution (better than mine, in my opinion).

[~spena], basically I modified the ArrayWritableGroupConverter to clear the "current value". Without that, you never get a null value inside an array; you get the previous value instead.

{code}
diff --git ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java
index 582a5df..052b36d 100644
--- ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java
+++ ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java
@@ -54,6 +54,7 @@ public void start() {
     if (isMap) {
       mapPairContainer = new Writable[2];
     }
+    currentValue = null;
   }
 
   @Override
{code}

The second part was to add "null" values from the ParquetHiveSerDe (values that I was skipping before for no valid reason).
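To see why silently skipping nulls corrupts maps, here is a minimal plain-Java sketch (the class and method names are hypothetical simplifications, not Hive code): when a map's values are serialized as a list parallel to its keys, dropping a null value shifts every later value onto the wrong key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkippedNullDemo {

    // Hypothetical stand-in for the old createArray behavior:
    // with skipNulls=true, null entries are silently dropped.
    static List<String> writeValues(List<String> values, boolean skipNulls) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (skipNulls && v == null) {
                continue; // old behavior: the null never reaches the file
            }
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        // Map {"key1": null, "key2": "val2"} serialized as a values list
        // parallel to its keys list.
        List<String> vals = Arrays.asList(null, "val2");

        // Old behavior: one element short, so "val2" lines up with "key1" --
        // the same misalignment shown in the HIVE-8359 report below.
        System.out.println("old:   " + writeValues(vals, true));
        // Fixed behavior keeps the null, so keys and values stay aligned.
        System.out.println("fixed: " + writeValues(vals, false));
    }
}
```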
{code}
diff --git ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
index b689336..4b36767 100644
--- ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
+++ ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
@@ -202,13 +202,11 @@ private ArrayWritable createArray(final Object obj, final ListObjectInspector in
     if (sourceArray != null) {
       for (final Object curObj : sourceArray) {
-        final Writable newObj = createObject(curObj, subInspector);
-        if (newObj != null) {
-          array.add(newObj);
-        }
+        array.add(createObject(curObj, subInspector));
       }
     }
     if (array.size() > 0) {
-      final ArrayWritable subArray = new ArrayWritable(array.get(0).getClass(),
+      final ArrayWritable subArray = new ArrayWritable(Writable.class,
           array.toArray(new Writable[array.size()]));
       return new ArrayWritable(Writable.class, new Writable[] {subArray});
     } else {
{code}

The qtest just makes sure we handle an empty array, a null array, an array containing nulls, and the same cases for maps:

{code}
+++ data/files/parquet_array_null_element.txt
@@ -0,0 +1,3 @@
+1|,7|CARRELAGE,MOQUETTE|key11:value11,key12:value12,key13:value13
+2|,|CAILLEBOTIS,|
+3|,42,||key11:value11,key12:,key13:
{code}

If you want to integrate these changes into your patch, feel free to do so; otherwise I might duplicate your patch (:p) and add this fix.

> Map containing null values are not correctly written in Parquet files
> ---------------------------------------------------------------------
>
>                 Key: HIVE-8359
>                 URL: https://issues.apache.org/jira/browse/HIVE-8359
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 0.13.1
>            Reporter: Frédéric TERRAZZONI
>            Assignee: Sergio Peña
>        Attachments: HIVE-8359.1.patch, HIVE-8359.2.patch, HIVE-8359.4.patch, map_null_val.avro
>
>
> I tried to write a map<string,string> column in a Parquet file.
> The table should contain:
> {code}
> {"key3":"val3","key4":null}
> {"key3":"val3","key4":null}
> {"key1":null,"key2":"val2"}
> {"key3":"val3","key4":null}
> {"key3":"val3","key4":null}
> {code}
> ... and when you run a query like {code}SELECT * from mytable{code}
> you can see that the table is corrupted:
> {code}
> {"key3":"val3"}
> {"key4":"val3"}
> {"key3":"val2"}
> {"key4":"val3"}
> {"key1":"val3"}
> {code}
> I've not been able to read the Parquet file in our software afterwards, and
> consequently I suspect it to be corrupted.
> For those who are interested, I generated this Parquet table from an Avro
> file.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
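A side note on the second hunk of the ParquetHiveSerDe diff: once null elements are kept in the array, taking the element class from the first entry (`array.get(0).getClass()`) can throw a NullPointerException, which is why the patch switches to the fixed supertype `Writable.class`. A minimal plain-Java sketch of that failure mode (hypothetical names, using `Object.class` in place of `Writable.class`):

```java
import java.util.ArrayList;
import java.util.List;

public class ElementClassDemo {

    // Hypothetical stand-in for choosing the ArrayWritable element class.
    static Class<?> elementClass(List<Object> array, boolean useFirstElement) {
        if (useFirstElement) {
            // Old approach: NPEs as soon as the first kept element is null.
            return array.get(0).getClass();
        }
        // Patched approach: a fixed supertype works for any element, null included.
        return Object.class;
    }

    public static void main(String[] args) {
        List<Object> array = new ArrayList<>();
        array.add(null);   // a null element that is now kept, not skipped
        array.add("val2");

        try {
            elementClass(array, true);
            System.out.println("old approach: no exception");
        } catch (NullPointerException e) {
            System.out.println("old approach: NPE");
        }
        System.out.println("fixed approach: "
            + elementClass(array, false).getSimpleName());
    }
}
```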