[https://issues.apache.org/jira/browse/HIVE-8359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215985#comment-14215985]
Mickael Lacour commented on HIVE-8359:
--------------------------------------
[~brocknoland], I believe I picked the patch that [~rdblue] told me about (the
review on Review Board), but maybe not the latest version.
[~rdblue] wanted me to update this patch to handle HIVE-6994 as well, instead of
having two patches with the same behavior/code. And I like the way
[~spena] wrote the solution (better than mine, in my opinion).
[~spena], basically I modified the ArrayWritableGroupConverter to clear the 'current
value'. If you don't do that, you never get a null value inside an array; you get
the previous element's value instead.
{code}
diff --git ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java
index 582a5df..052b36d 100644
--- ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java
+++ ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java
@@ -54,6 +54,7 @@ public void start() {
     if (isMap) {
      mapPairContainer = new Writable[2];
     }
+    currentValue = null;
   }
 
   @Override
{code}
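To make the stale-value problem concrete, here is a minimal self-contained sketch of the converter lifecycle (illustrative names only, not the actual Hive or parquet-mr classes): start() runs before each element, set() fires only for non-null elements, and end() publishes whatever currentValue holds at that point.
{code}
import java.util.ArrayList;
import java.util.List;

// Stand-in for the converter lifecycle; class and method names are made up.
public class StaleValueDemo {
  private String currentValue;
  private final List<String> out = new ArrayList<>();

  void start() {
    // The fix: without this reset, a null element re-publishes the previous value.
    currentValue = null;
  }

  void set(String value) {
    currentValue = value;
  }

  void end() {
    out.add(currentValue);
  }

  public static void main(String[] args) {
    final StaleValueDemo converter = new StaleValueDemo();
    converter.start(); converter.set("a"); converter.end(); // element 1: "a"
    converter.start(); converter.end();                     // element 2: null, set() never fires
    System.out.println(converter.out); // prints [a, null]; without the reset it would be [a, a]
  }
}
{code}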
And the second part was to stop dropping null values in the ParquetHiveSerDe
(values I was skipping before for no valid reason).
{code}
diff --git ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
index b689336..4b36767 100644
--- ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
+++ ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java
@@ -202,13 +202,11 @@ private ArrayWritable createArray(final Object obj, final ListObjectInspector in
     if (sourceArray != null) {
       for (final Object curObj : sourceArray) {
-        final Writable newObj = createObject(curObj, subInspector);
-        if (newObj != null) {
-          array.add(newObj);
-        }
+        array.add(createObject(curObj, subInspector));
       }
     }
     if (array.size() > 0) {
-      final ArrayWritable subArray = new ArrayWritable(array.get(0).getClass(),
+      final ArrayWritable subArray = new ArrayWritable(Writable.class,
           array.toArray(new Writable[array.size()]));
       return new ArrayWritable(Writable.class, new Writable[] {subArray});
     } else {
{code}
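The change from array.get(0).getClass() to Writable.class matters once null elements are kept: the first element can now be null, so calling getClass() on it would throw a NullPointerException. A quick sketch using only the stock Hadoop ArrayWritable API (the demo class name is made up):
{code}
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

public class NullElementArrayDemo {
  public static void main(String[] args) {
    // With nulls preserved, elems[0] may be null, so
    // elems[0].getClass() is no longer a safe choice for the value class.
    final Writable[] elems = { null, new IntWritable(42), null };
    final ArrayWritable arr = new ArrayWritable(Writable.class, elems);
    System.out.println(arr.get().length); // 3 -- null elements are kept
  }
}
{code}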
And the qtest just makes sure we handle an empty array, a null array, an array
containing null elements, and the same cases for maps. In the data file, columns
are separated by '|', collection items by ',', and map keys from values by ':'.
{code}
+++ data/files/parquet_array_null_element.txt
@@ -0,0 +1,3 @@
+1|,7|CARRELAGE,MOQUETTE|key11:value11,key12:value12,key13:value13
+2|,|CAILLEBOTIS,|
+3|,42,||key11:value11,key12:,key13:
{code}
If you want to integrate these changes into your patch, feel free to do so;
otherwise I might duplicate your patch (:p) and add this fix.
> Map containing null values are not correctly written in Parquet files
> ---------------------------------------------------------------------
>
> Key: HIVE-8359
> URL: https://issues.apache.org/jira/browse/HIVE-8359
> Project: Hive
> Issue Type: Bug
> Components: File Formats
> Affects Versions: 0.13.1
> Reporter: Frédéric TERRAZZONI
> Assignee: Sergio Peña
> Attachments: HIVE-8359.1.patch, HIVE-8359.2.patch, HIVE-8359.4.patch,
> map_null_val.avro
>
>
> Tried to write a map<string,string> column to a Parquet file. The table should
> contain:
> {code}
> {"key3":"val3","key4":null}
> {"key3":"val3","key4":null}
> {"key1":null,"key2":"val2"}
> {"key3":"val3","key4":null}
> {"key3":"val3","key4":null}
> {code}
> ... and when you run a query like {code}SELECT * FROM mytable{code}
> we can see that the table is corrupted:
> {code}
> {"key3":"val3"}
> {"key4":"val3"}
> {"key3":"val2"}
> {"key4":"val3"}
> {"key1":"val3"}
> {code}
> I have not been able to read the Parquet file in our software afterwards, so I
> suspect it is corrupted.
> For those who are interested, I generated this Parquet table from an Avro
> file.