Hey guys, I'm trying to read data from a Parquet file with a SELECT *
statement, and one field is giving me trouble. The data was originally
Thrift-encoded; here are the Thrift definitions:
struct teststruct {
  1: optional string field1;
  2: optional string field2;
  3: optional string field3;
}
struct mainstruct {
  1: optional list<teststruct> teststructs;
}
This is the Parquet file schema that was generated from them:
message ParquetSchema {
  optional group teststructs {
    repeated group teststruct_tuple {
      optional binary field1;
      optional binary field2;
      optional binary field3;
    }
  }
}
When I try to run queries involving the 'teststructs' column, I get this
error:
Failed with exception java.io.IOException:java.lang.RuntimeException:
Invalid parquet hive schema: repeated group teststruct_tuple {
optional binary field1;
optional binary field2;
optional binary field3;
}
This looks like it's coming from here:
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hive/hive-exec/0.12.0-cdh5.0.0-beta-2/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java
As far as I can tell, that converter only accepts a repeated group with 1 or
2 fields, and my tuple has 3, which is what makes it fail.
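To make clear what I think is happening, here is a minimal sketch of the check as I understand it. This is NOT the actual Hive source: the class name, method name, group and field names, and the exact message text are all made up for illustration. The only thing taken from my reading of the converter is the assumption that a repeated group with 1 field is treated as a list and one with 2 fields as a map.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the converter's field-count check.
// Not the Hive implementation; names and message format are assumptions.
final class RepeatedGroupCheck {

    // Classifies a repeated group by the number of fields it contains.
    static String classify(String groupName, List<String> fieldNames) {
        int count = fieldNames.size();
        if (count == 1) {
            return "array"; // assumed: a single field is read as a list element
        } else if (count == 2) {
            return "map";   // assumed: two fields are read as key and value
        }
        // Any other field count is rejected, as with my 3-field tuple.
        throw new RuntimeException(
            "Invalid parquet hive schema: repeated group " + groupName
            + " has " + count + " fields");
    }

    public static void main(String[] args) {
        // Illustrative group and field names, not taken from a real file.
        System.out.println(classify("bag", Arrays.asList("array_element")));
        System.out.println(classify("map", Arrays.asList("key", "value")));
        try {
            classify("teststruct_tuple",
                     Arrays.asList("field1", "field2", "field3"));
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If that reading is right, any Thrift list of structs with more than 2 fields would hit the same exception, not just this particular file.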
I'm not able to change the file schema. Is there a way to get around this
error? I'm running Hive 0.12 from Cloudera CDH5.
--
*Raymond Lau*
Software Engineer - Intern |
[email protected] | (925) 395-3806