Hey guys, I'm trying to read data from a Parquet file with a select *
statement, and one field is giving me trouble.  The data was originally
Thrift-encoded; here are the Thrift definitions:

struct teststruct {
  1: optional string field1;
  2: optional string field2;
  3: optional string field3;
}

struct mainstruct {
  1: optional list<teststruct> teststructs;
}

This is the Parquet file schema that was generated from them:

message ParquetSchema {
  optional group teststructs {
    repeated group teststruct_tuple {
      optional binary field1;
      optional binary field2;
      optional binary field3;
    }
  }
}

When I try to run queries involving this 'teststructs' column, I get this
error:

Failed with exception java.io.IOException:java.lang.RuntimeException:
Invalid parquet hive schema: repeated group teststruct_tuple {
      optional binary field1;
      optional binary field2;
      optional binary field3;
    }

This looks like it's coming from here:
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hive/hive-exec/0.12.0-cdh5.0.0-beta-2/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java

The tuple has 3 fields instead of 1 or 2, which seems to be what makes it
fail.  I'm not able to change the file schema.  Is there a way to get around
this error?  I'm running Hive 0.12 from Cloudera CDH5.
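
To spell out my reading of that check, here is a rough model of it in Python.
This is only a sketch of how I understand the behavior, not the actual Hive
code: the function name is made up, and the mapping of 1 field to an array
element and 2 fields to a map key/value pair is my assumption from skimming
the converter source.

```python
def classify_repeated_group(field_names):
    """Hypothetical model of the field-count check on a repeated group.

    Assumption: 1 field is treated as an array element, 2 fields as a
    map key/value pair, and anything else is rejected.
    """
    count = len(field_names)
    if count == 1:
        return "array"
    if count == 2:
        return "map"
    raise RuntimeError(
        "Invalid parquet hive schema: repeated group with %d fields" % count
    )


if __name__ == "__main__":
    # My teststruct_tuple group has three fields, so it would be rejected.
    try:
        classify_repeated_group(["field1", "field2", "field3"])
    except RuntimeError as err:
        print(err)
```

If that reading is right, any struct with more than two fields inside a list
would hit the same error, not just this one.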

-- 
*Raymond Lau*
Software Engineer - Intern |
[email protected] | (925) 395-3806
