Hello,

We are using com.twitter:parquet-thrift 1.2.5-cdh5.0.0-beta-2 to write
data into HDFS. Recently, we modified our Thrift schema by adding a
single optional field. According to
http://diwakergupta.github.io/thrift-missing-guide/#_versioning_compatibility
this change should not break compatibility.
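
To illustrate, the change has roughly this shape (the struct is the one we
read as ExampleClass below; the existing field names here are only
placeholders, and `state` is the field the error further down complains
about):

// Old schema (placeholder fields)
struct ExampleClass {
  1: required string id,
  2: optional i64 timestamp,
}

// New schema: one optional field appended with a fresh field id
struct ExampleClass {
  1: required string id,
  2: optional i64 timestamp,
  3: optional string state,
}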

However, when we use the new schema to read files written with the old
schema, we get the following error:

[error] (run-main-0) parquet.io.InvalidRecordException: state not found in
message ParquetSchema {
at parquet.schema.GroupType.getFieldIndex(GroupType.java:104)
at parquet.schema.GroupType.getType(GroupType.java:136)
at parquet.schema.GroupType.checkGroupContains(GroupType.java:273)
at parquet.schema.MessageType.checkContains(MessageType.java:126)
at parquet.hadoop.api.ReadSupport.getSchemaForRead(ReadSupport.java:55)
at parquet.hadoop.thrift.ThriftReadSupport.init(ThriftReadSupport.java:116)
at parquet.hadoop.ParquetReader.<init>(ParquetReader.java:107)
at parquet.hadoop.ParquetReader.<init>(ParquetReader.java:69)
at parquet.thrift.ThriftParquetReader.<init>(ThriftParquetReader.java:70)
...
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)

The code we are using is:

import parquet.thrift.ThriftParquetReader

// conf is a Hadoop Configuration, path is the Path of an old file
val reader = new ThriftParquetReader[ExampleClass](conf, path)
reader.read()

Is parquet-thrift supposed to offer the same backwards compatibility as
Thrift itself? Are we doing something wrong?

Thank you,
Issac

-- 
*Issac Buenrostro*
Software Engineer |
[email protected] | (617) 997-3350
www.ooyala.com | blog <http://www.ooyala.com/blog> | @ooyala <http://www.twitter.com/ooyala>
