E. Sammer created CRUNCH-480:
--------------------------------
Summary: AvroParquetFileSource doesn't properly configure user-supplied read schema
Key: CRUNCH-480
URL: https://issues.apache.org/jira/browse/CRUNCH-480
Project: Crunch
Issue Type: Bug
Components: IO
Affects Versions: 0.10.0
Reporter: E. Sammer
Priority: Blocker
It seems like AvroParquetFileSource doesn't properly set the configuration
param required to use a user-supplied read schema that differs from the schema
in the file.
Deep in the guts of Parquet (InternalParquetReader#initialize()), I found this:
{code}
this.recordConverter = readSupport.prepareForRead(
configuration, extraMetadata, fileSchema,
new ReadSupport.ReadContext(requestedSchema, readSupportMetadata));
{code}
Later, in Parquet's AvroReadSupport#prepareForRead(), it appears to ignore the
supplied requestedSchema and instead looks for the key avro.read.schema in
the readSupportMetadata map. This is seriously kooky code in Parquet (i.e.,
wrong), but because Crunch doesn't supply readSupportMetadata, we can never
properly supply a read schema. Boooo hisssss.
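To make the failure mode concrete, here is a minimal, self-contained Java sketch that models (but does not call) the Parquet behavior described above. ReadContext, prepareForRead() and the string "schemas" below are simplified, hypothetical stand-ins, not Parquet's real classes or signatures. The fallback to the file schema when the key is missing is also an assumption made for illustration.

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the reported behavior. NOT Parquet's real API:
 * it only shows how a requested schema is lost when it isn't also placed
 * in readSupportMetadata under the key prepareForRead() consults.
 */
public class ReadSchemaLookupSketch {

  // The metadata key the report says AvroReadSupport looks up.
  static final String AVRO_READ_SCHEMA_KEY = "avro.read.schema";

  // Simplified stand-in for ReadSupport.ReadContext; schemas are plain strings.
  static final class ReadContext {
    final String requestedSchema;
    final Map<String, String> readSupportMetadata;

    ReadContext(String requestedSchema, Map<String, String> readSupportMetadata) {
      this.requestedSchema = requestedSchema;
      // Treat a missing map as empty, mirroring "Crunch doesn't supply it".
      this.readSupportMetadata = readSupportMetadata == null
          ? Collections.<String, String>emptyMap()
          : readSupportMetadata;
    }
  }

  // Mimics the reported prepareForRead(): requestedSchema is ignored and the
  // schema comes from the metadata map (assumed fallback: the file schema).
  static String prepareForRead(String fileSchema, ReadContext context) {
    String fromMetadata = context.readSupportMetadata.get(AVRO_READ_SCHEMA_KEY);
    return fromMetadata != null ? fromMetadata : fileSchema;
  }

  public static void main(String[] args) {
    String fileSchema = "record User {name, email}";
    String readSchema = "record User {name}";

    // Current situation: read schema requested, but no readSupportMetadata.
    ReadContext withoutMetadata = new ReadContext(readSchema, null);
    System.out.println(prepareForRead(fileSchema, withoutMetadata));

    // Hypothetical workaround: also publish the read schema under the key.
    Map<String, String> metadata = new HashMap<>();
    metadata.put(AVRO_READ_SCHEMA_KEY, readSchema);
    ReadContext withMetadata = new ReadContext(readSchema, metadata);
    System.out.println(prepareForRead(fileSchema, withMetadata));
  }
}
{code}

Running main() prints the file schema first (the requested schema is silently dropped) and the read schema second, which is the behavior a user-supplied read schema should produce.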