[ https://issues.apache.org/jira/browse/PIG-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078394#comment-13078394 ]
Bill Graham commented on PIG-2195:
----------------------------------
After spending many hours getting my head around what AvroStorage does, I've
learned a few things, some of which led me to amend my initial assessment of
the issue:
# Reading from PigStorage and writing via AvroStorage works as advertised when
using Avro 1.4.1. The ClassCastException I initially referenced happens when
using Avro 1.5.1. Also, the current trunk doesn't build against 1.5.1. We can
address this in a separate JIRA.
# There is a nasty hidden bug that makes the unit tests order-dependent (they
don't run in isolation), which can cause newly added or reordered tests to fail.
The issue is that {{PigSchema2Avro}} has a {{static int tupleIndex}} that it
increments after each call to {{getRecordName()}}, and the tests expect the
generated record names to include specific {{tupleIndex}} values, so each test's
expected output depends on which tests ran before it. As a temporary hack I've
added a {{public static void setTupleIndex(int index)}} method to that class so
unit tests can reset it, but this static int approach should really be
revisited.
# I've added unit tests for reading from a text file and producing Avro.
# I've added support for passing a {{schema_file}} value instead of a data file,
as shown in the issue description.
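To make the {{tupleIndex}} point concrete, here is a minimal Java sketch of how a shared static counter makes tests order-dependent. The class {{RecordNaming}}, its nested {{Namer}}, and the {{TUPLE_}} name prefix are hypothetical stand-ins, not the real {{PigSchema2Avro}} code; the instance-scoped {{Namer}} is only one possible way to revisit the static approach.

```java
// Hypothetical stand-in for PigSchema2Avro; this is NOT the real piggybank class.
public class RecordNaming {

    // One counter shared by the whole JVM, like the static tupleIndex field.
    static int tupleIndex = 0;

    // Returns a name built from the current index, then increments it,
    // so every name depends on how many calls happened earlier (in any test).
    public static String getRecordName() {
        return "TUPLE_" + tupleIndex++;
    }

    // Mirrors the temporary test hook: lets a test reset the shared counter.
    public static void setTupleIndex(int index) {
        tupleIndex = index;
    }

    // One possible alternative (an assumption, not the patch): a counter owned
    // by a single conversion, so separate conversions cannot affect each other.
    public static final class Namer {
        private int next = 0;

        public String getRecordName() {
            return "TUPLE_" + next++;
        }
    }

    public static void main(String[] args) {
        // "Test A" runs first and consumes indexes 0 and 1.
        String a1 = getRecordName();
        String a2 = getRecordName();
        // "Test B" expects TUPLE_0 but gets TUPLE_2 because A ran first.
        String bWithoutReset = getRecordName();
        // With the reset hook, B sees the name it expects.
        setTupleIndex(0);
        String bWithReset = getRecordName();
        // Each Namer starts from zero regardless of what ran before.
        String n1 = new Namer().getRecordName();
        String n2 = new Namer().getRecordName();
        System.out.println(String.join(" ", a1, a2, bWithoutReset, bWithReset, n1, n2));
    }
}
```

Running {{main}} prints {{TUPLE_0 TUPLE_1 TUPLE_2 TUPLE_0 TUPLE_0 TUPLE_0}}: the third name shows state leaking between "tests", while both the reset hook and the per-conversion counter restore the expected value.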
I'll upload a patch shortly.
> AvroStorage fails to STORE when LOADing via PigStorage
> ------------------------------------------------------
>
> Key: PIG-2195
> URL: https://issues.apache.org/jira/browse/PIG-2195
> Project: Pig
> Issue Type: Bug
> Reporter: Bill Graham
> Assignee: Bill Graham
>
> Reading data via {{PigStorage}} and writing it via {{AvroStorage}} fails with
> an exception like the following:
> {{java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be
> cast to org.apache.avro.generic.IndexedRecord}}
> The Pig script in this section of the documentation is an example that
> fails:
> http://linkedin.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data#AvroStorage-PigsupportforAvrodata-A.Howtostoredataindifferentways.
> A workaround currently exists for producing Avro from TSVs:
> {noformat}
> avro = LOAD 'inputPath/' AS (foo);
> STORE avro INTO 'outputPath/' USING oap.piggybank.storage.avro.AvroStorage(
> '{"data":"data_file.avro",
> "same":"data_file.avro", "field0":"def:bar"}');
> {noformat}
> This is redundant, though: {{data}} and {{same}} seem to indicate the same
> thing. This approach also requires an existing Avro data file. This patch
> will make the following alternate constructor syntaxes work as well:
> # Read schema from an existing data file:
> {noformat}
> STORE avro INTO 'outputPath/' USING oap.piggybank.storage.avro.AvroStorage(
> '{"data":"data_file.avro", "field0":"def:bar"}');
> {noformat}
> # Read schema from an existing schema file:
> {noformat}
> STORE avro INTO 'outputPath/' USING oap.piggybank.storage.avro.AvroStorage(
> '{"schema_file":"data_file.avsc", "field0":"def:bar"}');
> {noformat}
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira