AvroStorage fails to STORE when LOADing via PigStorage
------------------------------------------------------

                 Key: PIG-2195
                 URL: https://issues.apache.org/jira/browse/PIG-2195
             Project: Pig
          Issue Type: Bug
            Reporter: Bill Graham
            Assignee: Bill Graham


Reading data via {{PigStorage}} and writing it via {{AvroStorage}} fails with 
an exception like this:

{{java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be 
cast to org.apache.avro.generic.IndexedRecord}}

The Pig script in the following section of the AvroStorage documentation is an 
example that fails this way:

http://linkedin.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data#AvroStorage-PigsupportforAvrodata-A.Howtostoredataindifferentways.

A workaround currently exists for producing Avro from TSVs:

{noformat}
avro = LOAD 'inputPath/' AS (foo);
STORE avro INTO 'outputPath/'
  USING org.apache.pig.piggybank.storage.avro.AvroStorage(
  '{"data":"data_file.avro",
    "same":"data_file.avro", "field0":"def:bar"}');
{noformat}

This is redundant, though, since {{data}} and {{same}} seem to indicate the 
same thing. This approach also requires an existing Avro data file. This 
patch will make the following alternate constructor syntaxes work as well.

# Read schema from an existing data file:
{noformat}
  '{"data":"data_file.avro", "field0":"def:bar"}');
{noformat}
# Read schema from an existing schema file:
{noformat}
  '{"schema_file":"data_file.avsc", "field0":"def:bar"}');
{noformat}
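
For context, here is a sketch of how the second form might look in a complete script once the patch is applied. The fully qualified {{AvroStorage}} class name and the piggybank jar already being registered are assumptions; the paths and the {{data_file.avsc}} name are the placeholders used above.

{noformat}
-- LOAD uses PigStorage by default, so the input is read as TSV
avro = LOAD 'inputPath/' AS (foo);
-- the schema should come from the schema file, not an existing Avro data file
STORE avro INTO 'outputPath/'
  USING org.apache.pig.piggybank.storage.avro.AvroStorage(
  '{"schema_file":"data_file.avsc", "field0":"def:bar"}');
{noformat}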

