[ https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jakob Homan updated PIG-1748:
-----------------------------

    Attachment: PIG-1748-3.patch

Figured out the test failures. It turns out that when one does a full run of the unit tests (which I cannot get to succeed on my machine), the ~/pigtest directory is left in place during the contrib tests, and the contrib build.xml file has a {{junit.hadoop.conf}} variable that points those tests at the HDFS instance the Pig tests had running, which is no longer up. This conf trickles down to the test, which ends up using it as the default filesystem and tries to connect to it, but can't, since that HDFS is gone. This doesn't occur when the tests are run through an IDE like IntelliJ, since the IDE doesn't use contrib's build.xml settings. I've fixed this by explicitly referencing the local file system in the tests, though this seems like a bug in the contrib build system to me. I'll open a JIRA to address this.

@Felix - good catch. To provide a cleaner separation between my work and Lin's, I would like to go ahead and fix this bug in a separate JIRA after 1748 is committed. How does this sound to you?

Contrib tests pass, except org.apache.pig.piggybank.test.TestPigStorageSchema, which fails for me with or without the patch.

Version 3 of the patch is updated to include better behavior for directories with files that should be filtered out.

> Add load/store function AvroStorage for avro data
> -------------------------------------------------
>
>                 Key: PIG-1748
>                 URL: https://issues.apache.org/jira/browse/PIG-1748
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: lin guo
>            Assignee: Jakob Homan
>         Attachments: avro_storage.patch, avro_test_files.tar.gz,
> PIG-1748-2.patch, PIG-1748-3.patch
>
>
> We want to use Pig to process arbitrary Avro data and store results as Avro
> files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc.
> Due to discrepancies between the Avro and Pig data models, AvroStorage has:
> 1. Limited support for "record": we do not support recursively defined records
> because the number of fields in such records is data dependent.
> 2. Limited support for "union": we only accept nullable unions like ["null",
> "some-type"].
> For simplicity, we also make the following assumptions:
> If the input directory is a leaf directory, then we assume Avro data files in
> it have the same schema;
> If the input directory contains sub-directories, then we assume Avro data
> files in all sub-directories have the same schema.
> AvroStorage takes no input parameters when used as a LoadFunc (except for
> "debug [debug-level]").
> Users can provide parameters to AvroStorage when used as a StoreFunc. If they
> don't, the Avro schema of the output data is derived from its
> Pig schema.
> Detailed documentation can be found in
> http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
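
For illustration only, the nullable-union restriction in the issue description could be checked along the following lines. This is a rough sketch and not AvroStorage's actual code: the function names `is_nullable_union` and `unsupported_unions` and the sample `User` schema are hypothetical, and accepting "null" in either position of the union is an assumption (the description only shows it first).

```python
import json

def is_nullable_union(schema):
    """True if schema is a two-branch union with exactly one "null" branch.

    Assumption: "null" may appear in either position.
    """
    if not isinstance(schema, list) or len(schema) != 2:
        return False
    return sum(1 for branch in schema if branch == "null") == 1

def unsupported_unions(schema, path="$"):
    """Walk a parsed Avro schema and collect the paths of unions that are
    not nullable unions."""
    problems = []
    if isinstance(schema, list):
        # A JSON array in an Avro schema denotes a union.
        if not is_nullable_union(schema):
            problems.append(path)
        for i, branch in enumerate(schema):
            problems.extend(unsupported_unions(branch, f"{path}[{i}]"))
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for field in schema.get("fields", []):
                problems.extend(
                    unsupported_unions(field["type"], f"{path}.{field['name']}"))
        elif kind == "array":
            problems.extend(unsupported_unions(schema["items"], f"{path}.items"))
        elif kind == "map":
            problems.extend(unsupported_unions(schema["values"], f"{path}.values"))
        elif isinstance(kind, (dict, list)):
            # Nested type definition under "type".
            problems.extend(unsupported_unions(kind, path))
    # Plain strings (primitive or named types) contain no unions.
    return problems

# Hypothetical schema: "email" is a nullable union, "id" is a general union.
schema = json.loads("""
{"type": "record", "name": "User", "fields": [
  {"name": "name",  "type": "string"},
  {"name": "email", "type": ["null", "string"]},
  {"name": "id",    "type": ["int", "string"]}
]}
""")

print(unsupported_unions(schema))  # prints ['$.id']
```

A check like this only reports which fields fall outside the accepted union shape; how AvroStorage itself reacts to such a schema is not stated in the issue.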