[ https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985022#action_12985022 ]

Scott Carey commented on PIG-1748:
----------------------------------

About plans on the Avro side:

I plan to merge my work with this (great!) work into the Avro project. In the
long run the Avro project is a better home for this code, for the reasons
below, but in the short term it does not matter. It will be some time before
it is available from Avro.

* Avro is fully mavenized in 1.5.0 (due out in a few weeks), which makes it easy
to add submodule jars such as 'avro-pig.jar'. Furthermore, it's easy to publish
a separate module for each version of Pig if needed. For example, we could
simultaneously release avro-pig0.7.jar, avro-pig0.8.jar, etc. as part of Avro
1.6.0 if that were necessary due to API breakage or extra features enabled in
newer versions of Pig.
* Much of the work here applies to multiple systems; I plan to share code with
the Avro Hive SerDes when those are implemented. This may lead to a general
module that helps projects translate their schemas to Avro and back.

None of this affects the work here in the short term, but I'm sure people will
be interested in these plans and may have other ideas or suggestions on how to
proceed without fragmenting the effort.

> Add load/store function AvroStorage for avro data
> -------------------------------------------------
>
>                 Key: PIG-1748
>                 URL: https://issues.apache.org/jira/browse/PIG-1748
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: lin guo
>            Assignee: Jakob Homan
>         Attachments: avro_storage.patch, avro_test_files.tar.gz, 
> PIG-1748-2.patch
>
>
> We want to use Pig to process arbitrary Avro data and store results as Avro
> files. AvroStorage() acts as both a LoadFunc and a StoreFunc.
> Due to discrepancies between the Avro and Pig data models, AvroStorage has:
> 1. Limited support for "record": we do not support recursively defined records
> because the number of fields in such records is data dependent.
> 2. Limited support for "union": we only accept nullable unions like ["null",
> "some-type"].
> For simplicity, we also make the following assumptions:
> If the input directory is a leaf directory, then we assume the Avro data files
> in it have the same schema;
> If the input directory contains sub-directories, then we assume the Avro data
> files in all sub-directories have the same schema.
> AvroStorage takes no input parameters when used as a LoadFunc (except for
> "debug [debug-level]").
> Users can provide parameters to AvroStorage when used as a StoreFunc. If they
> don't, the Avro schema of the output data is derived from its Pig schema.
> Detailed documentation can be found at
> http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data
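
The two data-model restrictions in the issue description (no recursively defined records, and unions only of the form ["null", "some-type"]) can be checked with a single recursive walk over a schema. This is a hypothetical sketch, not AvroStorage's actual code: it uses a made-up minimal schema model (`Kind`, `Node`, `leaf`) instead of `org.apache.avro.Schema` so that it runs without Avro on the classpath, and it assumes the null branch may appear in either position of the union.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch of AvroStorage's schema restrictions.
 * Kind/Node are a made-up stand-in for an Avro schema, NOT org.apache.avro.Schema.
 */
public class AvroPigCompat {
    enum Kind { NULL, INT, STRING, RECORD, UNION, ARRAY }

    static final class Node {
        final Kind kind;
        final String name;          // record name; null for non-records
        final List<Node> children;  // record fields, union branches, or array element
        Node(Kind kind, String name, List<Node> children) {
            this.kind = kind;
            this.name = name;
            this.children = children;
        }
    }

    static Node leaf(Kind kind) {
        return new Node(kind, null, List.of());
    }

    /** A union is accepted only if it has exactly two branches, one of them "null". */
    static boolean isNullableUnion(Node n) {
        if (n.kind != Kind.UNION || n.children.size() != 2) {
            return false;
        }
        // Assumption: ["null", T] and [T, "null"] are treated alike.
        long nulls = n.children.stream().filter(c -> c.kind == Kind.NULL).count();
        return nulls == 1;
    }

    /** True if the schema contains no recursive record and no non-nullable union. */
    static boolean isSupported(Node root) {
        return isSupported(root, new ArrayDeque<>());
    }

    private static boolean isSupported(Node n, Deque<Node> recordsOnPath) {
        if (n.kind == Kind.UNION && !isNullableUnion(n)) {
            return false;
        }
        if (n.kind == Kind.RECORD) {
            // Node does not override equals(), so contains() compares identity:
            // meeting a record that is still being visited means recursion.
            if (recordsOnPath.contains(n)) {
                return false;
            }
            recordsOnPath.push(n);
        }
        boolean ok = true;
        for (Node child : n.children) {
            if (!isSupported(child, recordsOnPath)) {
                ok = false;
                break;
            }
        }
        if (n.kind == Kind.RECORD) {
            recordsOnPath.pop();
        }
        return ok;
    }

    public static void main(String[] args) {
        Node str = leaf(Kind.STRING);
        Node nullableStr = new Node(Kind.UNION, null, List.of(leaf(Kind.NULL), str));
        Node user = new Node(Kind.RECORD, "User", List.of(str, nullableStr));
        System.out.println(isSupported(user));      // true

        // Linked-list record whose "next" field is ["null", LinkedList]: recursive.
        List<Node> fields = new ArrayList<>();
        Node linked = new Node(Kind.RECORD, "LinkedList", fields);
        fields.add(leaf(Kind.INT));
        fields.add(new Node(Kind.UNION, null, List.of(leaf(Kind.NULL), linked)));
        System.out.println(isSupported(linked));    // false

        // General union ["int", "string"]: not a nullable union.
        Node intOrStr = new Node(Kind.UNION, null, List.of(leaf(Kind.INT), str));
        System.out.println(isSupported(intOrStr));  // false
    }
}
```

Running `main` prints `true`, `false`, `false`. A real check would walk Avro's own schema objects instead of this stand-in model, and might track records by full name rather than by object identity.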

-- 
This message is automatically generated by JIRA.