[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709608#action_12709608 ]
Doug Cutting commented on PIG-794: ---------------------------------- Looking at the patch, I have a few questions and remarks: - Why not name the records "Tuple" and "Bag" instead of "T" and "B"? The names are not written in the data, so there's little advantage to shorter names. - Why not, instead of parsing the schema from Json, construct the schema using the Java Schema API? Then you would not need to walk the schema afterwards to find union indexes, and you'd get compile-time API checking rather than potential load-time JSON parse errors. - Why not extend GenericDatumReader and override newRecord() to create either a Bag or a Tuple, then override addField() to add values to either a bag or tuple? This would make the patch much smaller, and potentially permit you to eventually take advantage of GenericDatumReader features like projection and object reuse. - Finally, since you're using a pre-release version of Avro, you should probably name the jar with the subversion revision number. Also note that, since Avro is not yet stable, it should not be yet used for persistent data in production systems. > Use Avro serialization in Pig > ----------------------------- > > Key: PIG-794 > URL: https://issues.apache.org/jira/browse/PIG-794 > Project: Pig > Issue Type: Improvement > Components: impl > Affects Versions: 0.2.0 > Reporter: Rakesh Setty > Fix For: 0.2.0 > > Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, > jackson-asl-0.9.4.jar, PIG-794.patch > > > We would like to use Avro serialization in Pig to pass data between MR jobs > instead of the current BinStorage. Attached is an implementation of > AvroBinStorage which performs significantly better compared to BinStorage on > our benchmarks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.