[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904575#action_12904575 ]
Jeff Zhang commented on PIG-794:
--------------------------------

Attached the updated patch AvroStorage_3.patch (I found one place that can be optimized).

The following is the latest experiment result (which shows AvroStorage is a little better than InterStorage: about 4% less total time and about 17% smaller job_1 output):

||Storage||Time spent on job_1||Output size of job_1||Number of mapper tasks in job_2||Time spent on job_2||Total time spent on Pig script||
|AvroStorage|3 min 51 sec|7.96G|120|17 min 09 sec|21 min 0 sec|
|InterStorage|4 min 33 sec|9.55G|143|17 min 17 sec|21 min 50 sec|

> Use Avro serialization in Pig
> -----------------------------
>
>                 Key: PIG-794
>                 URL: https://issues.apache.org/jira/browse/PIG-794
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Rakesh Setty
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch,
> AvroStorage_2.patch, AvroStorage_3.patch, AvroTest.java,
> jackson-asl-0.9.4.jar, PIG-794.patch
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs
> instead of the current BinStorage. Attached is an implementation of
> AvroBinStorage which performs significantly better compared to BinStorage on
> our benchmarks.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.