[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904555#action_12904555 ]
Jeff Zhang commented on PIG-794:
--------------------------------

Besides the above experiment, I also ran an experiment comparing AvroRecordWriter and InterRecordWriter in a local environment; see the attached file AvroTest.java.

I wrote 50,000,000 records with each of the two RecordWriters. AvroRecordWriter took 70 seconds, while InterRecordWriter took 29 seconds.

InterRecordWriter performs much better than AvroRecordWriter. Internally they use DataFileWriter (avro) and FSDataOutputStream (inter), and both use BufferedOutputStream as one buffer layer. The difference is that DataFileWriter (avro) has another buffer layer: it first writes contents to an in-memory block and then writes that block to BufferedOutputStream when the block is full. I am not sure whether this extra layer adds overhead.

> Use Avro serialization in Pig
> -----------------------------
>
>                 Key: PIG-794
>                 URL: https://issues.apache.org/jira/browse/PIG-794
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Rakesh Setty
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch,
>                      AvroStorage_2.patch, AvroTest.java, jackson-asl-0.9.4.jar, PIG-794.patch
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs
> instead of the current BinStorage. Attached is an implementation of
> AvroBinStorage which performs significantly better compared to BinStorage on
> our benchmarks.
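
To make the buffering question in the comment above concrete, here is a minimal sketch of the two write paths using only java.io. It is not the attached AvroTest.java and does not use Avro or Hadoop classes: the class name BufferLayerSketch, the CountingSink helper, the record payload, and the 64 KB block size are all assumptions made for illustration. It only shows the shape of "one buffer layer" versus "an in-memory block in front of the buffer"; it says nothing about where the 70 s vs 29 s difference actually comes from.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class BufferLayerSketch {

    // Hypothetical payload; not the record format used in AvroTest.java.
    static final byte[] RECORD = "record-payload\n".getBytes(StandardCharsets.UTF_8);

    /** Sink that only counts bytes, so timings are not dominated by disk I/O. */
    static final class CountingSink extends OutputStream {
        long count = 0;
        @Override public void write(int b) { count++; }
        @Override public void write(byte[] buf, int off, int len) { count += len; }
    }

    /** One buffer layer: records go straight into a BufferedOutputStream. */
    static void writeSingleLayer(OutputStream sink, int records) throws IOException {
        try (OutputStream out = new BufferedOutputStream(sink)) {
            for (int i = 0; i < records; i++) {
                out.write(RECORD, 0, RECORD.length);
            }
        }
    }

    /**
     * Two buffer layers: records are first copied into an in-memory block,
     * and the block is handed to the BufferedOutputStream once it is full.
     */
    static void writeDoubleLayer(OutputStream sink, int records, int blockSize)
            throws IOException {
        try (OutputStream out = new BufferedOutputStream(sink)) {
            ByteArrayOutputStream block = new ByteArrayOutputStream(blockSize);
            for (int i = 0; i < records; i++) {
                block.write(RECORD, 0, RECORD.length);
                if (block.size() >= blockSize) {
                    block.writeTo(out);   // copy the full block into the second layer
                    block.reset();        // reuse the block's backing array
                }
            }
            block.writeTo(out);           // flush the final, partially filled block
        }
    }

    public static void main(String[] args) throws IOException {
        int records = 5_000_000;          // assumed count, smaller than the 50,000,000 in the comment

        CountingSink single = new CountingSink();
        long t0 = System.nanoTime();
        writeSingleLayer(single, records);
        long t1 = System.nanoTime();

        CountingSink dbl = new CountingSink();
        writeDoubleLayer(dbl, records, 64 * 1024);
        long t2 = System.nanoTime();

        System.out.printf("single layer: %d bytes, %d ms%n", single.count, (t1 - t0) / 1_000_000);
        System.out.printf("double layer: %d bytes, %d ms%n", dbl.count, (t2 - t1) / 1_000_000);
    }
}
```

Both paths emit the same bytes, so any timing difference between them comes from the extra copy through the in-memory block. A single unwarmed run like this is only a rough indication; a real comparison would need warm-up iterations and would have to include Avro's own encoding work, which this sketch leaves out.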