[
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904575#action_12904575
]
Jeff Zhang commented on PIG-794:
--------------------------------
Attached the updated patch AvroStorage_3.patch (I found one place that can be
optimized).
The following are the latest experiment results, which show AvroStorage
performing slightly better than InterStorage:
||Storage || Time spent on job_1 || Output size of job_1 || Mapper task number of job_2 || Time spent on job_2 || Total time spent on pig script ||
|AvroStorage |3min 51 sec |7.96G |120 |17min 09 sec |21min 0 sec|
|InterStorage |4min 33 sec |9.55G |143 |17min 17 sec |21min 50 sec|
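To put "slightly better" into numbers, the rough sketch below recomputes the totals and the relative savings from the figures in the table above. The helper names (`to_seconds`, `relative_saving`) and the dictionary layout are only illustrative and are not part of the patch.

```python
# Figures copied from the table above; all names here are illustrative only.

def to_seconds(minutes, seconds):
    """Convert a 'Xmin Y sec' reading into seconds."""
    return minutes * 60 + seconds

def relative_saving(avro, inter):
    """Fraction by which the AvroStorage figure is lower than the InterStorage one."""
    return (inter - avro) / inter

avro = {"job_1": to_seconds(3, 51), "job_2": to_seconds(17, 9),
        "size_gb": 7.96, "mappers": 120}
inter = {"job_1": to_seconds(4, 33), "job_2": to_seconds(17, 17),
         "size_gb": 9.55, "mappers": 143}

# Total script time is assumed to be job_1 + job_2, which matches the last column.
total_avro = avro["job_1"] + avro["job_2"]
total_inter = inter["job_1"] + inter["job_2"]

comparisons = [
    ("job_1 time", avro["job_1"], inter["job_1"]),
    ("job_1 output size", avro["size_gb"], inter["size_gb"]),
    ("job_2 mapper count", avro["mappers"], inter["mappers"]),
    ("job_2 time", avro["job_2"], inter["job_2"]),
    ("total script time", total_avro, total_inter),
]

for label, a, i in comparisons:
    print(f"{label}: AvroStorage is {relative_saving(a, i):.1%} lower")
```

By this calculation job_1 is roughly 15% faster with AvroStorage and its output roughly 17% smaller, while job_2 time is nearly unchanged, so the total script time differs by under 4%.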
> Use Avro serialization in Pig
> -----------------------------
>
> Key: PIG-794
> URL: https://issues.apache.org/jira/browse/PIG-794
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.2.0
> Reporter: Rakesh Setty
> Assignee: Dmitriy V. Ryaboy
> Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch,
> AvroStorage_2.patch, AvroStorage_3.patch, AvroTest.java,
> jackson-asl-0.9.4.jar, PIG-794.patch
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs
> instead of the current BinStorage. Attached is an implementation of
> AvroBinStorage which performs significantly better compared to BinStorage on
> our benchmarks.