[ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904575#action_12904575
 ] 

Jeff Zhang commented on PIG-794:
--------------------------------

Attaching the updated patch AvroStorage_3.patch (I found one place that could 
be optimized).
The following are the latest experiment results, which show that AvroStorage 
is slightly faster than InterStorage and produces smaller intermediate output:
||Storage || Time spent on job_1 || Output size of job_1 || Number of mapper tasks in job_2 || Time spent on job_2 || Total time spent on Pig script ||
|AvroStorage   |3min 51 sec     |7.96G  |120 |17min 09 sec |21min 0 sec|
|InterStorage   |4min 33 sec    |9.55G  |143    |17min 17 sec   |21min 50 sec|
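
As a rough cross-check of the table above, the relative differences between the two storage formats can be computed directly. This is a minimal Python sketch, not part of the patch: the dictionary keys and the helper name `relative_saving` are illustrative assumptions, and the figures are copied from the table with times converted to seconds.

```python
# Figures copied from the table; times are converted to seconds.
# Key names are illustrative and do not come from the patch.
results = {
    "AvroStorage": {
        "job_1_s": 3 * 60 + 51,
        "output_gb": 7.96,
        "mappers": 120,
        "job_2_s": 17 * 60 + 9,
        "total_s": 21 * 60 + 0,
    },
    "InterStorage": {
        "job_1_s": 4 * 60 + 33,
        "output_gb": 9.55,
        "mappers": 143,
        "job_2_s": 17 * 60 + 17,
        "total_s": 21 * 60 + 50,
    },
}


def relative_saving(metric):
    """Fraction by which the AvroStorage value is lower than InterStorage."""
    avro = results["AvroStorage"][metric]
    inter = results["InterStorage"][metric]
    return (inter - avro) / inter


if __name__ == "__main__":
    for metric in ("job_1_s", "output_gb", "mappers", "job_2_s", "total_s"):
        print(f"{metric}: {relative_saving(metric):.1%}")
```

On these numbers the gain is concentrated in job_1 time, output size, and mapper count (roughly 15 to 17 percent each), while job_2 time is nearly identical and the total script time differs by under 4 percent.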

> Use Avro serialization in Pig
> -----------------------------
>
>                 Key: PIG-794
>                 URL: https://issues.apache.org/jira/browse/PIG-794
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Rakesh Setty
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, 
> AvroStorage_2.patch, AvroStorage_3.patch, AvroTest.java, 
> jackson-asl-0.9.4.jar, PIG-794.patch
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs 
> instead of the current BinStorage. Attached is an implementation of 
> AvroBinStorage which performs significantly better compared to BinStorage on 
> our benchmarks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
