[ 
https://issues.apache.org/jira/browse/PIG-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870508#action_12870508
 ] 

Jeff Zhang commented on PIG-1426:
---------------------------------

I ran a simple experiment to compare performance.
This is the Pig script I used:
{code}
a = load '/input';
b = foreach a generate $0,$1;
c = group b by $0 PARALLEL 2;
result = foreach c generate group,SUM(b.$1);
dump result;
{code}

The results are as follows:
|| ||Using Int||Using VInt||
|Mapper Output|3,288,892,896|2,688,892,896|
|Run time of the Pig script|12mins, 23sec|12mins, 1sec|


I haven't done a complete comparison with PigMix, but I believe it will improve 
performance.


> Change the size of Tuple from Int to VInt when Serialize Tuple
> --------------------------------------------------------------
>
>                 Key: PIG-1426
>                 URL: https://issues.apache.org/jira/browse/PIG-1426
>             Project: Pig
>          Issue Type: Improvement
>          Components: data
>    Affects Versions: 0.8.0
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>             Fix For: 0.8.0
>
>         Attachments: PIG_1426.patch
>
>
> Most of the time, the size of a tuple is not very large, so one byte is 
> enough to store it. I therefore suggest using VInt instead of Int for the 
> tuple size when serializing. Because the key type of the map output is 
> Tuple, this can reduce the amount of data transferred from mapper to 
> reducer.
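
To make the size argument in the issue description concrete, here is a minimal Java sketch comparing a fixed 4-byte size field with a simplified variable-length one. The class VIntSketch and its methods are hypothetical names used only for illustration. The encoding is a simplified stand-in in the spirit of VInt, not the code from the attached patch and not Hadoop's WritableUtils.writeVInt, whose exact byte layout may differ.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical illustration class; not part of Pig or Hadoop.
final class VIntSketch {

    // Fixed-width encoding: always 4 bytes, big-endian.
    static byte[] encodeFixed(int size) {
        return ByteBuffer.allocate(4).putInt(size).array();
    }

    // Simplified variable-length encoding for non-negative sizes (an assumption
    // of this sketch):
    //   0..127 -> the value itself, in one byte
    //   larger -> a negative marker byte giving the payload length (1 to 4),
    //             followed by the payload bytes in big-endian order
    static byte[] encodeVariable(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        if (size <= 127) {
            buf.write(size);
            return buf.toByteArray();
        }
        // Number of bytes needed to hold the significant bits of size.
        int payloadBytes = (32 - Integer.numberOfLeadingZeros(size) + 7) / 8;
        buf.write(-payloadBytes);
        for (int i = payloadBytes - 1; i >= 0; i--) {
            buf.write((size >>> (8 * i)) & 0xFF);
        }
        return buf.toByteArray();
    }

    // Inverse of encodeVariable.
    static int decodeVariable(byte[] bytes) {
        byte first = bytes[0];
        if (first >= 0) {
            return first;
        }
        int payloadBytes = -first;
        int value = 0;
        for (int i = 1; i <= payloadBytes; i++) {
            value = (value << 8) | (bytes[i] & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        for (int size : new int[] {2, 127, 128, 70000}) {
            System.out.println(size
                + ": fixed=" + encodeFixed(size).length + " bytes"
                + ", variable=" + encodeVariable(size).length + " bytes");
        }
    }
}
```

In this sketch, any size up to 127 takes one byte instead of four, so each small tuple saves three bytes on its size field. The trade-off is that large sizes cost one extra marker byte.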

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
