[ 
https://issues.apache.org/jira/browse/PIG-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932278#action_12932278
 ] 

Viraj Bhat commented on PIG-1724:
---------------------------------

Hi Richard,
 Thanks for the insight into this issue. So it looks like a design 
decision has resulted in this behavior? The use case I have shown in the original 
description was taken from a large script. Can we fix this issue in Pig 0.7 
and Pig 0.8?
This issue causes a waste of resources on clusters.
Viraj

> Multiquery optimization miscalculates the parallelism and results in extra 0 
> bytes files (Pig 0.7 and 0.8)
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1724
>                 URL: https://issues.apache.org/jira/browse/PIG-1724
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Viraj Bhat
>            Assignee: Richard Ding
>             Fix For: 0.7.0, 0.8.0
>
>         Attachments: samplepig001.in
>
>
> We have found an issue with Pig 0.8 and Pig 0.7 when using Multiquery 
> optimization. It produces more part files than required. Please 
> note that the GROUP ALL is a dummy in this case.
> {code}
> record002 = LOAD 'samplepig001.in' AS (id:chararray,num:int);
> f_records002 = FILTER record002 BY num != 50000;
> group01 = GROUP f_records002 ALL PARALLEL 1;
> STORE group01 INTO 'pig_out_direc_SET1';
> set2 = FILTER f_records002 BY num!=200002;
> set2_Group = GROUP set2 ALL PARALLEL 1;
> STORE set2 INTO 'pig_out_direc_SET2';
> set3 = FILTER f_records002 BY num!=100001;
> set3_Group = GROUP set3 BY id PARALLEL 40;
> --set3_Rec4= FILTER set3_Group by num!=5000000;
> STORE set3_Group INTO 'pig_out_direc_SET3';
> {code}
> When run in Pig 0.8 it results in the following output.
> {quote}
> $ hadoop fs -ls /user/viraj/pig_out_direc_SET1
> ...
> Found 40 items
> -rw-------   3 viraj users          0 2010-11-13 02:09 
> /user/viraj/pig_out_direc_SET1/part-r-00000
> ...
> ...
> -rw-------   3 viraj users          0 2010-11-13 02:09 
> /user/viraj/pig_out_direc_SET1/part-r-00039
> $ hadoop fs -ls /user/viraj/pig_out_direc_SET2
> Found 1 items
> -rw-------   3 viraj users        110 2010-11-13 02:08 
> /user/viraj/pig_out_direc_SET2/part-m-00000
> $ hadoop fs -ls /user/viraj/pig_out_direc_SET3
> Found 40 items
> -rw-------   3 viraj users          0 2010-11-13 02:09 
> /user/viraj/pig_out_direc_SET3/part-r-00000
> ...
> ...
> -rw-------   3 viraj users          0 2010-11-13 02:09 
> /user/viraj/pig_out_direc_SET3/part-r-00039
> {quote}
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
