[ 
https://issues.apache.org/jira/browse/PIG-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1724.
---------------------------------

    Resolution: Won't Fix

As Richard pointed out, this has been the behavior by design since Pig 0.3. When 
multiquery optimization combines jobs, the merged job runs with a single set of 
reducers, so only one PARALLEL value can be honored. If this does not work for a 
particular script, please either split the script or disable multiquery (MQ) 
optimization.
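
To illustrate why the extra files show up, here is a toy Python model of a 
merged reduce phase. It is a sketch under assumptions, not Pig's actual code: 
it assumes the merged job keeps the largest requested PARALLEL value (which is 
consistent with the 40 part files in the report quoted below) and that every 
reducer writes one part file per STORE, empty or not. The function name 
simulate_merged_job, the CRC32 partitioning, and the sample ids are all 
hypothetical.

```python
import zlib


def simulate_merged_job(requested_parallel, group_keys):
    """Toy model of one merged reduce phase; not Pig's actual implementation.

    requested_parallel: store name -> PARALLEL value asked for in the script
    group_keys: store name -> distinct group keys reaching that store
    Returns store name -> (part files written, part files left empty).
    """
    # Assumption: the merged job keeps the largest requested value.
    reducers = max(requested_parallel.values())
    result = {}
    for store, keys in group_keys.items():
        # A reducer only has rows for a store if one of its keys lands there.
        # CRC32 stands in for the real partitioner (hypothetical choice).
        busy = {zlib.crc32(str(k).encode()) % reducers for k in keys}
        # Assumption: every reducer still writes one part file per store.
        result[store] = (reducers, reducers - len(busy))
    return result


# Hypothetical inputs loosely mirroring the quoted script: GROUP ALL yields
# a single key, and the ids listed for SET3 are made up.
report = simulate_merged_job(
    {"SET1": 1, "SET3": 40},
    {"SET1": ["all"], "SET3": ["id1", "id2", "id3"]},
)
for store, (files, empty) in sorted(report.items()):
    print(store, "part files:", files, "empty:", empty)
```

Under these assumptions, an output grouped with GROUP ALL has a single key, so 
at most one of its 40 part files can hold data and the rest come out as 0-byte 
files.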

> Multiquery optimization miscalculates the parallelism and results in extra 0 
> bytes files (Pig 0.7 and 0.8)
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1724
>                 URL: https://issues.apache.org/jira/browse/PIG-1724
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.7.0, 0.8.0
>            Reporter: Viraj Bhat
>            Assignee: Richard Ding
>             Fix For: 0.8.0, 0.7.0
>
>         Attachments: samplepig001.in
>
>
> We have found an issue with Pig 0.8 and Pig 0.7 when using multiquery 
> optimization: it produces more part files than required. Please note that 
> the GROUP ALL is a dummy in this case.
> {code}
> record002 = LOAD 'samplepig001.in' AS (id:chararray,num:int);
> f_records002= FILTER record002 BY num!=50000;
> group01 = GROUP f_records002 ALL PARALLEL 1;
> STORE group01 INTO 'pig_out_direc_SET1';
> set2 = FILTER f_records002 BY num!=200002;
> set2_Group = GROUP set2 ALL PARALLEL 1;
> STORE set2 INTO 'pig_out_direc_SET2';
> set3 = FILTER f_records002 BY num!=100001;
> set3_Group= GROUP set3 BY id PARALLEL 40;
> --set3_Rec4= FILTER set3_Group by num!=5000000;
> STORE set3_Group INTO 'pig_out_direc_SET3';
> {code}
> When run in Pig 0.8 it results in the following output.
> {quote}
> $ hadoop fs -ls /user/viraj/pig_out_direc_SET1
> ...
> Found 40 items
> -rw-------   3 viraj users          0 2010-11-13 02:09 
> /user/viraj/pig_out_direc_SET1/part-r-00000
> ...
> ...
> -rw-------   3 viraj users          0 2010-11-13 02:09 
> /user/viraj/pig_out_direc_SET1/part-r-00039
> $ hadoop fs -ls /user/viraj/pig_out_direc_SET2
> Found 1 items
> -rw-------   3 viraj users        110 2010-11-13 02:08 
> /user/viraj/pig_out_direc_SET2/part-m-00000
> $ hadoop fs -ls /user/viraj/pig_out_direc_SET3
> Found 40 items
> -rw-------   3 viraj users          0 2010-11-13 02:09 
> /user/viraj/pig_out_direc_SET3/part-r-00000
> ...
> ...
> -rw-------   3 viraj users          0 2010-11-13 02:09 
> /user/viraj/pig_out_direc_SET3/part-r-00039
> {quote}
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
