[jira] [Commented] (PIG-4057) Group All followed by CROSS with default parallelism produces wrong results

Rohini Palaniswamy (JIRA) Mon, 14 Jul 2014 11:08:41 -0700

    [ https://issues.apache.org/jira/browse/PIG-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060953#comment-14060953 ]


Rohini Palaniswamy commented on PIG-4057:
-----------------------------------------

There are more issues to address:
   - CROSS would have the same problem even without GROUP ALL, whenever the
     GROUP BY specifies a different parallelism than the CROSS.
   - Another case: no default parallelism is specified, and a different number
     of reducers is estimated for Job1 and Job2.

Since GFCross explicitly uses mapred.reduce.tasks, we also need to see how it
works in Tez.
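
To illustrate why a parallelism mismatch matters, here is a rough sketch of the
failure mode. It is a simplified model and not Pig's actual GFCross code: it
assumes each input is cut into ceil(parallelism ^ (1 / num_inputs)) slices,
and that a tuple is replicated to every key that has its own slice on its own
axis. All function names below are hypothetical.

```python
import itertools
import math


def groups_per_input(parallelism, num_inputs=2):
    # Assumed model: number of slices per input for a given parallelism.
    return math.ceil(parallelism ** (1.0 / num_inputs))


def cross_keys(input_index, slot, n, num_inputs=2):
    # Keys a tuple is replicated to: its own slot on its own axis,
    # every value 0..n-1 on the other axes.
    axes = [[slot] if i == input_index else range(n) for i in range(num_inputs)]
    return set(itertools.product(*axes))


def matched_fraction(n_left, n_right):
    # Fraction of (left slot, right slot) pairs that share at least one key,
    # i.e. that would meet in some reducer and be crossed.
    pairs = [(a, b) for a in range(n_left) for b in range(n_right)]
    hits = sum(
        1
        for a, b in pairs
        if cross_keys(0, a, n_left) & cross_keys(1, b, n_right)
    )
    return hits / len(pairs)


n_job2 = groups_per_input(199)  # by_size side, default_parallel 199
n_job1 = groups_per_input(1)    # all_vals side, parallelism 1 from group all

print(matched_fraction(n_job2, n_job2))  # both sides agree on parallelism
print(matched_fraction(n_job2, n_job1))  # sides disagree
```

In this model every pair meets when both sides use the same parallelism, but
only the by_size tuples that land in slot 0 meet all_vals when one side
assumes 199 and the other assumes 1, so most of the cross output is lost.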

> Group All followed by CROSS with default parallelism produces wrong results
> ---------------------------------------------------------------------------
>
>                 Key: PIG-4057
>                 URL: https://issues.apache.org/jira/browse/PIG-4057
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>             Fix For: 0.14.0
>
>
> SET default_parallel 199;
> ......
> by_size = ...
> uniq_vals = .....
> grpd = group uniq_vals all;
> all_vals = FOREACH grpd GENERATE uniq_vals;
> cross_result = CROSS by_size, all_vals;
> store cross_result into '/tmp/roh/cross/out/recipient_asns';
> Job1: grpd, all_vals, cross_result (The plan applies the GFCross function to
> all_vals here, assuming a cross parallelism of 1 taken from the current job,
> even though it should use the default parallelism of 199 from Job2. The
> parallelism of Job1 is 1 because of group all.)
> Job2: cross_result (Actual CROSS of by_size and all_vals)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
