Cheolsoo Park created PIG-3620:
----------------------------------

             Summary: TezCompiler adds duplicate predecessors of blocking 
operators to TezPlan
                 Key: PIG-3620
                 URL: https://issues.apache.org/jira/browse/PIG-3620
             Project: Pig
          Issue Type: Sub-task
          Components: tez
    Affects Versions: tez-branch
            Reporter: Cheolsoo Park
             Fix For: tez-branch


Here is the simplest example that reproduces the issue:
{code:title=test.pig}
a = LOAD 'foo' AS (x:int, y:chararray);
b = GROUP a BY x;
c = FOREACH b GENERATE a.x;
STORE c INTO 'c';
d = FOREACH b GENERATE a.y;
STORE d INTO 'd';
{code}
If you run {{pig \-x tez_local \-e 'explain \-script test.pig'}}, you will see 
two vertices that contain the following sub-plan:
{code}
Tez vertex scope-27
# Plan on vertex
b: Local Rearrange[tuple]{int}(false) - scope-10
|   |
|   Project[int][0] - scope-11
|
|---a: New For Each(false,false)[bag] - scope-7
    |   |
    |   Cast[int] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[chararray] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |
    |---a: Load(file:///Users/cheolsoop/workspace/pig/foo:org.apache.pig.builtin.PigStorage) - scope-0
{code}
What's happening is that since there are two stores (and thus two data flows, 
i.e. a=>c and a=>d), Pig generates two physical plans. TezCompiler then compiles 
them into a single Tez plan but adds the same sub-plan twice.
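
To make the effect concrete, here is a rough sketch in Python (not Pig's Java, and not the actual TezCompiler code). The names Vertex, Plan, compile_naive and compile_deduplicated are hypothetical and exist only for this illustration; the scope ids are taken from the explain output above. It assumes each data flow is a pair of (shared predecessor operators, store alias) and shows that adding the predecessor once per flow yields two identical vertices, while tracking which predecessors were already added yields one.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Hypothetical stand-in for a Tez vertex; not a Pig class."""
    root_scope: str
    operators: tuple


@dataclass
class Plan:
    """Hypothetical stand-in for the compiled Tez plan."""
    vertices: list = field(default_factory=list)


def compile_naive(flows):
    """Add the predecessor of every data flow, even if already present."""
    plan = Plan()
    for predecessor, _store in flows:
        plan.vertices.append(Vertex(predecessor[0], tuple(predecessor)))
    return plan


def compile_deduplicated(flows):
    """Add each predecessor only once, keyed by its root operator's scope id."""
    plan = Plan()
    seen = set()
    for predecessor, _store in flows:
        key = predecessor[0]
        if key in seen:
            continue  # this sub-plan was already added for another data flow
        seen.add(key)
        plan.vertices.append(Vertex(key, tuple(predecessor)))
    return plan


# Shared sub-plan: Local Rearrange (scope-10), For Each (scope-7), Load (scope-0).
shared = ["scope-10", "scope-7", "scope-0"]
# Two data flows, one per store: a=>c and a=>d.
flows = [(shared, "c"), (shared, "d")]

print(len(compile_naive(flows).vertices))         # 2
print(len(compile_deduplicated(flows).vertices))  # 1
```

Keying on the root operator's scope id is only one possible choice for this sketch; how the compiler should actually detect an already-compiled predecessor is not settled here.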

This is an issue with any blocking operator (join, union, etc.) followed by a 
split.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
