Cheolsoo Park created PIG-3620:
----------------------------------

Summary: TezCompiler adds duplicate predecessors of blocking operators to TezPlan
Key: PIG-3620
URL: https://issues.apache.org/jira/browse/PIG-3620
Project: Pig
Issue Type: Sub-task
Components: tez
Affects Versions: tez-branch
Reporter: Cheolsoo Park
Fix For: tez-branch
Here is the simplest example that reproduces the issue:

{code:title=test.pig}
a = LOAD 'foo' AS (x:int, y:chararray);
b = GROUP a BY x;
c = FOREACH b GENERATE a.x;
STORE c INTO 'c';
d = FOREACH b GENERATE a.y;
STORE d INTO 'd';
{code}

If you run {{pig \-x tez_local \-e 'explain \-script test.pig'}}, you will see two vertices that contain the following sub-plan:

{code}
Tez vertex scope-27
# Plan on vertex
b: Local Rearrange[tuple]{int}(false) - scope-10
|   |
|   Project[int][0] - scope-11
|
|---a: New For Each(false,false)[bag] - scope-7
    |   |
    |   Cast[int] - scope-2
    |   |
    |   |---Project[bytearray][0] - scope-1
    |   |
    |   Cast[chararray] - scope-5
    |   |
    |   |---Project[bytearray][1] - scope-4
    |
    |---a: Load(file:///Users/cheolsoop/workspace/pig/foo:org.apache.pig.builtin.PigStorage) - scope-0
{code}

What's happening is that since there are two stores (and thus two data flows, i.e. a=>c and a=>d), Pig generates two physical plans. TezCompiler then compiles them into a single Tez plan but adds the same sub-plan (the predecessors of the blocking GROUP operator) twice. This is an issue with any blocking operator (join, union, etc.) followed by a split.
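To make the failure mode concrete, here is a minimal Java sketch of compiling each store's data flow into vertices, with and without memoizing the vertex built for a blocking operator's input. Everything in it ({{Op}}, {{Vertex}}, {{compileExample}}, the {{dedup}} flag, the operator labels) is hypothetical and far simpler than the real TezCompiler. It assumes both data flows reference the same operator objects, so an identity-keyed map can recognize the shared sub-plan below the blocking operator.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of compiling data flows into vertices;
// these classes are illustrative and are NOT Pig's TezCompiler API.
public class DedupPredecessorsSketch {

    /** An operator; 'blocking' marks a vertex boundary such as GROUP's package step. */
    static final class Op {
        final String name;
        final boolean blocking;
        final List<Op> inputs;

        Op(String name, boolean blocking, Op... inputs) {
            this.name = name;
            this.blocking = blocking;
            this.inputs = List.of(inputs);
        }
    }

    /** A vertex: the operators it runs plus the vertices that feed it. */
    static final class Vertex {
        final List<String> ops = new ArrayList<>();
        final List<Vertex> predecessors = new ArrayList<>();
    }

    final List<Vertex> plan = new ArrayList<>();
    private final boolean dedup;
    // Memo keyed by object identity: root operator of a predecessor sub-plan -> its vertex.
    private final Map<Op, Vertex> vertexRootedAt = new IdentityHashMap<>();

    DedupPredecessorsSketch(boolean dedup) {
        this.dedup = dedup;
    }

    /** Compiles one data flow into a new vertex, walking up from its root operator. */
    Vertex compile(Op root) {
        Vertex v = new Vertex();
        plan.add(v);
        fill(v, root);
        return v;
    }

    private void fill(Vertex v, Op op) {
        v.ops.add(op.name);
        for (Op in : op.inputs) {
            if (!op.blocking) {
                fill(v, in); // a non-blocking input stays in the same vertex
                continue;
            }
            // The input of a blocking operator becomes a predecessor vertex.
            Vertex pred = dedup ? vertexRootedAt.get(in) : null;
            if (pred == null) {
                pred = compile(in); // without dedup, a shared sub-plan is compiled again
                if (dedup) {
                    vertexRootedAt.put(in, pred);
                }
            }
            v.predecessors.add(pred);
        }
    }

    /** Builds the test.pig flows (two stores sharing GROUP b) and compiles both. */
    static DedupPredecessorsSketch compileExample(boolean dedup) {
        Op load = new Op("a: Load", false);
        Op foreachA = new Op("a: New For Each", false, load);
        Op rearrange = new Op("b: Local Rearrange", false, foreachA);
        Op groupB = new Op("b: Package", true, rearrange);
        Op storeC = new Op("c: Store", false, new Op("c: New For Each", false, groupB));
        Op storeD = new Op("d: Store", false, new Op("d: New For Each", false, groupB));

        DedupPredecessorsSketch compiler = new DedupPredecessorsSketch(dedup);
        compiler.compile(storeC); // data flow a => c
        compiler.compile(storeD); // data flow a => d
        return compiler;
    }

    public static void main(String[] args) {
        for (boolean dedup : new boolean[] {false, true}) {
            System.out.println("dedup=" + dedup + " vertices=" + compileExample(dedup).plan.size());
        }
    }
}
```

Running {{main}} prints {{dedup=false vertices=4}} and then {{dedup=true vertices=3}}: the naive pass builds the Load/New For Each/Local Rearrange vertex twice, as in the explain output, while the memoized pass lets both downstream vertices point at one predecessor. If each physical plan held its own copies of the operators, object identity would not detect the overlap and a different key would be needed; the sketch also does not merge the two downstream vertices into a single split.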