> On Dec. 1, 2013, 5:06 p.m., Rohini Palaniswamy wrote:
> > The code is fine if we have union after some processing. But for simple load and union case as below, this will create 3 vertices - 2 load vertices and one union vertex.
> > 
> > a = load 'a'
> > b = load 'b'
> > c = union a, b
> > 
> > In MR, this is handled in a simple map
> > 
> > C: Store(/tmp/tezout:PigStorage) - scope-23
> > |
> > |---C: Union[bag] - scope-22
> >     |
> >     |---A: New For Each(false,false,false)[bag] - scope-10
> >     |   |   |
> >     |   ..........
> >     |
> >     |---B: New For Each(false,false,false)[bag] - scope-21
> >         |   |
> >         |   .........
> >         |
> >         |---B: Load(/tmp/data:org.apache.pig.builtin.PigStorage) - scope-11--------
> > 
> > We should also try to do that in a single vertex to be more optimal. We can handle that in a separate jira though.

Thank you Rohini for the review!

You're right that we can optimize it once Tez allows multiple inputs on root vertices. But when I tried to implement union in a single vertex, I ran into this error:

Caused by: java.lang.IllegalStateException: For now, only a single Root Input can be added to a Vertex
    at org.apache.tez.dag.api.Vertex.addInput(Vertex.java:156)

So it seems this is not allowed for now.


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15931/#review29567
-----------------------------------------------------------


On Dec. 1, 2013, 7 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15931/
> -----------------------------------------------------------
> 
> (Updated Dec. 1, 2013, 7 a.m.)
> 
> 
> Review request for pig, Alex Bain, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
> 
> 
> Bugs: PIG-3585
>     https://issues.apache.org/jira/browse/PIG-3585
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> This patch implements union as follows: load vertices -> broadcast edges -> union vertex.
> 
> The changes include:
> * In the front-end, TezCompiler converts POUnion into a new vertex and connects it to its predecessors with broadcast edges.
> * In the back-end, a new POPackage class called POBroadcastTezLoad is added. This class implements the TezLoad interface, and it pulls every record from ShuffledUnorderedKVInputs in order and unions them.
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/Packager.java e49de40 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/POBroadcastTezLoad.java e69de29 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/PigProcessor.java 9a2b499 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezCompiler.java 529bf30 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java e3f5a5d 
>   src/org/apache/pig/backend/hadoop/executionengine/tez/TezOperator.java dcd6a5a 
>   test/e2e/pig/tests/tez.conf 7fd5fb1 
> 
> Diff: https://reviews.apache.org/r/15931/diff/
> 
> 
> Testing
> -------
> 
> * New e2e test case is added.
> * ant test-tez passes.
> * All e2e tests pass.
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
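
For readers following the description above, the union step ("pulls every record from [the inputs] in order and unions them") can be sketched roughly as below. This is only an illustration, not the code in the patch: UnionSketch and unionInOrder are hypothetical names, plain java.util iterators stand in for the Tez inputs, and reading "in order" as "one input after another" is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the union step; NOT the real POBroadcastTezLoad. */
public class UnionSketch {

    /**
     * Drains each input completely, in the order the inputs are given,
     * and appends every record to a single output list.
     * Assumption: plain iterators stand in for the Tez inputs.
     */
    public static <T> List<T> unionInOrder(List<Iterator<T>> inputs) {
        List<T> out = new ArrayList<>();
        for (Iterator<T> input : inputs) {
            while (input.hasNext()) {
                out.add(input.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two "load" outputs feeding one union, as in: c = union a, b
        Iterator<String> a = Arrays.asList("a1", "a2").iterator();
        Iterator<String> b = Arrays.asList("b1").iterator();
        System.out.println(unionInOrder(Arrays.asList(a, b))); // prints [a1, a2, b1]
    }
}
```

The sketch buffers everything into a list only to keep it short; an operator that hands records downstream one at a time would avoid holding the whole union in memory.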