[
https://issues.apache.org/jira/browse/PIG-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shravan Matthur Narayanamurthy updated PIG-311:
-----------------------------------------------
Status: Patch Available (was: Open)
Implemented the visit(LOCross) method in LogToPhyTranslationVisitor. This
mimics what we were doing in Pig-1.0. To summarize, the following script with
Cross will be converted as shown below:
{noformat}
A1 = load 'f1';
A2 = load 'f2';
.
.
.
An = load 'fn';
B = cross A1,A2,...,An;
{noformat}
{noformat}
A1 = load 'f1';
.
.
.
An = load 'fn';
B1 = foreach A1 generate flatten(GFCross('n','0')), flatten(*);
B2 = foreach A2 generate flatten(GFCross('n','1')), flatten(*);
.
.
.
Bn = foreach An generate flatten(GFCross('n','n-1')), flatten(*);
C = splgroup B1 by ($0,$1,..,$n-1) inner, B2 by ($0,$1,..,$n-1) inner, ..., Bn
by ($0,$1,..,$n-1) inner;
D = foreach C generate flatten($1), flatten($2), ..., flatten($n);
{noformat}
GFCross outputs a bag of n-tuples. The foreach flattens the bag and attaches
each of these tuples to the original tuple, thus replicating each input tuple
once per GFCross tuple.
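To make the replication step concrete, here is a rough Python sketch of what the foreach with GFCross does to one record. It is an illustration only, not the Pig implementation: the comment above does not say how GFCross picks its tuples, so the sketch assumes that a record from input i gets one random value in position i and every possible value in the other positions. The names gf_cross, foreach_flatten and groups_per_input are made up for this example.

```python
import itertools
import random


def gf_cross(num_inputs, my_index, groups_per_input=2, rng=random):
    """Return a bag (list) of n-tuples for one record of input `my_index`.

    Assumption: the record's own position gets a single random value and
    every other position ranges over all values, so the bag holds
    groups_per_input ** (num_inputs - 1) tuples.
    """
    own = rng.randrange(groups_per_input)
    ranges = [
        [own] if pos == my_index else range(groups_per_input)
        for pos in range(num_inputs)
    ]
    return list(itertools.product(*ranges))


def foreach_flatten(record, num_inputs, my_index, groups_per_input=2, rng=random):
    """Mimic `generate flatten(GFCross(...)), flatten(*)` for one record.

    Each GFCross tuple is prepended to a copy of the original record.
    """
    bag = gf_cross(num_inputs, my_index, groups_per_input, rng)
    return [key + record for key in bag]
```

With two inputs, a record such as ('R', 4) from the second input comes out as several rows that all end in ('R', 4) and differ only in the leading n values.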
The only difference from a normal Pig script is the splgroup, where the
local-rearrange is slightly modified. When it is processing a cross, it strips
the first n values, which the foreach attached, from each value tuple. It then
passes the remaining original tuple as the value and retains the first n values
as the key.
For example, the foreach might produce (2,1,R,4), where (R,4) is the actual
tuple and (2,1) is one of the tuples in the GFCross output. The local-rearrange
splits such a tuple into key and value by making (2,1) the key and (R,4) the
value.
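The key/value split in this example can be written down in a few lines of Python. This is only a sketch of the idea; the function name local_rearrange_for_cross is hypothetical and does not correspond to a method in the patch.

```python
def local_rearrange_for_cross(tagged_tuple, n):
    """Split a tuple produced by the foreach into (key, value).

    The first n fields are the GFCross values and become the key;
    the rest is the original tuple and becomes the value.
    """
    key = tagged_tuple[:n]
    value = tagged_tuple[n:]
    return key, value


# The example from the text: two inputs, so n = 2.
key, value = local_rearrange_for_cross((2, 1, 'R', 4), 2)
```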
So the patch has two changes: one to the translator and the other to the
local-rearrange.
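For anyone who wants to convince themselves that tagging, grouping and flattening really yields a cross product, the following Python simulation may help. It is a sketch under the same assumption as before (a record from input i gets one random value in key position i and all values elsewhere), which is not stated in this comment. The names tag_records, simulate_cross and groups are invented for the illustration.

```python
import itertools
import random
from collections import defaultdict


def tag_records(records, num_inputs, my_index, groups, rng):
    """Prepend every assumed GFCross key to each record of one input."""
    tagged = []
    for record in records:
        own = rng.randrange(groups)
        ranges = [
            [own] if pos == my_index else range(groups)
            for pos in range(num_inputs)
        ]
        for key in itertools.product(*ranges):
            tagged.append(key + record)
    return tagged


def simulate_cross(inputs, groups=2, seed=0):
    """Simulate foreach + splgroup (inner) + flatten for a list of inputs."""
    rng = random.Random(seed)
    n = len(inputs)
    # For each key, keep one bag per input, as a cogroup would.
    grouped = defaultdict(lambda: [[] for _ in range(n)])
    for idx, records in enumerate(inputs):
        for row in tag_records(records, n, idx, groups, rng):
            key, value = row[:n], row[n:]  # the modified local-rearrange
            grouped[key][idx].append(value)
    result = []
    for bags in grouped.values():
        # Flattening all bags gives their product; an empty bag gives
        # nothing, which matches the "inner" semantics.
        for combo in itertools.product(*bags):
            result.append(sum(combo, ()))
    return result
```

Under that assumption, any combination of one record per input meets at exactly one key (the one built from each record's own random value), so every output row appears once.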
> Scripts using CROSS fail in logical to physical translator
> ----------------------------------------------------------
>
> Key: PIG-311
> URL: https://issues.apache.org/jira/browse/PIG-311
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Reporter: Alan Gates
> Assignee: Shravan Matthur Narayanamurthy
> Priority: Critical
> Fix For: types_branch
>
> Attachments: 311.patch
>
>
> {code}
> a = load 'myfile' as (name, age, gpa);
>
> b = load 'myotherfile' as (name, age, registration, contributions);
>
> c = filter a by age < 19 and gpa < 1.0;
>
>
> d = filter b by age < 19;
>
>
> e = cross c, d;
>
>
> store e into 'outfile';
> {code}
> fails:
> java.io.IOException: Unable to store for alias: e [null]
> at org.apache.pig.impl.plan.OperatorPlan.checkInPlan(OperatorPlan.java:252)
> at org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:140)
> at org.apache.pig.impl.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:77)
> at org.apache.pig.impl.logicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:859)
> at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:101)
> at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:36)
> at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:229)
> at org.apache.pig.PigServer.compilePp(PigServer.java:556)
> at org.apache.pig.PigServer.execute(PigServer.java:482)
> at org.apache.pig.PigServer.store(PigServer.java:324)
> at org.apache.pig.PigServer.store(PigServer.java:310)
> at org.apache.pig.tools.grunt.GruntParser.processStore(GruntParser.java:173)
> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:317)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:77)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:58)
> at org.apache.pig.Main.main(Main.java:311)
> Caused by: java.lang.NullPointerException
> ... 18 more