[ https://issues.apache.org/jira/browse/PIG-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733057#action_12733057 ]
Ashutosh Chauhan commented on PIG-858:
--------------------------------------

While POFRJoin is being compiled in MRCompiler, it needs to identify, for each of its predecessors in the physical plan, which compiled MROper that predecessor is part of. Currently, this is assumed to be one of the compiledInputs (an array of MROpers that are the immediate predecessors of the current MROper in the MROper DAG). This is usually true, but not when one physical operator results in two or more MROpers, as is the case here. When there is an order-by before FRJoin, one of the inputs of POFRJoin will be POSort, but the POSort operator will be in the first of the two generated MROpers and thus will not be found in compiledInputs (which contains the second MROper). Thus, the current way of identifying the MROper corresponding to a physical operator is unreliable.

This bug also affects the implementation of merge-sort join (https://issues.apache.org/jira/browse/PIG-845), since POMergeJoin needs to know which MROper corresponds to its left input and which one corresponds to its right. It can do so by looking into compiledInputs as long as there is no order-by (or a similar PO that results in multiple MROpers) among its predecessors. Doing an order-by before a merge join is, however, a natural use case.

The proposal is to introduce a new private member variable in MRCompiler, phyToMROperMap (similar to logToPhyMap), with which the leaf MROper for a given physical operator can be identified.

Thoughts?
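
To make the proposal concrete, here is a minimal, self-contained Java sketch of the two lookups. It is not MRCompiler code: PhysicalOp and MROper are hypothetical stand-ins for Pig's physical operators and MR operators, and indexInCompiledInputs is an assumed simplification of the current search. Only the names phyToMROperMap and compiledInputs come from the comment above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PhyToMROperSketch {

    /** Hypothetical stand-in for a physical operator such as POLoad or POSort. */
    static final class PhysicalOp {
        final String name;
        PhysicalOp(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for an MROper: a name plus the physical operators it holds. */
    static final class MROper {
        final String name;
        final List<PhysicalOp> ops = new ArrayList<>();
        MROper(String name) { this.name = name; }
    }

    /**
     * Assumed simplification of the current approach: scan compiledInputs for
     * the MROper that contains the given physical operator; -1 if none does.
     */
    static int indexInCompiledInputs(MROper[] compiledInputs, PhysicalOp op) {
        for (int i = 0; i < compiledInputs.length; i++) {
            if (compiledInputs[i].ops.contains(op)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        PhysicalOp load = new PhysicalOp("POLoad");
        PhysicalOp sort = new PhysicalOp("POSort");

        // A = load 'a';
        MROper loadJob = new MROper("load");
        loadJob.ops.add(load);

        // B = order A by $0;  compiles to two MROpers; POSort sits in the first.
        MROper sortFirst = new MROper("sort-first");
        sortFirst.ops.add(sort);
        MROper sortSecond = new MROper("sort-second");

        // Only the immediate predecessors (the leaf MROpers) are visible here.
        MROper[] compiledInputs = { loadJob, sortSecond };

        // Proposed: record the leaf MROper for each physical operator as it is compiled.
        Map<PhysicalOp, MROper> phyToMROperMap = new HashMap<>();
        phyToMROperMap.put(load, loadJob);
        phyToMROperMap.put(sort, sortSecond);

        System.out.println("compiledInputs index for POSort: "
                + indexInCompiledInputs(compiledInputs, sort));
        System.out.println("phyToMROperMap leaf for POSort: "
                + phyToMROperMap.get(sort).name);
    }
}
```

In this sketch the scan cannot find POSort in compiledInputs, while the map still resolves POSort to the second (leaf) MROper of the order-by, which is what a join operator would need in order to tell its inputs apart.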
> Order By followed by "replicated" join fails while compiling MR-plan from physical plan
> ---------------------------------------------------------------------------------------
>
>                 Key: PIG-858
>                 URL: https://issues.apache.org/jira/browse/PIG-858
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.3.0
>            Reporter: Ashutosh Chauhan
>             Fix For: 0.4.0
>
>
> Consider the query:
> {code}
> A = load 'a';
> B = order A by $0;
> C = join A by $0, B by $0;
> explain C;
> {code}
> works. But if replicated join is used instead
> {code}
> A = load 'a';
> B = order A by $0;
> C = join A by $0, B by $0 using "replicated";
> explain C;
> {code}
> this fails with ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2034: Error compiling operator POFRJoin
> relevant stacktrace:
> {code}
> Caused by: java.lang.RuntimeException: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:306)
>         at org.apache.pig.PigServer.explain(PigServer.java:574)
>         ... 8 more
> Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:942)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:173)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:342)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:327)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:233)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:301)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.explain(MapReduceLauncher.java:278)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:303)
>         ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:901)
>         ... 16 more
> {code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.