[
https://issues.apache.org/jira/browse/PIG-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Koji Noguchi updated PIG-5445:
------------------------------
Attachment: pig-5445-v01.patch
I don't understand how cogroup and MergeJoinIndexer are implemented,
but looking at MergeJoinIndexer.java:
{code:java}
public MergeJoinIndexer(String funcSpec, String innerPlan, String serializedPhyPlan,
        String udfCntxtSignature, String scope, String ignoreNulls) throws ExecException{

    loader =
    ...
    precedingPhyPlan = (PhysicalPlan)ObjectSerializer.deserialize(serializedPhyPlan);
    if(precedingPhyPlan != null){
        if(precedingPhyPlan.getLeaves().size() != 1 || precedingPhyPlan.getRoots().size() != 1){
            int errCode = 2168;
            String errMsg = "Expected physical plan with exactly one root and one leaf.";
            throw new ExecException(errMsg,errCode,PigException.BUG);
        }
        this.rightPipelineLeaf = precedingPhyPlan.getLeaves().get(0);
        this.rightPipelineRoot = precedingPhyPlan.getRoots().get(0);
        this.rightPipelineRoot.setInputs(null); // *********
    }
}
{code}
MergeJoinIndexer always overwrites "inputs" with null after deserializing the
plan, which means "inputs" can be skipped at serialization time. I'm attaching
a patch (pig-5445-v01.patch) that does that. The size of
TEZC-MergeCogroup-1.gld was reduced by 5 with this patch, since it no longer
serializes PigContext and POLoad for MergeJoinIndexer.
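
To illustrate the idea (this is not the content of pig-5445-v01.patch), here is
a minimal, self-contained sketch using plain Java serialization. The classes
RootOp and Upstream and the key "some.setting" are hypothetical stand-ins, not
Pig classes. It assumes the upstream operator carries configuration with it,
so the serialized bytes change whenever the config changes unless "inputs" is
dropped before serializing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;

class SkipInputsSketch {

    // Hypothetical stand-in for an upstream operator that drags config along.
    static class Upstream implements Serializable {
        private static final long serialVersionUID = 1L;
        final HashMap<String, String> config = new HashMap<>();
    }

    // Hypothetical stand-in for the root operator of the right-side pipeline.
    static class RootOp implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        List<Upstream> inputs;

        RootOp(String name, List<Upstream> inputs) {
            this.name = name;
            this.inputs = inputs;
        }
    }

    // Builds a root operator whose single input carries one config entry.
    static RootOp build(String settingValue) {
        Upstream up = new Upstream();
        up.config.put("some.setting", settingValue); // hypothetical key
        List<Upstream> inputs = new ArrayList<>();
        inputs.add(up);
        return new RootOp("root", inputs);
    }

    // Serializes op; when skipInputs is true, inputs is nulled for the
    // duration of the write and restored afterwards.
    static byte[] serialize(RootOp op, boolean skipInputs) {
        List<Upstream> saved = op.inputs;
        if (skipInputs) {
            op.inputs = null;
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(op);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            op.inputs = saved;
        }
    }

    // True if two operators that differ only in config serialize identically.
    static boolean sameBytes(String value1, String value2, boolean skipInputs) {
        return Arrays.equals(serialize(build(value1), skipInputs),
                             serialize(build(value2), skipInputs));
    }

    public static void main(String[] args) {
        System.out.println("same bytes, inputs kept:    " + sameBytes("old", "new", false));
        System.out.println("same bytes, inputs skipped: " + sameBytes("old", "new", true));
        RootOp op = build("old");
        System.out.println("size with inputs:    " + serialize(op, false).length);
        System.out.println("size without inputs: " + serialize(op, true).length);
    }
}
```

Since the receiving side resets "inputs" to null anyway, dropping it before
the write loses nothing in this sketch, and the serialized form no longer
depends on the config.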
> TestTezCompiler.testMergeCogroup fails whenever config is updated
> -----------------------------------------------------------------
>
> Key: PIG-5445
> URL: https://issues.apache.org/jira/browse/PIG-5445
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.19.0
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Minor
> Attachments: pig-5445-v01.patch
>
>
> TestTezCompiler.testMergeCogroup started failing after upgrading Tez (and
> the config that comes with it).
> {noformat}
> testMergeCogroupFailure
> expected:
> <|---a: Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.apache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[fWtsHFeWXvEhWm9Ls...XOuwcT+fzW1+yM]=','a_1-0','scope','...>
>
> but was:
> <|---a: Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.apache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[V01sG0UUnmycxHWSN...DyC6P4Drk9M9w=]=','a_1-0','scope','...>
> at org.apache.pig.tez.TestTezCompiler.run(TestTezCompiler.java:1472)
> at org.apache.pig.tez.TestTezCompiler.testMergeCogroup(TestTezCompiler.java:292)
> {noformat}
> (I edited the diff above a bit to make it easier to see where the
> difference is.)
> Basically, the 3rd argument to MergeJoinIndexer differed.