[
https://issues.apache.org/jira/browse/PIG-4364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264164#comment-14264164
]
liyunzhang_intel commented on PIG-4364:
---------------------------------------
currently we still need following code in SparkLauncher.java Line 112~116
{code}
// Code pulled from MapReduceLauncher
MRCompiler mrCompiler = new MRCompiler(physicalPlan, pigContext);
mrCompiler.compile();
MROperPlan plan = mrCompiler.getMRPlan();
POPackageAnnotator pkgAnnotator = new POPackageAnnotator(plan);
pkgAnnotator.visit();
{code}
If without these code, following script will throw exception:
{quote}
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID
1, localhost): java.lang.NullPointerException:
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.Packager.getValueTuple(Packager.java:215)
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage$PeekedBag$1.next(POPackage.java:424)
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage$PeekedBag$1.next(POPackage.java:408)
org.apache.pig.data.DefaultAbstractBag.addAll(DefaultAbstractBag.java:151)
org.apache.pig.data.DefaultAbstractBag.addAll(DefaultAbstractBag.java:137)
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.Packager.attachInput(Packager.java:125)
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNextTuple(POPackage.java:283)
org.apache.pig.backend.hadoop.executionengine.spark.converter.PackageConverter$PackageFunction.apply(PackageConverter.java:111)
org.apache.pig.backend.hadoop.executionengine.spark.converter.PackageConverter$PackageFunction.apply(PackageConverter.java:48)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:30)
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:35)
org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:64)
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:920)
org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:903)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)
{quote}
*why throw this exception?*
POPackageAnnotator#visit will call following function stack to avoid the above
NPE :
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.LoRearrangeDiscoverer#visitLocalRearrange
{code}
keyInfo.put(Integer.valueOf(lrearrange.getIndex()),
new Pair<Boolean, Map<Integer, Integer>>(
lrearrange.isProjectStar(),
lrearrange.getProjectedColsMap()));
{code}
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange#visit
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator#handlePackage
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator#visitMROp
After deleting above code, unit test add 89 failures. So close this bug because
this is not a bug.
> remove unnessary MR plan code generated in SparkLauncher.java
> -------------------------------------------------------------
>
> Key: PIG-4364
> URL: https://issues.apache.org/jira/browse/PIG-4364
> Project: Pig
> Issue Type: Bug
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Attachments: PIG-4364.patch
>
>
> following code in SparkLauncher.java Line 112~116 is about MR plan is
> generated in Spark mode which is unnecessary.
> {code}
> // Code pulled from MapReduceLauncher
> MRCompiler mrCompiler = new MRCompiler(physicalPlan, pigContext);
> mrCompiler.compile();
> MROperPlan plan = mrCompiler.getMRPlan();
> POPackageAnnotator pkgAnnotator = new POPackageAnnotator(plan);
> pkgAnnotator.visit();
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)