[
https://issues.apache.org/jira/browse/PIG-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837564#comment-15837564
]
Nandor Kollar commented on PIG-5104:
------------------------------------
[~kellyzly] attached a unit test to show the issue. It is not included into the
diff, because we already have similar e2e tests for this scenario, but
executing the unit test might be easier in local mode.
> Union_15 e2e test failing on Spark
> ----------------------------------
>
> Key: PIG-5104
> URL: https://issues.apache.org/jira/browse/PIG-5104
> Project: Pig
> Issue Type: Bug
> Components: spark
> Reporter: Nandor Kollar
> Assignee: Nandor Kollar
> Fix For: spark-branch
>
> Attachments: PIG-5104.patch, TestUnion_15.java
>
>
> While working on PIG-4891 I noticed that Union_15 e2e test is failing on
> Spark mode with this exception:
> Caused by: java.lang.RuntimeException:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught
> error from UDF: org.apache.pig.impl.builtin.GFCross [Unable to get
> parallelism hint from job conf]
> at
> org.apache.pig.backend.hadoop.executionengine.spark.converter.OutputConsumerIterator.readNext(OutputConsumerIterator.java:89)
> at
> org.apache.pig.backend.hadoop.executionengine.spark.converter.OutputConsumerIterator.hasNext(OutputConsumerIterator.java:96)
> at
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
> at
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2078:
> Caught error from UDF: org.apache.pig.impl.builtin.GFCross [Unable to get
> parallelism hint from job conf]
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:358)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextDataBag(POUserFunc.java:374)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:335)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:404)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:321)
> at
> org.apache.pig.backend.hadoop.executionengine.spark.converter.ForEachConverter$ForEachFunction$1$1.getNextResult(ForEachConverter.java:87)
> at
> org.apache.pig.backend.hadoop.executionengine.spark.converter.OutputConsumerIterator.readNext(OutputConsumerIterator.java:69)
> ... 11 more
> Caused by: java.io.IOException: Unable to get parallelism hint from job conf
> at org.apache.pig.impl.builtin.GFCross.exec(GFCross.java:66)
> at org.apache.pig.impl.builtin.GFCross.exec(GFCross.java:37)
> at
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:330)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)