[ https://issues.apache.org/jira/browse/PIG-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Praveen Rachabattuni updated PIG-4228:
--------------------------------------
    Fix Version/s: spark-branch

> SchemaTupleBackend error when working on a Spark 1.1.0 cluster
> --------------------------------------------------------------
>
>                 Key: PIG-4228
>                 URL: https://issues.apache.org/jira/browse/PIG-4228
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 0.14.0
>         Environment: spark-1.1.0
>            Reporter: Carlos Balduz
>              Labels: spark
>             Fix For: spark-branch
>
>         Attachments: groupby.pig, movies_data.csv
>
>
> Whenever I try to run a script on a Spark cluster, I get the following error:
> ERROR 0: org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in
> stage 1.0 (...): java.lang.RuntimeException:
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while
> executing ForEach at [1-2[-1,-1]]
> org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:62)
> org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
> scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
> org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:34)
> org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> scala.collection.Iterator$class.foreach(Iterator.scala:727)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:54)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> After debugging, I have seen that the problem is inside SchemaTupleBackend.
> Although SparkLauncher initializes this class on the driver, that initialization
> is lost when the job is sent to the executors; so when POOutputConsumerIterator
> tries to fetch the results, SchemaTupleBackend.newSchemaTupleFactory(...) is
> called on an uninitialized backend and throws a RuntimeException.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
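The failure mode described above (static state initialized on the Spark driver but absent in each executor JVM) is commonly worked around with a lazy, once-per-JVM initialization guard that every task calls before touching the shared state. The sketch below illustrates that pattern only; the class and method names (`PerJvmBackendInit`, `ensureInitialized`, `initCount`) are hypothetical and this is not the actual PIG-4228 patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch (hypothetical names): guard expensive per-JVM setup so
// that each Spark executor initializes it exactly once, no matter how many
// tasks run in that JVM.
public class PerJvmBackendInit {
    // Counts how many times the real initialization actually ran in this JVM.
    private static final AtomicInteger INIT_COUNT = new AtomicInteger(0);
    private static volatile boolean initialized = false;

    // Called at the start of every task; only the first call per JVM performs
    // the initialization. In Pig's case the body would be something like a
    // SchemaTupleBackend.initialize(...) call (assumption, not the real fix).
    public static synchronized void ensureInitialized() {
        if (!initialized) {
            INIT_COUNT.incrementAndGet(); // stands in for the real setup work
            initialized = true;
        }
    }

    public static int initCount() {
        return INIT_COUNT.get();
    }
}
```

Because the guard lives in static state, it is re-evaluated independently inside each executor JVM, which is exactly where the driver-side initialization was being lost.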