Carlos Balduz created PIG-4228:
----------------------------------

             Summary: SchemaTupleBackend error when working on a Spark 1.1.0 cluster
                 Key: PIG-4228
                 URL: https://issues.apache.org/jira/browse/PIG-4228
             Project: Pig
          Issue Type: Bug
          Components: spark
    Affects Versions: 0.14.0
         Environment: spark-1.1.0
            Reporter: Carlos Balduz


Whenever I try to run a script on a Spark cluster, I get the following error:

ERROR 0: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1.0 (...): java.lang.RuntimeException: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at [1-2[-1,-1]]
        org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:62)
        org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
        scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
        scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
        scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
        org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:34)
        org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
        scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
        scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        scala.collection.Iterator$class.foreach(Iterator.scala:727)
        scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)


After debugging, I have found that the problem lies in SchemaTupleBackend. SparkLauncher initializes this class on the driver, but that static initialization is lost when the job is shipped to the executors. When POOutputConsumerIterator then tries to fetch the results on an executor, SchemaTupleBackend.newSchemaTupleFactory(...) is called on the uninitialized backend and throws a RuntimeException.
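
A possible direction, sketched below: since the static state that SparkLauncher sets up in SchemaTupleBackend lives only in the driver JVM, each executor could re-run the same initialization lazily before the first tuple is consumed. This is only an untested sketch; it assumes SchemaTupleBackend.initialize(Configuration, PigContext) is the call SparkLauncher makes (the exact signature may differ), and ExecutorSchemaTupleInit / ensureInitialized are hypothetical names.

    // Hypothetical sketch: lazily initialize SchemaTupleBackend once per
    // executor JVM. Assumes the Configuration and PigContext can be shipped
    // to the executors (e.g. serialized into the Spark closure).
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.pig.data.SchemaTupleBackend;
    import org.apache.pig.impl.PigContext;

    public final class ExecutorSchemaTupleInit {
        private static volatile boolean initialized = false;

        private ExecutorSchemaTupleInit() {}

        // Would be called on the executor side (e.g. at the top of
        // POOutputConsumerIterator.readNext()) before any
        // SchemaTupleBackend.newSchemaTupleFactory(...) call; idempotent.
        public static void ensureInitialized(Configuration conf, PigContext pigContext)
                throws IOException {
            if (!initialized) {
                synchronized (ExecutorSchemaTupleInit.class) {
                    if (!initialized) {
                        // Repeat the initialization SparkLauncher performs on the
                        // driver, since static state does not survive serialization
                        // to the executors.
                        SchemaTupleBackend.initialize(conf, pigContext);
                        initialized = true;
                    }
                }
            }
        }
    }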



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
