[ https://issues.apache.org/jira/browse/PIG-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186539#comment-14186539 ]

liyunzhang_intel commented on PIG-4228:
---------------------------------------

I followed the steps below to try to reproduce this, but I do not see the error 
message from the bug description:
0. prepare a Hadoop 2.4 and Spark 1.1 environment
1. build from the current spark branch code (e27e4e3 - (HEAD, origin/spark, spark) PIG-4229: Copy spark dependencies to lib directory (praveen)):
ant jar -Dhadoopversion=23
2. run the group-by pig script (groupby.pig and movies.csv are attached; a sketch of the script follows the output below):
#export HADOOP_USER_CLASSPATH_FIRST="true"
bin/pig -x spark $PIG_HOME/bin/groupby.pig
the result is:
(1921,1)
(1932,1)
(1994,2)
(1985,1)
(1995,1)
(1963,1)
(1929,1)
(1991,1)
(1993,1)
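
For reference, a minimal sketch of what a group-by script like the attached groupby.pig typically looks like (the movies.csv column layout below is an assumption based on the standard Pig tutorial data; the attached script is authoritative):

-- sketch only: assumes movies.csv columns are (id, name, year, rating, duration)
movies  = LOAD 'movies.csv' USING PigStorage(',')
          AS (id:int, name:chararray, year:int, rating:double, duration:int);
grouped = GROUP movies BY year;
counts  = FOREACH grouped GENERATE group, COUNT(movies);
DUMP counts;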



> SchemaTupleBackend error when working on a Spark 1.1.0 cluster
> --------------------------------------------------------------
>
>                 Key: PIG-4228
>                 URL: https://issues.apache.org/jira/browse/PIG-4228
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 0.14.0
>         Environment: spark-1.1.0
>            Reporter: Carlos Balduz
>              Labels: spark
>
> Whenever I try to run a script on a Spark cluster, I get the following error:
> ERROR 0: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1.0 (...): java.lang.RuntimeException: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at [1-2[-1,-1]]
>         org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:62)
>         org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
>         scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>         scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>         scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
>         org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:34)
>         org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
>         scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>         scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         org.apache.spark.scheduler.Task.run(Task.scala:54)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         java.lang.Thread.run(Thread.java:745)
> After debugging, I have seen that the problem is inside SchemaTupleBackend. 
> Although SparkLauncher initializes this class on the driver, that 
> initialization is lost when the job is sent to the executors, so when 
> POOutputConsumerIterator tries to fetch the results, 
> SchemaTupleBackend.newSchemaTupleFactory(...) is called and throws a 
> RuntimeException.
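
For context, below is a minimal sketch of the kind of executor-side initialization the description above says is missing. It is an illustration, not the project's actual fix: SchemaTupleBackend, PigContext, and SchemaTupleBackend.initialize(...) are real Pig classes/methods (the exact initialize(...) signature varies across Pig versions), while the wrapper class and its names are hypothetical.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.pig.data.SchemaTupleBackend;
import org.apache.pig.impl.PigContext;

// Hypothetical helper: guarantees SchemaTupleBackend is initialized once per
// executor JVM, mirroring the driver-side call in SparkLauncher that the
// executors never see according to the description above.
public class ExecutorSchemaTupleInit {
    private static volatile boolean initialized = false;

    // Call from each Spark task before Pig operators start consuming tuples.
    public static void ensureInitialized(Configuration conf, PigContext pigContext)
            throws IOException {
        if (!initialized) {
            synchronized (ExecutorSchemaTupleInit.class) {
                if (!initialized) {
                    SchemaTupleBackend.initialize(conf, pigContext);
                    initialized = true;
                }
            }
        }
    }
}

Without such a call on the executors, SchemaTupleBackend.newSchemaTupleFactory(...) finds the backend uninitialized and throws the RuntimeException seen in the stack trace.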



