possible cause: same TeraGen job sometimes slow and sometimes fast
I ran a series of TeraGen jobs via spark-submit (each job generated a dataset of equal size into a different S3 bucket) and noticed that some jobs were fast and some were slow. The slow jobs always printed many log lines like

DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1.0, runningTasks: 1 (or 2, etc.)

while the fast jobs printed only a few of them. Can someone explain why the number of these debug prints varies between executions of the same job? The more of these prints I see, the slower the job is. Has anyone experienced the same behavior? Thanks, Gil.
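For context, these scheduler messages only appear when DEBUG logging is enabled for the scheduler classes; their verbosity is controlled through Spark's conf/log4j.properties. A minimal sketch of how that logger can be turned up or down (standard log4j configuration, not specific to this job; the logger name is taken from the message prefix above):

```properties
# conf/log4j.properties -- keep INFO globally, but raise the threshold
# for the scheduler so the "parentName: ..., runningTasks: N" lines disappear.
log4j.rootCategory=INFO, console
log4j.logger.org.apache.spark.scheduler.TaskSchedulerImpl=WARN

# Conversely, setting it to DEBUG is what makes these lines show up at all:
# log4j.logger.org.apache.spark.scheduler.TaskSchedulerImpl=DEBUG
```

Silencing the lines does not change the underlying scheduling behavior, of course; it only affects what gets printed.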
SPARK-13979: issues with hadoopConf
Hello, any ideas about this one: https://issues.apache.org/jira/browse/SPARK-13979 ? Do others see the same issue? Thanks, Gil.
new object store driver for Spark
We recently released an object store connector for Spark: https://github.com/SparkTC/stocator

Currently the connector contains a driver for Swift-based object stores (like SoftLayer or any other Swift cluster), but it can easily support additional object stores; there is a pending patch to support the Amazon S3 object store.

The major highlight is that this connector doesn't create any temporary files, so it achieves very fast response times when Spark persists data in the object store. The new connector supports speculative execution and covers various failure scenarios (such as two Spark tasks writing into the same object, or partially corrupted data due to runtime exceptions in the Spark master). It also covers https://issues.apache.org/jira/browse/SPARK-10063 and other known issues. The detailed fault-tolerance algorithm will be released very soon; for now, those who are interested can view the implementation in the code itself. The repository contains all the details on how to set it up and use it with Spark.

A series of tests showed that the new connector achieves a 70% improvement for write operations from Spark to Swift and about a 30% improvement for read operations from Swift into Spark, compared to the existing driver that Spark uses to integrate with objects stored in Swift. There is ongoing work to add more coverage and fix some known bugs/limitations.

All the best, Gil
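To illustrate the kind of setup the repository describes, here is a minimal sketch of wiring the connector into a Spark job. The swift2d:// scheme, the fs.swift2d.impl key, and the com.ibm.stocator.fs.ObjectStoreFileSystem class name are taken from the project README as it stood at the time; the container/account names are hypothetical, and the authoritative configuration list is in the repository itself:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: assumes the stocator jar is on the Spark classpath and that
// valid Swift credentials are configured; names below are placeholders.
val sc = new SparkContext(new SparkConf().setAppName("stocator-demo"))
val hconf = sc.hadoopConfiguration

// Register Stocator as the filesystem implementation for the swift2d scheme.
hconf.set("fs.swift2d.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
// ...Swift/SoftLayer credentials and endpoint properties go here,
//    per the README's configuration section...

// Read from and write to the object store through the new scheme.
val data = sc.textFile("swift2d://mycontainer.myservice/input.txt")
data.saveAsTextFile("swift2d://mycontainer.myservice/output")
```

Because the connector writes objects directly instead of staging temporary files and renaming them, the saveAsTextFile path above avoids the copy-on-commit cost that the claimed write-speed improvement comes from.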
problems with my code that simulates task failure - task is not resubmitted
I have some code that simulates a task failure in speculative mode. I compile the code to a jar and execute it with:

./bin/spark-submit --class com.test.SparkTest --jars --driver-memory 2g --executor-memory 1g --master local[4] --conf spark.speculation=true --conf spark.task.maxFailures=4 SparkTest.jar

In addition, /conf/spark-defaults.sh contains:

spark.speculation = true
spark.task.maxFailures = 4

However, when I run my code I don't see speculative execution working at all, and even worse, the failed task is not resubmitted: the job fails after just 1 unsuccessful task. Both the exception and the code are attached. Can someone explain what I am doing wrong? I need:

- the job to fail only after 4 attempts to resubmit the task, not after 1 failed task;
- speculative execution to function.

Here is the exception:

16/03/05 19:32:51 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Task failed while writing rows.
	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:266)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:151)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:151)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:69)
	at org.apache.spark.scheduler.Task.run(Task.scala:81)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception: new exception
	at com.test.SparkTest$$anonfun$1.apply(SparkTest.scala:27)
	at com.test.SparkTest$$anonfun$1.apply(SparkTest.scala:21)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegen$$anonfun$5$$anon$1.hasNext(WholeStageCodegen.scala:285)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:256)
	... 8 more
16/03/05 19:32:51 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.SparkException: Task failed while writing rows.
	[same stack trace as above]
16/03/05 19:32:51 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
16/03/05 19:32:51 ERROR InsertIntoHadoopFsRelation: Aborting job.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.SparkException: Task failed while writing rows.

Here is the code:

val sparkConf = new
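One detail worth double-checking in a setup like the one above (an observation about Spark's local mode, not a confirmed diagnosis of this particular job): with --master local[N] the local scheduler is created with maxTaskFailures = 1, so any task failure aborts the job immediately regardless of spark.task.maxFailures. Spark's master-URL syntax accepts a second parameter, local[N, maxFailures], to allow retries in local mode. A hypothetical variant of the submit command illustrating this (flags otherwise as above; the empty --jars argument is omitted here):

```shell
# local[4,4]: 4 worker threads, and allow up to 4 failures per task
# before the job is aborted.
./bin/spark-submit \
  --class com.test.SparkTest \
  --driver-memory 2g --executor-memory 1g \
  --master "local[4,4]" \
  --conf spark.speculation=true \
  SparkTest.jar
```

Speculative execution is a separate matter: it launches duplicate copies of tasks that run slower than the median of their stage, so it may never trigger for a stage with only a handful of tasks, and it does not resubmit tasks that have already failed.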