possible cause: same TeraGen job sometimes slow and sometimes fast

2017-10-18 Thread Gil Vernik
I performed a series of TeraGen jobs via spark-submit (each job generated an 
equal-sized dataset into a different S3 bucket).
I noticed that some jobs were fast and some were slow.

Slow jobs always had many log prints like
DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1.0, runningTasks: 1
(or 2, etc.).

Fast jobs always had only a few of those lines.

Can someone explain why the number of those debug prints varies across 
different executions of the same job? The more of those prints I see, the 
slower the job.
Has anyone experienced the same behavior?
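
As far as I can tell, that line is logged from TaskSchedulerImpl.resourceOffers, 
so it appears once per scheduling round for a TaskSet that still has tasks in 
flight; if that is right, more prints simply mean more scheduling rounds spent 
while tasks run long. To rule out the logging itself as overhead, that logger 
can be silenced, e.g. with the following line in conf/log4j.properties 
(illustrative log4j 1.x syntax):

log4j.logger.org.apache.spark.scheduler.TaskSchedulerImpl=INFO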

Thanks
Gil.






SPARK-13979: issues with hadoopConf

2016-07-02 Thread Gil Vernik
Hello,

Any ideas about this one: https://issues.apache.org/jira/browse/SPARK-13979 ?
Does anyone else see the same issue?
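
For background, and possibly related to the ticket: Hadoop options can reach 
Spark either programmatically through the driver's Hadoop configuration, or 
via spark.hadoop.* properties, which Spark copies into the Hadoop 
Configuration it hands out. A minimal sketch; the fs.s3a.* keys are just 
illustrative:

// on the driver, programmatically
sc.hadoopConfiguration.set("fs.s3a.access.key", "<access-key>")

// or declaratively, e.g. in conf/spark-defaults.conf:
//   spark.hadoop.fs.s3a.access.key <access-key>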

Thanks
Gil.




new object store driver for Spark

2016-03-22 Thread Gil Vernik
We recently released an object store connector for Spark:
https://github.com/SparkTC/stocator
Currently this connector contains a driver for Swift-based object stores 
(like SoftLayer or any other Swift cluster), but it can easily support 
additional object stores.
There is a pending patch to support the Amazon S3 object store.

The major highlight is that this connector doesn't create any temporary 
files, and so it achieves very fast response times when Spark persists data 
in the object store.
The new connector supports speculative mode and covers various failure 
scenarios (like two Spark tasks writing into the same object, partially 
corrupted data due to runtime exceptions in the Spark master, etc.). It also 
covers https://issues.apache.org/jira/browse/SPARK-10063 and other known 
issues.

The detailed fault-tolerance algorithm will be published very soon. For 
now, those who are interested can view the implementation in the code itself.

https://github.com/SparkTC/stocator contains all the details on how to set 
it up and use it with Spark.
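
To give a feel for the wiring, here is a minimal sketch. The swift2d scheme, 
the ObjectStoreFileSystem class, and the configuration keys below follow the 
README, but please treat them as illustrative and check the repository for 
the authoritative setup:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("StocatorExample"))

// register Stocator as the handler for the swift2d:// scheme
val hconf = sc.hadoopConfiguration
hconf.set("fs.swift2d.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")

// credentials for a Swift service named "myservice" (illustrative values)
hconf.set("fs.swift2d.service.myservice.auth.url",
  "https://identity.open.softlayer.com/v3/auth/tokens")
hconf.set("fs.swift2d.service.myservice.username", "<username>")
hconf.set("fs.swift2d.service.myservice.password", "<password>")

// objects are addressed as swift2d://<container>.<service>/<object>
sc.parallelize(1 to 1000).saveAsTextFile("swift2d://mycontainer.myservice/out")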

A series of tests showed that the new connector obtains about a 70% 
improvement for write operations from Spark to Swift and about a 30% 
improvement for read operations from Swift into Spark (compared to the 
existing driver that Spark uses to integrate with objects stored in Swift).

There is ongoing work to add more coverage and fix some known bugs / 
limitations.

All the best
Gil




problems with my code that simulates task failure - task is not resubmitted

2016-03-05 Thread Gil Vernik
I have some code that simulates task failure in speculative mode.
I compile the code to a jar and execute it with:

./bin/spark-submit --class com.test.SparkTest --driver-memory 2g 
--executor-memory 1g --master local[4] --conf spark.speculation=true 
--conf spark.task.maxFailures=4 SparkTest.jar

In addition, conf/spark-defaults.conf contains

spark.speculation = true
spark.task.maxFailures = 4

However, when I run my code, I don't see speculation working at all, and 
even worse - the failed task is not resubmitted; the job fails after just 
one unsuccessful task.
Attached are both the exception and the code. Can someone explain what I am 
doing wrong?

I need help to make:
- the job fail only after 4 attempts to resubmit the task, not after the 
first failed task
- speculative mode function
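
One thing I have not verified: as far as I understand, local mode only 
retries failed tasks when the retry count is encoded in the master URL 
itself, as local[N, maxFailures], and plain local[4] allows a single attempt 
regardless of spark.task.maxFailures. A variant worth trying (the retry 
count of 4 here is illustrative):

./bin/spark-submit --class com.test.SparkTest --driver-memory 2g 
--executor-memory 1g --master "local[4,4]" --conf spark.speculation=true 
SparkTest.jar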

Here is the exception:

16/03/05 19:32:51 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:266)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:151)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:151)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:69)
    at org.apache.spark.scheduler.Task.run(Task.scala:81)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception: new exception
    at com.test.SparkTest$$anonfun$1.apply(SparkTest.scala:27)
    at com.test.SparkTest$$anonfun$1.apply(SparkTest.scala:21)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegen$$anonfun$5$$anon$1.hasNext(WholeStageCodegen.scala:285)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:256)
    ... 8 more
16/03/05 19:32:51 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:266)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:151)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:151)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:69)
    at org.apache.spark.scheduler.Task.run(Task.scala:81)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.Exception: new exception
    at com.test.SparkTest$$anonfun$1.apply(SparkTest.scala:27)
    at com.test.SparkTest$$anonfun$1.apply(SparkTest.scala:21)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegen$$anonfun$5$$anon$1.hasNext(WholeStageCodegen.scala:285)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:256)
    ... 8 more

16/03/05 19:32:51 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
16/03/05 19:32:51 ERROR InsertIntoHadoopFsRelation: Aborting job.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.SparkException: Task failed while writing rows.

Here is the code:

  val sparkConf = new
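
The attachment is truncated above; for completeness, here is a hypothetical 
reconstruction that produces the same stack trace, i.e. an exception thrown 
from a map function so that it only surfaces at write time, inside writeRows. 
Names and the output path are illustrative, not the original SparkTest.scala:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SparkTest {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("SparkTest")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // the failure is deliberately lazy: it fires while tasks write rows,
    // which matches the "Task failed while writing rows" trace above
    val df = sc.parallelize(1 to 1000).map { i =>
      if (i == 500) throw new Exception("new exception")
      (i, s"row-$i")
    }.toDF("id", "value")

    df.write.parquet("/tmp/sparktest-output")
  }
}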