Re: Slow activation using Spark Streaming's new receiver scheduling mechanism

2016-08-31 Thread Renxia Wang
I also have this problem. The total time to launch receivers seems related to the total number of executors. In my case, with 400 executors and 200 receivers it takes about a minute for all receivers to become active, but with 800 executors it takes 3 minutes to activate all of them
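
A minimal sketch (my own, not from the thread) of measuring this directly: a StreamingListener that logs when each receiver becomes active, so the activation lag can be compared across cluster sizes. Class and variable names are illustrative.

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerReceiverStarted}

// Logs how long after construction (i.e. roughly after ssc.start())
// each receiver reports itself as started.
class ReceiverActivationTimer extends StreamingListener {
  private val startMillis = System.currentTimeMillis()
  override def onReceiverStarted(event: StreamingListenerReceiverStarted): Unit = {
    val elapsedSec = (System.currentTimeMillis() - startMillis) / 1000.0
    println(s"Receiver ${event.receiverInfo.streamId} active on " +
      s"${event.receiverInfo.location} after $elapsedSec s")
  }
}

// Usage: register the listener before starting the context.
// ssc.addStreamingListener(new ReceiverActivationTimer)
// ssc.start()
```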

Re: Spark Streaming Kinesis Performance Decrease When Cluster Scale Up with More Executors

2016-07-14 Thread Renxia Wang
Additional information: the batch duration in my app is 1 minute, and from the Spark UI, for each batch the difference between Output Op Duration and Job Duration is large, e.g. Output Op Duration is 1 min while Job Duration is 19 s.
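
For batch-level timing, a small sketch (mine, not from the thread) that logs each batch's scheduling and processing delay via a StreamingListener, which makes it easy to see when processing time approaches the 1-minute batch duration:

```scala
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Prints per-batch delays; values are in milliseconds.
class BatchDelayLogger extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val info = batch.batchInfo
    println(s"batch ${info.batchTime}: scheduling=${info.schedulingDelay.getOrElse(-1L)} ms, " +
      s"processing=${info.processingDelay.getOrElse(-1L)} ms")
  }
}

// ssc.addStreamingListener(new BatchDelayLogger)
```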

Spark Streaming Kinesis Performance Decrease When Cluster Scale Up with More Executors

2016-07-14 Thread Renxia Wang
Hi all, I am running a Spark Streaming application with Kinesis on EMR 4.7.1. The application runs on YARN in client mode. There are 17 worker nodes (c3.8xlarge) with 100 executors and 100 receivers. This setup works fine, but when I increase the number of worker nodes to 50 and increase
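
For reference, the usual multi-receiver Kinesis pattern from that Spark version looks roughly like this; a hedged sketch, with the app name, stream name, region, and receiver count as placeholders rather than the poster's actual settings:

```scala
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kinesis.KinesisUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Creates N Kinesis receivers and unions them into a single DStream,
// so downstream stages see one input regardless of receiver count.
def buildUnionedStream(ssc: StreamingContext, numReceivers: Int) = {
  val streams = (1 to numReceivers).map { _ =>
    KinesisUtils.createStream(
      ssc,
      "my-kinesis-app",                           // KCL app (DynamoDB checkpoint table) name
      "my-stream",                                // Kinesis stream name
      "https://kinesis.us-east-1.amazonaws.com",  // endpoint URL
      "us-east-1",                                // region
      InitialPositionInStream.LATEST,
      Seconds(60),                                // checkpoint interval (matches a 1-minute batch)
      StorageLevel.MEMORY_AND_DISK_2)
  }
  ssc.union(streams)
}
```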

Output Op Duration vs Job Duration: What's the difference?

2016-07-12 Thread Renxia Wang
Hi, I am using Spark 1.6.1 on EMR, running a streaming app on YARN. From the Spark UI I see that for each batch, the *Output Op Duration* is larger than the *Job Duration* (screenshot attached). What's the difference between these two? Does the *Job Duration* only count the executor time of each time,
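
My understanding (worth verifying) is that *Output Op Duration* times the entire output operation, including any driver-side code inside foreachRDD, while *Job Duration* times only the Spark job(s) that operation submits, so driver-side work widens the gap. An illustrative sketch with placeholder names:

```scala
import org.apache.spark.streaming.dstream.DStream

// Driver-side work inside foreachRDD counts toward Output Op Duration,
// but no Spark job runs during it, so Job Duration stays small.
def attachOutputOp(stream: DStream[String]): Unit =
  stream.foreachRDD { rdd =>
    val count = rdd.count()  // submits a Spark job: this part shows up as Job Duration
    // Runs on the driver only; stands in for slow blocking I/O such as a
    // synchronous write to an external service.
    Thread.sleep(5000)
    println(s"processed $count records")
  }
```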

Bootstrap Action to Install Spark 2.0 on EMR?

2016-07-02 Thread Renxia Wang
Hi all, has anybody tried out Spark 2.0 on EMR 4.x? Will it work? I am looking for a bootstrap action script to install it on EMR; does someone have a working one to share? Appreciate it! Best, Renxia

Re: How to Set Retry Policy in Spark

2015-10-01 Thread Renxia Wang
Additional Info: I am running Spark on YARN.

How to Set Retry Policy in Spark

2015-10-01 Thread Renxia Wang
Hi guys, I know there is a way to set the number of retries for failed tasks, using spark.task.maxFailures. What is the default retry policy for failed tasks? Is it exponential backoff? My tasks sometimes fail because of socket connection timeouts/resets, and even with retry, some of the tasks will
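
As far as I know there is no backoff between task attempts: the scheduler resubmits a failed task right away, up to spark.task.maxFailures attempts (default 4). A minimal sketch of raising the limit; the value shown is illustrative:

```scala
import org.apache.spark.SparkConf

// Failed tasks are resubmitted immediately, so raising the limit only
// buys more attempts, not a longer wait between them.
val conf = new SparkConf()
  .setAppName("retry-demo")
  .set("spark.task.maxFailures", "8")  // default is 4
```

Transient socket timeouts are often better handled at the source, e.g. by raising the client library's own timeout or retry settings, than by adding Spark-level task attempts.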