Re: Spark Streaming: all tasks running on one executor (Kinesis + Mongodb)

Mike Trienis Sat, 23 May 2015 11:35:53 -0700

Yup, and since I have only one core per executor, that explains why only
one executor was utilized: each of the two receivers occupies a core, which
leaves a single executor for processing. I'll need to investigate which EC2
instance type is going to be the best fit.
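
For anyone following along, the core accounting can be sketched roughly as below. This is only illustrative arithmetic based on the numbers in my original post; the object and method names are made up for this example and are not part of the Spark API.

```scala
// Illustrative arithmetic only; these names are hypothetical, not Spark API.
object ReceiverCores {
  /** Cores left for batch processing once each receiver has pinned one core. */
  def coresLeftForProcessing(totalCores: Int, receivers: Int): Int =
    math.max(totalCores - receivers, 0)

  def main(args: Array[String]): Unit = {
    val totalCores = 3 * 1 // three workers, one core each (from the original post)
    val receivers  = 2     // two Kinesis streams, one receiver each
    println(coresLeftForProcessing(totalCores, receivers)) // prints 1
  }
}
```

With more cores per executor (or more executors) the same subtraction leaves more room for the processing tasks, which is why the instance type matters here.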


Thanks Evo.

On Fri, May 22, 2015 at 3:47 PM, Evo Eftimov <evo.efti...@isecc.com> wrote:

> A receiver occupies a CPU core. An executor is simply a JVM instance, and
> as such it can be granted any number of cores and any amount of RAM.
>
> So check how many cores you have per executor.
>
> -------- Original message --------
> From: Mike Trienis
> Date: 2015/05/22 21:51 (GMT+00:00)
> To: user@spark.apache.org
> Subject: Re: Spark Streaming: all tasks running on one executor (Kinesis +
> Mongodb)
>
> I guess each receiver occupies an executor. So there was only one executor
> available for processing the job.
>
> On Fri, May 22, 2015 at 1:24 PM, Mike Trienis <mike.trie...@orcsol.com>
> wrote:
>
>> Hi All,
>>
>> I have a cluster of four nodes (three workers and one master, with one core
>> each) which consumes data from Kinesis at 15 second intervals using two
>> streams (i.e. receivers). The job simply grabs the latest batch and pushes
>> it to MongoDB. I believe that the problem is that all tasks are executed on
>> a single worker node and never distributed to the others. This is true even
>> after I set the number of concurrentJobs to 3. Overall, I would really like
>> to increase throughput (i.e. more than 500 records / second) and understand
>> why all executors are not being utilized.
>>
>> Here are some parameters I have set:
>>
>>    - spark.streaming.blockInterval   200
>>    - spark.locality.wait             500
>>    - spark.streaming.concurrentJobs  3
>>
>> This is the code that's actually doing the writing:
>>
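>> import org.apache.spark.rdd.RDD
>> import org.apache.spark.streaming.Time
>>
>> def write(rdd: RDD[Data], time: Time): Unit = {
>>     val result = doSomething(rdd)
>>     // Insert each partition's records from the task that processes it.
>>     result.foreachPartition { records =>
>>         records.foreach(record => connection.insert(record))
>>     }
>> }
>>
>> // Expand each Data record into zero or more MyObject records.
>> def doSomething(rdd: RDD[Data]): RDD[MyObject] = {
>>     rdd.flatMap(MyObject)
>> }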
>>
>> Any ideas as to how to improve the throughput?
>>
>> Thanks, Mike.
>>
>
>
