A receiver occupies a CPU core; an executor is simply a JVM instance, and as
such it can be granted any number of cores and any amount of RAM.

So check how many cores you have per executor.
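
For example, a minimal sketch of granting cores when building the streaming
context (the app name and values here are illustrative, not from this thread):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// With two receivers each pinning a core, the app needs more cores than
// receivers, or nothing is left over for processing the batches.
val conf = new SparkConf()
  .setAppName("kinesis-to-mongo")     // hypothetical app name
  .set("spark.executor.cores", "2")   // cores granted to each executor JVM
  .set("spark.cores.max", "6")        // total cores for the app (standalone mode)

val ssc = new StreamingContext(conf, Seconds(15))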



-------- Original message --------
From: Mike Trienis <mike.trie...@orcsol.com>
Date: 2015/05/22 21:51 (GMT+00:00)
To: user@spark.apache.org
Subject: Re: Spark Streaming: all tasks running on one executor (Kinesis + Mongodb)

I guess each receiver occupies an executor, so there was only one executor
available for processing the job.
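
If so, a common way to spread the processing is to union the receiver streams
and repartition the result; a sketch using the generic DStream API, where
stream1 and stream2 stand in for the two Kinesis receivers:

val unified = ssc.union(Seq(stream1, stream2))
// Shuffle the received blocks across the cluster so processing is not
// confined to the node hosting a receiver.
val repartitioned = unified.repartition(3)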

On Fri, May 22, 2015 at 1:24 PM, Mike Trienis <mike.trie...@orcsol.com> wrote:
Hi All,

I have a cluster of four nodes (three workers and one master, each with one
core) that consumes data from Kinesis at 15-second intervals using two streams
(i.e. receivers). The job simply grabs the latest batch and pushes it to
MongoDB. I believe the problem is that all tasks are executed on a single
worker node and never distributed to the others. This is true even after I set
spark.streaming.concurrentJobs to 3. Overall, I would really like to increase
throughput (i.e. to more than 500 records/second) and understand why all
executors are not being utilized.

Here are some parameters I have set: 
spark.streaming.blockInterval     200
spark.locality.wait               500
spark.streaming.concurrentJobs    3
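
These can also be set programmatically when building the context; a sketch of
the equivalent SparkConf calls (the first two values are in milliseconds):

val conf = new SparkConf()
  .set("spark.streaming.blockInterval", "200")  // ms; received data is chunked into blocks (partitions) at this interval
  .set("spark.locality.wait", "500")            // ms to wait for a data-local slot before scheduling elsewhere
  .set("spark.streaming.concurrentJobs", "3")   // number of batches that may be processed concurrently
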
This is the code that's actually doing the writing:

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Time

def write(rdd: RDD[Data], time: Time): Unit = {
    val result = doSomething(rdd, time)
    result.foreachPartition { partition =>
        // `connection` is captured from the enclosing scope, so it must be
        // serialized and shipped to every task; see the sketch below.
        partition.foreach(record => connection.insert(record))
    }
}

// `time` is threaded through from `write`, so the signature must accept it.
def doSomething(rdd: RDD[Data], time: Time): RDD[MyObject] = {
    rdd.flatMap(MyObject)
}
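
One likely drag on write throughput is sharing a single driver-side connection
across tasks. A minimal sketch of the usual per-partition pattern, where
createMongoClient() is a hypothetical stand-in for whatever Mongo driver is in
use:

result.foreachPartition { partition =>
    // Hypothetical helper: open one client per partition on the executor
    // instead of serializing a driver-side connection into every task.
    val client = createMongoClient()
    try {
        partition.foreach(record => client.insert(record))
    } finally {
        client.close()  // release the connection once the partition is written
    }
}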

Any ideas as to how to improve the throughput?

Thanks, Mike. 
