Thanks.

So, to make sure I understand: since I'm using a standalone cluster, I would set 
SPARK_WORKER_INSTANCES to something like 2 (instead of the default value of 1). 
Is that correct? It also sounds like I need to explicitly set a value for 
SPARK_WORKER_CORES (based on what the documentation states). What would I want 
that value to be, given my configuration below? Or would I leave that alone?
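
For concreteness, here is what I am considering putting in conf/spark-env.sh on 
each of my 3 nodes. The 4-cores-per-worker split is just my guess, not something 
I found in the docs:

    # Run two worker daemons on this node instead of the default one.
    export SPARK_WORKER_INSTANCES=2
    # Each worker offers 4 of the node's 8 cores, so the two workers
    # together still advertise 8 cores total.
    export SPARK_WORKER_CORES=4

Or should SPARK_WORKER_CORES stay at 8, so that the two workers advertise 16 
cores between them?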


________________________________
 From: Daniel Siegmann <daniel.siegm...@velos.io>
To: user@spark.apache.org; Darin McBeath <ddmcbe...@yahoo.com> 
Sent: Wednesday, July 30, 2014 5:58 PM
Subject: Re: Number of partitions and Number of concurrent tasks
 
This is correct behavior. Each "core" can execute exactly one task at a time, 
with each task corresponding to a partition. If your cluster only has 24 cores, 
you can only run at most 24 tasks at once.

You could run multiple workers per node to get more executors. That would give 
you more cores in the cluster. But however many cores you have, each core will 
run only one task at a time.
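
To make the arithmetic concrete (using your numbers, and assuming you ran 2 
workers per node with each worker advertising all 8 cores): 3 nodes x 2 workers 
x 8 cores = 48 task slots, so a 100-partition stage would run in ceil(100/48) = 
3 waves of tasks rather than ceil(100/24) = 5. Whether that actually speeds 
things up depends on whether your tasks are really CPU-bound.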

On Wed, Jul 30, 2014 at 3:56 PM, Darin McBeath <ddmcbe...@yahoo.com> wrote:

>I have a cluster with 3 nodes (each with 8 cores) using Spark 1.0.1.
>
>I have an RDD<String> which I've repartitioned so it has 100 partitions
>(hoping to increase the parallelism).
>
>When I do a transformation (such as filter) on this RDD, I can't seem to get
>more than 24 tasks (my total number of cores across the 3 nodes) going at one
>point in time. By tasks, I mean the number of tasks that appear under the
>Application UI. I tried explicitly setting spark.default.parallelism to 48
>(hoping I would get 48 tasks running concurrently) and verified this in the
>Application UI for the running application, but it had no effect. Perhaps it
>is ignored for a 'filter' and the default is the total number of cores
>available.
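>
>Roughly what my code looks like (a simplified sketch; the app name and the
>input path are placeholders, not my real values):
>
>    import org.apache.spark.SparkConf;
>    import org.apache.spark.api.java.JavaRDD;
>    import org.apache.spark.api.java.JavaSparkContext;
>    import org.apache.spark.api.java.function.Function;
>
>    public class PartitionTest {
>        public static void main(String[] args) {
>            SparkConf conf = new SparkConf().setAppName("PartitionTest");
>            JavaSparkContext sc = new JavaSparkContext(conf);
>
>            // Load the data and spread it across 100 partitions.
>            JavaRDD<String> lines =
>                sc.textFile("hdfs://...").repartition(100);
>
>            // This filter never shows more than 24 concurrent tasks in the UI.
>            JavaRDD<String> kept = lines.filter(new Function<String, Boolean>() {
>                public Boolean call(String s) { return !s.isEmpty(); }
>            });
>            System.out.println(kept.count());
>        }
>    }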
>
>I'm fairly new with Spark so maybe I'm just missing or misunderstanding 
>something fundamental.  Any help would be appreciated.
>
>Thanks.
>
>Darin.


-- 

Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning


440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegm...@velos.io  W: www.velos.io
