Re: Determining number of executors within RDD

2015-06-10 Thread Nishkam Ravi
This PR adds support for multiple executors per worker and should be
available in 1.4: https://github.com/apache/spark/pull/731
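
For illustration, a minimal sketch of how that might be used, assuming (per
the PR) that in standalone mode setting spark.executor.cores lets a worker
with enough cores launch several executors for one application; the master
URL and resource values below are made up:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master:7077")    // hypothetical standalone master
  .setAppName("multi-executor-demo")
  .set("spark.executor.cores", "4")    // cores per executor JVM
  .set("spark.executor.memory", "4g")  // heap per executor JVM
  .set("spark.cores.max", "16")        // a 16-core worker could then host 4 executors
val sc = new SparkContext(conf)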

Thanks,
Nishkam



Re: Determining number of executors within RDD

2015-06-10 Thread Evo Eftimov
We were discussing STANDALONE mode; besides, maxdml had already summarized
what is available and possible under YARN.

So let me recap: for standalone mode, if you need more than one executor per
physical host, e.g. to partition its system resources more finely (especially
RAM per JVM instance), you need to go for what is essentially a bit of a hack,
i.e. running more than one worker per machine.
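
For reference, a minimal sketch of that hack in conf/spark-env.sh, assuming
the stock standalone launch scripts (the values are examples, not
recommendations):

# conf/spark-env.sh
SPARK_WORKER_INSTANCES=2   # run two worker processes on this machine
SPARK_WORKER_CORES=8       # cores each worker may hand to its executors
SPARK_WORKER_MEMORY=16g    # RAM each worker may hand to its executors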





Re: Determining number of executors within RDD

2015-06-10 Thread Sandy Ryza
On YARN, there is no concept of a Spark Worker.  Multiple executors will be
run per node without any effort required by the user, as long as all the
executors fit within each node's resource limits.
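
For concreteness, a hedged sketch of the application-side request on YARN
(illustrative values; YARN then packs executors onto nodes subject to each
node's limits):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("yarn-executor-demo")       // hypothetical app name
  .set("spark.executor.instances", "10")  // total executors requested
  .set("spark.executor.cores", "4")
  .set("spark.executor.memory", "6g")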

-Sandy



Re: Determining number of executors within RDD

2015-06-10 Thread Evo Eftimov
Yes, I think it is ONE worker, ONE executor, as an executor is nothing but a
JVM instance spawned by the worker.

To run more executors, i.e. JVM instances, on the same physical cluster node,
you need to run more than one worker on that node and then allocate only part
of the system resources to each worker/executor.





Re: Determining number of executors within RDD

2015-06-10 Thread maxdml
Actually this is somewhat confusing, for two reasons:

- First, the option 'spark.executor.instances', which seems to be handled only
in the YARN case in the source code of SparkSubmit.scala, is also present in
the conf/spark-env.sh file under the standalone section, which would suggest
that it is also available for this mode.

- Second, a post from Andrew Or states that this property defines the number
of workers in the cluster, not the number of executors on a given worker.
(http://apache-spark-user-list.1001560.n3.nabble.com/clarification-for-some-spark-on-yarn-configuration-options-td13692.html)

Could anyone clarify this? :-)

Thanks.






Re: Determining number of executors within RDD

2015-06-10 Thread maxdml
Note that this property is only available on YARN.






Re: Determining number of executors within RDD

2015-06-10 Thread Himanshu Mehra
Hi Akshat,

I assume what you want is to control the number of partitions in your RDD,
which is easily achievable by passing the numSlices or minPartitions argument
at the time of RDD creation. Example:
val someRDD = sc.parallelize(someCollection, numSlices), or
val someRDD = sc.textFile(pathToFile, minPartitions)

You can check the number of partitions your RDD has with
'someRDD.partitions.size'. And if you want to reduce or increase the number of
partitions, you can call the 'repartition(numPartitions)' method, which
reshuffles the data and partitions it into 'numPartitions' partitions.
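
A short runnable sketch of that workflow, assuming a live SparkContext
named 'sc':

val rdd = sc.parallelize(1 to 1000, 8)  // ask for 8 partitions up front
println(rdd.partitions.size)            // prints 8

// repartition() always shuffles; coalesce() can shrink the count cheaply.
val wider = rdd.repartition(16)
println(wider.partitions.size)          // prints 16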

And of course, if you want, you can set the number of executors as well via
the 'spark.executor.instances' property in the 'SparkConf' object.

Thank you.






Re: Determining number of executors within RDD

2015-06-09 Thread maxdml
You could try issuing a get on the SparkConf object.

I don't have the exact name for the matching key, but from reading the code
in SparkSubmit.scala, it should be something like:

conf.get("spark.executor.instances")






Determining number of executors within RDD

2014-10-01 Thread Akshat Aranya
Hi,

I want to implement an RDD wherein the decision on the number of partitions
is based on the number of executors that have been set up. Is there some way I
can determine the number of executors within the getPartitions() call?
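
One hedged sketch of how that could look, under the assumptions that
getExecutorMemoryStatus counts the driver (hence the -1) and that two
partitions per executor is just an illustrative factor:

import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// A trivially serializable Partition implementation.
case class SimplePartition(index: Int) extends Partition

class ExecutorSizedRDD(sc: SparkContext) extends RDD[Int](sc, Nil) {

  // Executor count snapshotted on the driver when the RDD is defined.
  private val numExecutors =
    math.max(1, sc.getExecutorMemoryStatus.size - 1)

  override def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numExecutors * 2)(i => SimplePartition(i))

  // Each task just reports its partition index.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    Iterator.single(split.index)
}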