Re: Determining number of executors within RDD
This PR, which adds support for multiple executors per worker, should be available in 1.4: https://github.com/apache/spark/pull/731
Thanks,
Nishkam
(In reply to Evo Eftimov, Wed, Jun 10, 2015 at 1:35 PM.)
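Below is a rough sketch of the per-worker arithmetic. It assumes Spark 1.4+ standalone mode with `spark.executor.cores` set, which the configuration docs describe as letting one application run several executors on the same worker if the worker has enough cores. This is a simplified model with hypothetical helper names. It ignores `spark.cores.max` and other applications sharing the workers, and it is not Spark's actual Master scheduling code.

```scala
// Simplified model (an assumption, not Spark's scheduler): with
// spark.executor.cores set, one worker can host as many executors of an
// application as fit into both its free cores and its free memory.
object StandaloneExecutorsPerWorker {
  def executorsPerWorker(workerCores: Int, workerMemoryMb: Int,
                         executorCores: Int, executorMemoryMb: Int): Int = {
    require(executorCores > 0 && executorMemoryMb > 0, "executor sizes must be positive")
    val byCores  = workerCores / executorCores       // limit imposed by cores
    val byMemory = workerMemoryMb / executorMemoryMb // limit imposed by RAM
    math.min(byCores, byMemory)
  }

  /** Sum over workers, each given as (cores, memoryMb). */
  def totalExecutors(workers: Seq[(Int, Int)], executorCores: Int, executorMemoryMb: Int): Int =
    workers.map { case (cores, mem) =>
      executorsPerWorker(cores, mem, executorCores, executorMemoryMb)
    }.sum
}
```

For example, a 16-core, 64 GB worker with 4-core, 12 GB executors is limited by cores to 4 executors, even though memory alone would allow 5.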
Re: Determining number of executors within RDD
We were discussing STANDALONE mode; besides, maxdml had already summarized what is available and possible under YARN. So let me recap: in standalone mode, if you need more than one executor per physical host, e.g. to partition its system resources more finely (especially RAM per JVM instance), you need to go for what is essentially a bit of a hack, i.e. running more than one worker per machine.
(In reply to Sandy Ryza, 2015/06/10 21:31 GMT.)
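Running several workers per machine is configured through `conf/spark-env.sh` with `SPARK_WORKER_INSTANCES`, `SPARK_WORKER_CORES` and `SPARK_WORKER_MEMORY`. The standalone docs advise setting `SPARK_WORKER_CORES` explicitly when using more than one instance, because otherwise each worker tries to use all the cores. The sketch below is a hypothetical helper that splits one host evenly across N workers and renders those settings. The reserved-memory figure is an assumption you would tune for the OS and other daemons.

```scala
// Hypothetical helper: split one host's cores and RAM evenly across N
// worker instances and render the matching conf/spark-env.sh lines.
object WorkerInstancesPlan {
  final case class Plan(instances: Int, coresPerWorker: Int, memoryPerWorkerMb: Int) {
    def toSparkEnv: String = Seq(
      s"SPARK_WORKER_INSTANCES=$instances",
      s"SPARK_WORKER_CORES=$coresPerWorker",          // cores per worker instance
      s"SPARK_WORKER_MEMORY=${memoryPerWorkerMb}m"    // memory per worker instance
    ).mkString("\n")
  }

  def plan(hostCores: Int, hostMemoryMb: Int, reservedMemoryMb: Int, instances: Int): Plan = {
    require(instances > 0, "need at least one worker instance")
    val usableMemoryMb = hostMemoryMb - reservedMemoryMb // leave room for OS and daemons
    require(usableMemoryMb > 0, "reserved memory exceeds host memory")
    Plan(instances, hostCores / instances, usableMemoryMb / instances)
  }
}
```

Each worker then launches its own executor JVM, so two instances on a 16-core, 64 GB host with 4 GB reserved give two executors of up to 8 cores and 30 GB each.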
Re: Determining number of executors within RDD
On YARN, there is no concept of a Spark Worker. Multiple executors will be run per node without any effort required by the user, as long as all the executors fit within each node's resource limits.
-Sandy
(In reply to Evo Eftimov, Wed, Jun 10, 2015 at 3:24 PM.)
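To make "fit within each node's resource limits" concrete, here is a simplified model of how many executor containers a NodeManager can hold. It assumes each container requests the executor memory plus an overhead of max(384 MB, 10% of executor memory), which is roughly the documented default of `spark.yarn.executor.memoryOverhead` for Spark releases of this period. Check your version. It also assumes one vcore per executor core. The model ignores rounding up to `yarn.scheduler.minimum-allocation-mb` and the ApplicationMaster's container. Depending on the scheduler's resource calculator, vcores may not be enforced at all.

```scala
// Simplified model (assumptions noted above): how many executor containers
// fit on one YARN node, limited by memory and by vcores.
object YarnExecutorsPerNode {
  // Assumed default overhead: max(384 MB, 10% of executor memory).
  def overheadMb(executorMemoryMb: Int): Int =
    math.max(384, (executorMemoryMb * 0.10).toInt)

  def executorsPerNode(nodeMemoryMb: Int, nodeVcores: Int,
                       executorMemoryMb: Int, executorCores: Int): Int = {
    val containerMb = executorMemoryMb + overheadMb(executorMemoryMb)
    math.min(nodeMemoryMb / containerMb, nodeVcores / executorCores)
  }
}
```

With 56 GB and 16 vcores per node and 16 GB, 4-core executors, memory is the binding limit: 3 containers fit, although the vcores would allow 4.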
Re: Determining number of executors within RDD
Yes, I think it is ONE worker, ONE executor, as an executor is nothing but a JVM instance spawned by the worker. To run more executors (i.e. JVM instances) on the same physical cluster node, you need to run more than one worker on that node and then allocate only part of the system resources to each worker/executor.
(In reply to maxdml, 2015/06/10 19:56 GMT.)
Re: Determining number of executors within RDD
Actually, this is somewhat confusing for two reasons:
- First, the option 'spark.executor.instances', which seems to be handled only for YARN in the source code of SparkSubmit.scala, is also present in the conf/spark-env.sh file under the standalone section, which would indicate that it is also available in this mode.
- Second, a post from Andrew Or states that this property defines the number of workers in the cluster, not the number of executors on a given worker. (http://apache-spark-user-list.1001560.n3.nabble.com/clarification-for-some-spark-on-yarn-configuration-options-td13692.html)
Could anyone clarify this? :-)
Thanks.
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Determining-number-of-executors-within-RDD-tp15554p23262.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: Determining number of executors within RDD
Note that this property ('spark.executor.instances') is only available for YARN.
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Determining-number-of-executors-within-RDD-tp15554p23256.html
Re: Determining number of executors within RDD
Hi Akshat,
I assume what you want is to control the number of partitions in your RDD. This is easily achieved by passing numSlices to parallelize, or a minimum number of partitions (minSplits) to textFile, when you create the RDD. For example: val someRDD = sc.parallelize(someCollection, numSlices) or val someRDD = sc.textFile(pathToFile, minSplits). You can check how many partitions your RDD has with 'someRDD.partitions.size'. If you want to reduce or increase the number of partitions, you can call the 'repartition(numPartitions)' method, which returns a new RDD with the data reshuffled into 'numPartitions' partitions. And of course, if you want, you can also set the number of executors via the 'spark.executor.instances' property on the 'SparkConf' object.
Thank you.
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Determining-number-of-executors-within-RDD-tp15554p23241.html
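`sc.parallelize(collection, numSlices)` splits a driver-side collection into numSlices partitions. The plain-Scala sketch below runs without Spark. It shows one way to make such a split into contiguous chunks whose sizes differ by at most one. It illustrates the idea and is similar in spirit to how a local collection gets sliced, but it is not Spark's own code, and `SliceSketch` is a hypothetical name.

```scala
// Spark-free sketch: split a sequence into numSlices contiguous chunks.
// Chunk i covers indices [i * n / numSlices, (i + 1) * n / numSlices),
// so chunk sizes differ by at most one.
object SliceSketch {
  def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] = {
    require(numSlices >= 1, "numSlices must be positive")
    val n = data.length.toLong // Long avoids overflow in i * n
    (0 until numSlices).map { i =>
      val start = ((i * n) / numSlices).toInt
      val end   = (((i + 1) * n) / numSlices).toInt
      data.slice(start, end)
    }
  }
}
```

For example, slicing 1 to 10 into 3 chunks yields chunks of sizes 3, 3 and 4 that together contain every element in order.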
Re: Determining number of executors within RDD
You could try issuing a get on the SparkConf object. I'm not sure of the exact name of the matching key, but from reading the code in SparkSubmit.scala, it should be something like: conf.get("spark.executor.instances")
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Determining-number-of-executors-within-RDD-tp15554p23234.html
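On a real SparkConf, `get(key)` throws `NoSuchElementException` when the key is unset, while `getOption(key)` returns an `Option`. The property may well be unset outside YARN. Even when it is set, it reflects the requested executor count, not the executors actually running. The sketch below uses a plain `Map` as a stand-in for SparkConf, an assumption made so the example runs on its own. It contrasts a strict lookup with a lenient one.

```scala
// Spark-free sketch: a plain Map stands in for SparkConf (assumption).
object ExecutorInstancesLookup {
  val Key = "spark.executor.instances"

  // Mirrors conf.get(key): throws NoSuchElementException if the key is unset.
  def strict(settings: Map[String, String]): Int = settings(Key).trim.toInt

  // Mirrors conf.getOption(key): None if the key is unset.
  def lenient(settings: Map[String, String]): Option[Int] =
    settings.get(Key).map(_.trim.toInt)
}
```

With a real SparkConf the lenient form would be `conf.getOption("spark.executor.instances").map(_.toInt)`.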
Determining number of executors within RDD
Hi, I want to implement an RDD in which the number of partitions is decided based on the number of executors that have been set up. Is there some way I can determine the number of executors within the getPartitions() call?
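One approach sometimes used for this is to count the executors registered with the driver, e.g. via `sc.getExecutorMemoryStatus.size`. That count includes the driver's own block manager, hence the subtraction below. Executors register asynchronously, so a count taken soon after the SparkContext starts can be lower than configured. The helper below is a hypothetical, Spark-free sketch of turning such a count into a partition count. Its name and the `tasksPerExecutor` factor are assumptions, not a Spark API.

```scala
// Hypothetical helper for a custom RDD's getPartitions(): derive a
// partition count from the number of block managers the driver reports.
object PartitionsFromExecutors {
  /**
   * @param blockManagerCount entries reported by the driver, assumed to
   *                          include the driver's own block manager
   * @param tasksPerExecutor  partitions wanted per executor (a tuning choice)
   */
  def numPartitions(blockManagerCount: Int, tasksPerExecutor: Int): Int = {
    require(tasksPerExecutor > 0, "tasksPerExecutor must be positive")
    // Subtract the driver; in local mode only the driver is present,
    // so fall back to treating it as a single executor.
    val executors = math.max(1, blockManagerCount - 1)
    executors * tasksPerExecutor
  }
}
```

For example, five block managers (four executors plus the driver) with two partitions per executor give eight partitions.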