[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...
Github user MaxGekk closed the pull request at: https://github.com/apache/spark/pull/21589 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r203235747

--- Diff: R/pkg/R/context.R ---
@@ -437,3 +437,33 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment
+#' and potentially available to jobs submitted via the Spark context.
+#'
--- End diff --

and `This method is experimental, and its behavior can be changed in the next releases.` too.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r203235671

--- Diff: R/pkg/R/context.R ---
@@ -437,3 +437,33 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment
+#' and potentially available to jobs submitted via the Spark context.
+#'
--- End diff --

@MaxGekk, is `The number reflects current status of the cluster and can change in the future` intentionally taken out here, or a mistake?
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r202545679

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala ---
@@ -67,6 +67,10 @@ private[spark] trait TaskScheduler {
   // Get the default level of parallelism to use in the cluster, as a hint for sizing jobs.
   def defaultParallelism(): Int
 
+  def numCores(): Int
--- End diff --

Please also add comments for these functions.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r202503533

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends Logging {
    */
   def defaultMinPartitions: Int = math.min(defaultParallelism, 2)
 
+  /**
+   * Total number of CPU cores of all executors registered in the cluster at the moment.
+   * The number reflects current status of the cluster and can change in the future.
+   */
--- End diff --

that means https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/common/tags/src/main/java/org/apache/spark/annotation/Experimental.java
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r202503503

--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
--- End diff --

It's okay for this PR itself for now. I can test.
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r202462678

--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
--- End diff --

> I think we should really test this.

@HyukjinKwon Unfortunately I don't have an opportunity to test the changes on a real YARN cluster (we don't use it in our company). I can try to create a small one on my laptop, but I am not sure that counts as real testing.
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r202460774

--- Diff: python/pyspark/context.py ---
@@ -406,6 +406,22 @@ def defaultMinPartitions(self):
         """
         return self._jsc.sc().defaultMinPartitions()
 
+    @property
+    def numCores(self):
+        """
+        Total number of CPU cores of all executors registered in the cluster at the moment.
+        The number reflects current status of the cluster and can change in the future.
+        """
+        return self._jsc.sc().numCores()
+
+    @property
+    def numExecutors(self):
+        """
+        Total number of executors registered in the cluster at the moment.
+        The number reflects current status of the cluster and can change in the future.
--- End diff --

I added it.
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r202459283

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends Logging {
    */
   def defaultMinPartitions: Int = math.min(defaultParallelism, 2)
 
+  /**
+   * Total number of CPU cores of all executors registered in the cluster at the moment.
+   * The number reflects current status of the cluster and can change in the future.
+   */
--- End diff --

> Let's at least leave a @note that this feature is experimental.

What does `experimental` mean for a user? Unstable? That it can be changed in the future? When I, as a user, read the note, how should I change my app to take it into account?
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r202454060

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends Logging {
    */
   def defaultMinPartitions: Int = math.min(defaultParallelism, 2)
 
+  /**
+   * Total number of CPU cores of all executors registered in the cluster at the moment.
+   * The number reflects current status of the cluster and can change in the future.
+   */
--- End diff --

> why @since 2.4.0 looks conventionally missed here.

Just because the other methods in the file don't have it. It seems this file doesn't follow the common coding style. For example, methods don't end with `()` — see `defaultMinPartitions` instead of `defaultMinPartitions()`. I will add `@since` and `@note`.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r201973111

--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numCores
+#' @return current number of cores in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numCores()
+#'}
+#' @note spark.numCores since 2.4.0
+spark.numCores <- function() {
--- End diff --

I think you should add those into `R/pkg/NAMESPACE`.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r201968413

--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
--- End diff --

I tested this in a YARN cluster now. At a higher level to the user (regardless of the details above), I think it's right to say these are the cores assigned to the application in general. Let's clarify this API is experimental.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r201920416

--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
--- End diff --

Hm, yea. I think that's a valid point. It will be 100 in this case.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r201914125

--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numCores
+#' @return current number of cores in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numCores()
+#'}
+#' @note spark.numCores since 2.4.0
+spark.numCores <- function() {
--- End diff --

still need to add these in https://github.com/MaxGekk/spark-1/blob/a39695e059c1a2976be50159e33144ee453d3c2f/R/pkg/NAMESPACE#L433
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r201913817

--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
--- End diff --

btw, by `in this cluster` do we really mean cores allocated to the "application" or "job"? It's not really "in the cluster", right? If I'm running this app on Hadoop/YARN with 1000s of cores, but only 100 are set aside for this app, which number am I getting from this API?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r201913578

--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numCores
+#' @return current number of cores in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numCores()
+#'}
+#' @note spark.numCores since 2.4.0
+spark.numCores <- function() {
+  sc <- getSparkContext()
+  invisible(callJMethod(sc, "numCores"))
--- End diff --

`invisible` in the other cases is used because they return void in Java; in these cases we are returning numbers and should not use `invisible`.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r201887843

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends Logging {
    */
   def defaultMinPartitions: Int = math.min(defaultParallelism, 2)
 
+  /**
+   * Total number of CPU cores of all executors registered in the cluster at the moment.
+   * The number reflects current status of the cluster and can change in the future.
+   */
--- End diff --

Let's at least leave a `@note` that this feature is experimental.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r201887513

--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala ---
@@ -128,6 +128,18 @@ class JavaSparkContext(val sc: SparkContext)
   /** Default min number of partitions for Hadoop RDDs when not given by user */
   def defaultMinPartitions: java.lang.Integer = sc.defaultMinPartitions
 
+  /**
+   * Total number of CPU cores of all executors registered in the cluster at the moment.
+   * The number reflects current status of the cluster and can change in the future.
+   */
+  def numCores: java.lang.Integer = sc.numCores
--- End diff --

ditto for `@since 2.4.0`
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r201887443

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends Logging {
    */
   def defaultMinPartitions: Int = math.min(defaultParallelism, 2)
 
+  /**
+   * Total number of CPU cores of all executors registered in the cluster at the moment.
+   * The number reflects current status of the cluster and can change in the future.
+   */
--- End diff --

Likewise, I am not sure why `@since 2.4.0` seems to be conventionally missing here. Let's add it here too.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r201887249

--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
--- End diff --

`The number reflects current status of the cluster and can change in the future.` — this info looks missing here.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r201887142

--- Diff: python/pyspark/context.py ---
@@ -406,6 +406,22 @@ def defaultMinPartitions(self):
         """
         return self._jsc.sc().defaultMinPartitions()
 
+    @property
+    def numCores(self):
+        """
+        Total number of CPU cores of all executors registered in the cluster at the moment.
+        The number reflects current status of the cluster and can change in the future.
--- End diff --

We call it `.. note:: Experimental` though, I believe.
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r201887215

--- Diff: python/pyspark/context.py ---
@@ -406,6 +406,22 @@ def defaultMinPartitions(self):
         """
         return self._jsc.sc().defaultMinPartitions()
 
+    @property
+    def numCores(self):
+        """
+        Total number of CPU cores of all executors registered in the cluster at the moment.
+        The number reflects current status of the cluster and can change in the future.
+        """
+        return self._jsc.sc().numCores()
+
+    @property
+    def numExecutors(self):
+        """
+        Total number of executors registered in the cluster at the moment.
+        The number reflects current status of the cluster and can change in the future.
--- End diff --

Shall we just manually add `.. versionadded:: 2.4.0`?
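To make the docstring conventions discussed above concrete, here is a small sketch of how the annotated `numCores` property might look with both Sphinx directives in place. The classes are hypothetical stand-ins (not the real PySpark code): `_MockJVMContext` replaces the Py4J gateway so the snippet runs without Spark or a JVM.

```python
class _MockJVMContext:
    """Hypothetical stand-in for the JVM-side SparkContext reached via Py4J."""

    def numCores(self):
        # A fixed value for illustration; a real cluster reports live executor cores.
        return 8


class SparkContextSketch:
    """Sketch of the proposed SparkContext addition, not the real class."""

    def __init__(self):
        self._jsc_sc = _MockJVMContext()

    @property
    def numCores(self):
        """
        Total number of CPU cores of all executors registered in the
        cluster at the moment. The number reflects the current status of
        the cluster and can change in the future.

        .. note:: Experimental

        .. versionadded:: 2.4.0
        """
        # Delegate to the JVM-side context, as the PySpark wrappers do.
        return self._jsc_sc.numCores()


sc = SparkContextSketch()
print(sc.numCores)  # mock value: 8
# The directives live in the docstring Sphinx renders for the API docs:
print(".. versionadded:: 2.4.0" in SparkContextSketch.numCores.__doc__)  # True
```

The `property` object carries the getter's docstring, which is why accessing `SparkContextSketch.numCores.__doc__` on the class (rather than an instance) is how doc tooling picks the directives up.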
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r198800988

--- Diff: R/pkg/R/context.R ---
@@ -451,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numCores
+#' @return current number of cores in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numCores()
+#'}
+#' @note numCores since 2.4.0
+spark.numCores <- function() {
--- End diff --

If you don't mind, I'll leave the functions in the `spark` namespace.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r198723017

--- Diff: R/pkg/R/context.R ---
@@ -451,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numCores
+#' @return current number of cores in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numCores()
+#'}
+#' @note numCores since 2.4.0
+spark.numCores <- function() {
+  sc <- getSparkContext()
+  invisible(callJMethod(sc, "numCores"))
+}
+
+#' Total number of executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numExecutors
+#' @return current number of executors in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numExecutors()
+#'}
+#' @note numExecutors since 2.4.0
--- End diff --

ditto
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r198722933

--- Diff: R/pkg/R/context.R ---
@@ -451,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numCores
+#' @return current number of cores in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numCores()
+#'}
+#' @note numCores since 2.4.0
+spark.numCores <- function() {
--- End diff --

You don't have to call it spark.*; I'm not sure which name is better, spark.numCores or numCores.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r198722972

--- Diff: R/pkg/R/context.R ---
@@ -451,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numCores
+#' @return current number of cores in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numCores()
+#'}
+#' @note numCores since 2.4.0
--- End diff --

this needs to match the function name
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r198708461

--- Diff: python/pyspark/context.py ---
@@ -406,6 +406,22 @@ def defaultMinPartitions(self):
         """
         return self._jsc.sc().defaultMinPartitions()
 
+    @property
+    def numCores(self):
+        """
+        Total number of CPU cores of all executors registered in the cluster at the moment.
+        The number reflects current status of the cluster and can change in the future.
+        """
--- End diff --

Let's add version information here too; it should note the version in which it was added.
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r198590931

--- Diff: R/pkg/R/context.R ---
@@ -25,6 +25,22 @@ getMinPartitions <- function(sc, minPartitions) {
   as.integer(minPartitions)
 }
 
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @param sc SparkContext to use
+#' @return current number of cores in the cluster.
+numCores <- function(sc) {
+  callJMethod(sc, "numCores")
+}
+
+#' Total number of executors registered in the cluster at the moment.
+#'
+#' @param sc SparkContext to use
+#' @return current number of executors in the cluster.
+numExecutors <- function(sc) {
+  callJMethod(sc, "numExecutors")
+}
+
--- End diff --

Thank you for pointing out the example of `spark.addFile`. I changed `spark.numCores` and `spark.numExecutors` in the same way.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r198219194

--- Diff: R/pkg/R/context.R ---
@@ -25,6 +25,22 @@ getMinPartitions <- function(sc, minPartitions) {
   as.integer(minPartitions)
 }
 
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @param sc SparkContext to use
+#' @return current number of cores in the cluster.
+numCores <- function(sc) {
+  callJMethod(sc, "numCores")
+}
+
+#' Total number of executors registered in the cluster at the moment.
+#'
+#' @param sc SparkContext to use
+#' @return current number of executors in the cluster.
+numExecutors <- function(sc) {
+  callJMethod(sc, "numExecutors")
+}
+
--- End diff --

Actually, all SparkContext methods (i.e. ones taking an `sc` parameter) are internal/non-public/deprecated. See `spark.addFile`.
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r198174777

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosFineGrainedSchedulerBackend.scala ---
@@ -446,6 +446,8 @@ private[spark] class MesosFineGrainedSchedulerBackend(
   // TODO: query Mesos for number of cores
   override def defaultParallelism(): Int = sc.conf.getInt("spark.default.parallelism", 8)
 
+  override def numCores(): Int = defaultParallelism
+
+  override def numExecutors(): Int = 1
--- End diff --

The mode (and the file) is going to be removed in Spark 3.0: https://github.com/apache/spark/pull/18784
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r197725870

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosFineGrainedSchedulerBackend.scala ---
@@ -446,6 +446,8 @@ private[spark] class MesosFineGrainedSchedulerBackend(
   // TODO: query Mesos for number of cores
   override def defaultParallelism(): Int = sc.conf.getInt("spark.default.parallelism", 8)
 
+  override def numCores(): Int = defaultParallelism
+
+  override def numExecutors(): Int = 1
--- End diff --

I added comments to the ticket above.
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r197120918

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosFineGrainedSchedulerBackend.scala ---
@@ -446,6 +446,8 @@ private[spark] class MesosFineGrainedSchedulerBackend(
   // TODO: query Mesos for number of cores
   override def defaultParallelism(): Int = sc.conf.getInt("spark.default.parallelism", 8)
 
+  override def numCores(): Int = defaultParallelism
+
+  override def numExecutors(): Int = 1
--- End diff --

I found this ticket for the TODO: https://issues.apache.org/jira/browse/SPARK-9775 . Should I add a comment to it or create a separate ticket?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r197107564

--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosFineGrainedSchedulerBackend.scala ---
@@ -446,6 +446,8 @@ private[spark] class MesosFineGrainedSchedulerBackend(
   // TODO: query Mesos for number of cores
   override def defaultParallelism(): Int = sc.conf.getInt("spark.default.parallelism", 8)
 
+  override def numCores(): Int = defaultParallelism
+
+  override def numExecutors(): Int = 1
--- End diff --

Eh, is it simply because it's not implemented yet in Mesos? I am not used to it, but I thought it had better be mentioned somewhere.
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21589

[SPARK-24591][CORE] Number of cores and executors in the cluster

## What changes were proposed in this pull request?

In the PR, I propose to extend `SparkContext` by:
1. `def numCores: Int` — returns the total number of CPU cores of all executors registered in the cluster at the moment. The main use case is using it in _repartition()_ and _coalesce()_.
2. `def numExecutors: Int` — returns the total number of executors registered in the cluster at the moment. Some jobs, e.g. local-node ML training, use a lot of parallelism. It's a common practice to aim to distribute such jobs so that there is one partition per executor.

## How was this patch tested?

- R API was tested manually from `sparkR`
- Added tests for PySpark and `JavaSparkContext` that test the number of cores and executors in `local` mode.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 num-cores-and-executors

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21589.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21589

commit 9d44d7d4d86e8549cc4e524a7ea3d818b41084f2 — Author: Maxim Gekk, Date: 2018-06-16T18:14:42Z — Methods returns total number of cores and executors in the cluster
commit c6b354c466677c1101b30fc1b25ddc5750c8eaf6 — Author: Maxim Gekk, Date: 2018-06-16T18:19:09Z — Update Java's Spark Context
commit 54f04369c0f3329e8c27ad405a350ee20b788b21 — Author: Maxim Gekk, Date: 2018-06-16T18:57:08Z — Tests for number of cores and executors in the local mode
commit 4d645829c8d338451be81c4554cc1257b459f6a6 — Author: Maxim Gekk, Date: 2018-06-16T20:09:35Z — Adding coresCount and executorsCount to PySpark
commit 79633d9a3e7aebf40ee8940e8fcf00d43dc22ed7 — Author: Maxim Gekk, Date: 2018-06-18T13:08:54Z — Improving comments
commit d7e94e10794964022d3dc98671b86f02af80d2e8 — Author: Maxim Gekk, Date: 2018-06-18T13:31:48Z — Renaming of the methods
commit 9be566f1ed3e066de7e3d3ad557756d22fc22a73 — Author: Maxim Gekk, Date: 2018-06-18T18:41:06Z — New methods for SparkR
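To make the intended use case concrete, here is a small pure-Python sketch (no Spark required; `ClusterState` and all of its numbers are hypothetical stand-ins, not PR code) of how a job might pick partition counts from the two proposed numbers: one partition per core (times a small factor) for shuffle-heavy work, and one partition per executor for jobs like local-node ML training.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of what SparkContext.numCores/numExecutors report.

    Both values reflect the executors registered *at the moment* and can
    change as executors come and go (e.g. with dynamic allocation).
    """
    num_cores: int       # total CPU cores across all registered executors
    num_executors: int   # total registered executors


def partitions_for_shuffle(cluster: ClusterState, factor: int = 2) -> int:
    """One partition per core, times a small factor, keeps all cores busy."""
    return max(1, cluster.num_cores * factor)


def partitions_for_per_executor_training(cluster: ClusterState) -> int:
    """One partition per executor: the pattern the PR description mentions
    for local-node ML training."""
    return max(1, cluster.num_executors)


# Example: 10 executors with 4 cores each registered at the moment.
cluster = ClusterState(num_cores=40, num_executors=10)
print(partitions_for_shuffle(cluster))                # 80
print(partitions_for_per_executor_training(cluster))  # 10
```

In real code the two inputs would come from a live `SparkContext` rather than a snapshot, which is exactly why the reviewers pressed for documentation that the numbers can change between the query and the `repartition()` call.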