[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-24 Thread MaxGekk
Github user MaxGekk closed the pull request at:

https://github.com/apache/spark/pull/21589


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r203235747
  
--- Diff: R/pkg/R/context.R ---
@@ -437,3 +437,33 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment
+#' and potentially available to jobs submitted via the Spark context.
+#'
--- End diff --

and `This method is experimental, and its behavior can be changed in the next releases.` too.
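
For illustration only (assembled from the sentences suggested in this review thread, not the PR's final text), the roxygen header would then read roughly:

```r
#' Total number of CPU cores of all executors registered in the cluster at the moment
#' and potentially available to jobs submitted via the Spark context.
#' The number reflects current status of the cluster and can change in the future.
#' This method is experimental, and its behavior can be changed in the next releases.
```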


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r203235671
  
--- Diff: R/pkg/R/context.R ---
@@ -437,3 +437,33 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment
+#' and potentially available to jobs submitted via the Spark context.
+#'
--- End diff --

@MaxGekk, is `The number reflects current status of the cluster and can change in the future` intentionally taken out here or a mistake?


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-15 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r202545679
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala ---
@@ -67,6 +67,10 @@ private[spark] trait TaskScheduler {
   // Get the default level of parallelism to use in the cluster, as a hint for sizing jobs.
   def defaultParallelism(): Int
 
+  def numCores(): Int
--- End diff --

Please also add comments for these functions.


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r202503533
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends Logging {
*/
   def defaultMinPartitions: Int = math.min(defaultParallelism, 2)
 
+  /**
+   * Total number of CPU cores of all executors registered in the cluster at the moment.
+   * The number reflects current status of the cluster and can change in the future.
+   */
--- End diff --

that means https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/common/tags/src/main/java/org/apache/spark/annotation/Experimental.java


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r202503503
  
--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
--- End diff --

It's okay for this PR itself for now. I can test.


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r202462678
  
--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
--- End diff --

> I think we should really test this.

@HyukjinKwon Unfortunately, I don't have an opportunity to test the changes on a real YARN cluster (we don't use it in our company). I can try to create a small one on my laptop, but I am not sure that is real testing.


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r202460774
  
--- Diff: python/pyspark/context.py ---
@@ -406,6 +406,22 @@ def defaultMinPartitions(self):
 """
 return self._jsc.sc().defaultMinPartitions()
 
+@property
+def numCores(self):
+"""
+Total number of CPU cores of all executors registered in the cluster at the moment.
+The number reflects current status of the cluster and can change in the future.
+"""
+return self._jsc.sc().numCores()
+
+@property
+def numExecutors(self):
+"""
+Total number of executors registered in the cluster at the moment.
+The number reflects current status of the cluster and can change in the future.
--- End diff --

I added it


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r202459283
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends Logging {
*/
   def defaultMinPartitions: Int = math.min(defaultParallelism, 2)
 
+  /**
+   * Total number of CPU cores of all executors registered in the cluster at the moment.
+   * The number reflects current status of the cluster and can change in the future.
+   */
--- End diff --

> Let's at least leave a @note that this feature is experimental.

What does `experimental` mean for a user? Unstable? That it can be changed in the future? When I, as a user, read the note, how should I change my app to take it into account?


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r202454060
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends Logging {
*/
   def defaultMinPartitions: Int = math.min(defaultParallelism, 2)
 
+  /**
+   * Total number of CPU cores of all executors registered in the cluster at the moment.
+   * The number reflects current status of the cluster and can change in the future.
+   */
--- End diff --

> why @since 2.4.0 looks conventionally missed here.

Just because other methods in the file don't have it. It seems this file doesn't follow the common coding style. For example, methods don't end with `()`; see `defaultMinPartitions` instead of `defaultMinPartitions()`.

I will add `@since` and `@note`


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r201973111
  
--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numCores
+#' @return current number of cores in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numCores()
+#'}
+#' @note spark.numCores since 2.4.0
+spark.numCores <- function() {
--- End diff --

I think you should add those into `R/pkg/NAMESPACE`.
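
For illustration, a minimal sketch of what the addition to `R/pkg/NAMESPACE` could look like (the exact placement among the existing `export()` entries is an assumption, not taken from the PR):

```r
# Hypothetical export entries for the new SparkR functions proposed in this PR
export("spark.numCores",
       "spark.numExecutors")
```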


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r201968413
  
--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
--- End diff --

I tested this on a YARN cluster now. At a higher level, to the user (regardless of the details above), I think it's right to say cores assigned to the application in general. Let's clarify that this API is experimental.


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r201920416
  
--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
--- End diff --

Hm, yea. I think that's a valid point. It will be 100 in this case.


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r201914125
  
--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numCores
+#' @return current number of cores in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numCores()
+#'}
+#' @note spark.numCores since 2.4.0
+spark.numCores <- function() {
--- End diff --

still need to add these in https://github.com/MaxGekk/spark-1/blob/a39695e059c1a2976be50159e33144ee453d3c2f/R/pkg/NAMESPACE#L433


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r201913817
  
--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
--- End diff --

btw, by `in this cluster` do we really mean cores allocated to the "application" or "job"? It's not really in the cluster, right? If I'm running this app on Hadoop/YARN with 1000s of cores, but only set aside 100 for this app, which number am I getting from this API?


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r201913578
  
--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numCores
+#' @return current number of cores in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numCores()
+#'}
+#' @note spark.numCores since 2.4.0
+spark.numCores <- function() {
+  sc <- getSparkContext()
+  invisible(callJMethod(sc, "numCores"))
--- End diff --

`invisible` in the other cases is used because they return void in Java; in these cases we are returning numbers and should not use `invisible`.
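
A small base-R illustration of the point (not part of the PR): `invisible()` only suppresses auto-printing of the return value, which is fine for wrappers around void Java methods but hides the result of value-returning calls like `numCores`:

```r
f_void  <- function() invisible(NULL)   # mirrors a wrapper around a void Java method
f_value <- function() invisible(42)     # value is returned but not auto-printed
g_value <- function() 42                # value is returned and auto-printed

f_value()         # prints nothing at the console
x <- f_value()
x                 # [1] 42 -- the value was there, just not shown
g_value()         # [1] 42
```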


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r201887843
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends Logging {
*/
   def defaultMinPartitions: Int = math.min(defaultParallelism, 2)
 
+  /**
+   * Total number of CPU cores of all executors registered in the cluster at the moment.
+   * The number reflects current status of the cluster and can change in the future.
+   */
--- End diff --

Let's at least leave a `@note` that this feature is experimental.


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r201887513
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala ---
@@ -128,6 +128,18 @@ class JavaSparkContext(val sc: SparkContext)
   /** Default min number of partitions for Hadoop RDDs when not given by user */
   def defaultMinPartitions: java.lang.Integer = sc.defaultMinPartitions
 
+  /**
+   * Total number of CPU cores of all executors registered in the cluster at the moment.
+   * The number reflects current status of the cluster and can change in the future.
+   */
+  def numCores: java.lang.Integer = sc.numCores
--- End diff --

ditto for `@since 2.4.0`


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r201887443
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends Logging {
*/
   def defaultMinPartitions: Int = math.min(defaultParallelism, 2)
 
+  /**
+   * Total number of CPU cores of all executors registered in the cluster at the moment.
+   * The number reflects current status of the cluster and can change in the future.
+   */
--- End diff --

Likewise, I am less sure why ` @since 2.4.0` looks conventionally missed here. Let's add it here too.


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r201887249
  
--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
--- End diff --

`The number reflects current status of the cluster and can change in the future.` this info looks missing here


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r201887142
  
--- Diff: python/pyspark/context.py ---
@@ -406,6 +406,22 @@ def defaultMinPartitions(self):
 """
 return self._jsc.sc().defaultMinPartitions()
 
+@property
+def numCores(self):
+"""
+Total number of CPU cores of all executors registered in the cluster at the moment.
+The number reflects current status of the cluster and can change in the future.
--- End diff --

We call it `.. note:: Experimental` though, I believe.


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r201887215
  
--- Diff: python/pyspark/context.py ---
@@ -406,6 +406,22 @@ def defaultMinPartitions(self):
 """
 return self._jsc.sc().defaultMinPartitions()
 
+@property
+def numCores(self):
+"""
+Total number of CPU cores of all executors registered in the cluster at the moment.
+The number reflects current status of the cluster and can change in the future.
+"""
+return self._jsc.sc().numCores()
+
+@property
+def numExecutors(self):
+"""
+Total number of executors registered in the cluster at the moment.
+The number reflects current status of the cluster and can change in the future.
--- End diff --

Shall we just manually add `.. versionadded:: 2.4.0`?


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-06-28 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r198800988
  
--- Diff: R/pkg/R/context.R ---
@@ -451,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numCores
+#' @return current number of cores in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numCores()
+#'}
+#' @note numCores since 2.4.0
+spark.numCores <- function() {
--- End diff --

If you don't mind, I'll leave the functions in the `spark` namespace.


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-06-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r198723017
  
--- Diff: R/pkg/R/context.R ---
@@ -451,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory))))
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numCores
+#' @return current number of cores in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numCores()
+#'}
+#' @note numCores since 2.4.0
+spark.numCores <- function() {
+  sc <- getSparkContext()
+  invisible(callJMethod(sc, "numCores"))
+}
+
+#' Total number of executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numExecutors
+#' @return current number of executors in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numExecutors()
+#'}
+#' @note numExecutors since 2.4.0
--- End diff --

ditto


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-06-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r198722933
  
--- Diff: R/pkg/R/context.R ---
@@ -451,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", 
suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numCores
+#' @return current number of cores in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numCores()
+#'}
+#' @note numCores since 2.4.0
+spark.numCores <- function() {
--- End diff --

you don't have to call it spark.*. I'm not sure which name is better, spark.numCores or numCores.


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-06-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r198722972
  
--- Diff: R/pkg/R/context.R ---
@@ -451,3 +435,31 @@ setCheckpointDir <- function(directory) {
   sc <- getSparkContext()
   invisible(callJMethod(sc, "setCheckpointDir", 
suppressWarnings(normalizePath(directory
 }
+
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @rdname spark.numCores
+#' @return current number of cores in the cluster.
+#' @examples
+#'\dontrun{
+#' spark.numCores()
+#'}
+#' @note numCores since 2.4.0
--- End diff --

this needs to match the function name


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r198708461
  
--- Diff: python/pyspark/context.py ---
@@ -406,6 +406,22 @@ def defaultMinPartitions(self):
 """
 return self._jsc.sc().defaultMinPartitions()
 
+@property
+def numCores(self):
+"""
+Total number of CPU cores of all executors registered in the cluster at the moment.
+The number reflects current status of the cluster and can change in the future.
+"""
--- End diff --

Let's add version information here too. It should have added versions.


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-06-27 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r198590931
  
--- Diff: R/pkg/R/context.R ---
@@ -25,6 +25,22 @@ getMinPartitions <- function(sc, minPartitions) {
   as.integer(minPartitions)
 }
 
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @param sc SparkContext to use
+#' @return current number of cores in the cluster.
+numCores <- function(sc) {
+  callJMethod(sc, "numCores")
+}
+
+#' Total number of executors registered in the cluster at the moment.
+#'
+#' @param sc SparkContext to use
+#' @return current number of executors in the cluster.
+numExecutors <- function(sc) {
+  callJMethod(sc, "numExecutors")
+}
+
--- End diff --

Thank you for pointing me to the example of `spark.addFile`. I changed `spark.numCores` and `spark.numExecutors` in the same way.


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-06-26 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r198219194
  
--- Diff: R/pkg/R/context.R ---
@@ -25,6 +25,22 @@ getMinPartitions <- function(sc, minPartitions) {
   as.integer(minPartitions)
 }
 
+#' Total number of CPU cores of all executors registered in the cluster at the moment.
+#'
+#' @param sc SparkContext to use
+#' @return current number of cores in the cluster.
+numCores <- function(sc) {
+  callJMethod(sc, "numCores")
+}
+
+#' Total number of executors registered in the cluster at the moment.
+#'
+#' @param sc SparkContext to use
+#' @return current number of executors in the cluster.
+numExecutors <- function(sc) {
+  callJMethod(sc, "numExecutors")
+}
+
--- End diff --

actually, all sparkContext methods (i.e. where the parameter has `sc`) are internal/non-public/deprecated; see `spark.addFile`.
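
For reference, the `spark.addFile`-style convention mentioned here looks roughly like this (a sketch based on SparkR's `context.R`; the exact body may differ): the public function takes no `sc` argument and obtains the context internally via `getSparkContext()`:

```r
spark.addFile <- function(path, recursive = FALSE) {
  sc <- getSparkContext()
  invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
}
```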


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-06-26 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r198174777
  
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosFineGrainedSchedulerBackend.scala ---
@@ -446,6 +446,8 @@ private[spark] class MesosFineGrainedSchedulerBackend(
 
   // TODO: query Mesos for number of cores
   override def defaultParallelism(): Int = sc.conf.getInt("spark.default.parallelism", 8)
+  override def numCores(): Int = defaultParallelism
+  override def numExecutors(): Int = 1
--- End diff --

The mode (and the file) is going to be removed in Spark 3.0: https://github.com/apache/spark/pull/18784


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-06-25 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r197725870
  
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosFineGrainedSchedulerBackend.scala ---
@@ -446,6 +446,8 @@ private[spark] class MesosFineGrainedSchedulerBackend(
 
   // TODO: query Mesos for number of cores
   override def defaultParallelism(): Int = sc.conf.getInt("spark.default.parallelism", 8)
+  override def numCores(): Int = defaultParallelism
+  override def numExecutors(): Int = 1
--- End diff --

I added comments to the ticket above.


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-06-21 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r197120918
  
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosFineGrainedSchedulerBackend.scala ---
@@ -446,6 +446,8 @@ private[spark] class MesosFineGrainedSchedulerBackend(
 
   // TODO: query Mesos for number of cores
   override def defaultParallelism(): Int = sc.conf.getInt("spark.default.parallelism", 8)
+  override def numCores(): Int = defaultParallelism
+  override def numExecutors(): Int = 1
--- End diff --

I found this ticket for the TODO: https://issues.apache.org/jira/browse/SPARK-9775. Should I add a comment to it or create a separate ticket?


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-06-21 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21589#discussion_r197107564
  
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosFineGrainedSchedulerBackend.scala ---
@@ -446,6 +446,8 @@ private[spark] class MesosFineGrainedSchedulerBackend(
 
   // TODO: query Mesos for number of cores
   override def defaultParallelism(): Int = sc.conf.getInt("spark.default.parallelism", 8)
+  override def numCores(): Int = defaultParallelism
+  override def numExecutors(): Int = 1
--- End diff --

Eh, is it simply because it's not implemented yet in Mesos? I am not used to it, but I thought it'd better be mentioned somewhere.


---




[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-06-18 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/21589

[SPARK-24591][CORE] Number of cores and executors in the cluster

## What changes were proposed in this pull request?

In the PR, I propose to extend `SparkContext` by:

1. `def numCores: Int` returns the total number of CPU cores of all executors registered in the cluster at the moment. The main use case for that is using it in _repartition()_ and _coalesce()_.

2. `def numExecutors: Int` returns the total number of executors registered in the cluster at the moment. Some jobs, e.g. local-node ML training, use a lot of parallelism. It's a common practice to aim to distribute such jobs so that there is one partition for each executor (see the usage sketch below).
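
For illustration only (not part of the PR description), a minimal SparkR sketch of the intended use, assuming the `spark.numCores()` and `spark.numExecutors()` functions proposed in this PR:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)

# Aim for roughly one partition per available core, e.g. before a wide operation.
df_by_core <- repartition(df, numPartitions = spark.numCores())

# Or one partition per executor, e.g. for local-node ML training.
df_by_exec <- repartition(df, numPartitions = spark.numExecutors())
```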

## How was this patch tested?

- R API was tested manually from `sparkR`
- Added tests for PySpark and `JavaSparkContext` that check the number of cores and executors in `local` mode.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 num-cores-and-executors

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21589.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21589


commit 9d44d7d4d86e8549cc4e524a7ea3d818b41084f2
Author: Maxim Gekk 
Date:   2018-06-16T18:14:42Z

Methods returns total number of cores and executors in the cluster

commit c6b354c466677c1101b30fc1b25ddc5750c8eaf6
Author: Maxim Gekk 
Date:   2018-06-16T18:19:09Z

Update Java's Spark Context

commit 54f04369c0f3329e8c27ad405a350ee20b788b21
Author: Maxim Gekk 
Date:   2018-06-16T18:57:08Z

Tests for number of cores and executors in the local mode

commit 4d645829c8d338451be81c4554cc1257b459f6a6
Author: Maxim Gekk 
Date:   2018-06-16T20:09:35Z

Adding coresCount and executorsCount to PySpark

commit 79633d9a3e7aebf40ee8940e8fcf00d43dc22ed7
Author: Maxim Gekk 
Date:   2018-06-18T13:08:54Z

Improving comments

commit d7e94e10794964022d3dc98671b86f02af80d2e8
Author: Maxim Gekk 
Date:   2018-06-18T13:31:48Z

Renaming of the methods

commit 9be566f1ed3e066de7e3d3ad557756d22fc22a73
Author: Maxim Gekk 
Date:   2018-06-18T18:41:06Z

New methods for SparkR




---
