Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16668#discussion_r97473647

    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -3406,3 +3406,28 @@ setMethod("randomSplit",
                 }
                 sapply(sdfs, dataFrame)
               })
    +
    +#' getNumPartitions
    +#'
    +#' Return the number of partitions
    +#' Note: in order to compute the number of partition the SparkDataFrame has to be converted into a
    +#' RDD temporarily internally.
    +#'
    +#' @param x A SparkDataFrame
    +#' @family SparkDataFrame functions
    +#' @aliases getNumPartitions,SparkDataFrame-method
    +#' @rdname getNumPartitions
    +#' @name getNumPartitions
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' sparkR.session()
    +#' df <- createDataFrame(cars, numPartitions = 2)
    +#' getNumPartitions(df)
    +#' }
    +#' @note getNumPartitions since 2.1.1
    +setMethod("getNumPartitions",
    +          signature(x = "SparkDataFrame"),
    +          function(x) {
    +            getNumPartitionsRDD(toRDD(x))
    --- End diff --

    Given this is a bit of a hole, I think it would be worthwhile to consider whether there is a reasonable workaround for the 2.1.1 release (say, a JVM wrapper for `.rdd.getNumPartitions`). @shivaram, would you agree?

    As for the new Scala API, since it has broader implications, it might be something to target for the 2.2 release? If so, that would be better served in a different PR.

    I don't mind taking a shot at that. I'm not super familiar with that area, and from a quick scan it seems to be non-trivial (handling different RDD subtypes and so on), so a few pointers would be appreciated, @cloud-fan.
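For illustration, the JVM-side workaround mentioned above could look roughly like the sketch below. It is not the code in this PR: the helper name `getNumPartitionsViaJVM` is hypothetical, and it assumes SparkR's internal `callJMethod` helper and the `sdf` slot that holds the JVM reference of a SparkDataFrame. It still goes through `.rdd` on the JVM, so it only avoids the R-side `toRDD()` conversion.

```r
library(SparkR)

# Hypothetical helper (not part of the PR): ask the JVM for the partition
# count instead of building an R-side RDD with toRDD().
getNumPartitionsViaJVM <- function(x) {
  stopifnot(inherits(x, "SparkDataFrame"))
  # x@sdf is the reference to the JVM Dataset; calling "rdd" on it
  # returns a reference to the underlying JVM RDD.
  jrdd <- SparkR:::callJMethod(x@sdf, "rdd")
  # RDD.getNumPartitions is evaluated on the JVM and returned to R.
  SparkR:::callJMethod(jrdd, "getNumPartitions")
}

# Usage, assuming a local Spark installation is available:
# sparkR.session()
# df <- createDataFrame(cars, numPartitions = 2)
# getNumPartitionsViaJVM(df)
```

Because `callJMethod` is internal to SparkR, a real change would live inside the package (as the body of the `setMethod` in the diff) rather than use `:::` from user code.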