[GitHub] spark pull request #17161: [SPARK-19819][SparkR] Use concrete data in SparkR...
Github user actuaryzhang closed the pull request at: https://github.com/apache/spark/pull/17161

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17161#discussion_r104306085

Diff: R/pkg/R/DataFrame.R

```
@@ -741,12 +724,12 @@ setMethod("coalesce",
 #' @examples
 #'\dontrun{
 #' sparkR.session()
-#' path <- "path/to/file.json"
-#' df <- read.json(path)
+#' df <- createDataFrame(mtcars)
+#' newDF <- coalesce(df, 1L)
```

End diff: should probably not have `coalesce` in the example block for `repartition`
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17161#discussion_r104306095

Diff: R/pkg/R/DataFrame.R

```
@@ -548,10 +537,9 @@ setMethod("registerTempTable",
 #' @examples
 #'\dontrun{
 #' sparkR.session()
-#' df <- read.df(path, "parquet")
-#' df2 <- read.df(path2, "parquet")
-#' createOrReplaceTempView(df, "table1")
-#' insertInto(df2, "table1", overwrite = TRUE)
+#' df <- limit(createDataFrame(faithful), 5)
```

End diff: why `limit`?
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17161#discussion_r104306091

Diff: R/pkg/R/DataFrame.R

```
@@ -741,12 +724,12 @@ setMethod("coalesce",
 #' @examples
 #'\dontrun{
 #' sparkR.session()
-#' path <- "path/to/file.json"
-#' df <- read.json(path)
+#' df <- createDataFrame(mtcars)
+#' newDF <- coalesce(df, 1L)
 #' newDF <- repartition(df, 2L)
 #' newDF <- repartition(df, numPartitions = 2L)
-#' newDF <- repartition(df, col = df$"col1", df$"col2")
-#' newDF <- repartition(df, 3L, col = df$"col1", df$"col2")
+#' newDF <- repartition(df, col = df[[1]], df[[2]])
```

End diff: showing column reference with `$name` as an example is important too
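For context, the two column-reference styles under discussion can be sketched in SparkR as follows (this assumes a running Spark session; `mtcars` ships with base R, and the column names used are from `mtcars`):

```r
library(SparkR)
sparkR.session()  # requires a local Spark installation

df <- createDataFrame(mtcars)

# positional reference, as in the proposed example
newDF1 <- repartition(df, col = df[[1]], df[[2]])

# named reference with `$`, which the reviewer suggests keeping visible
newDF2 <- repartition(df, col = df$mpg, df$cyl)
```

Both forms resolve to the same columns; the `$name` form is what most users write in practice, which is the reviewer's point about keeping it in the docs.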
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17161#discussion_r104306047

Diff: R/pkg/R/DataFrame.R

```
@@ -2805,10 +2779,9 @@ setMethod("except",
 #' @examples
 #'\dontrun{
 #' sparkR.session()
-#' path <- "path/to/file.json"
-#' df <- read.json(path)
-#' write.df(df, "myfile", "parquet", "overwrite")
-#' saveDF(df, parquetPath2, "parquet", mode = saveMode, mergeSchema = mergeSchema)
+#' df <- createDataFrame(mtcars)
+#' write.df(df, tempfile(), "parquet", "overwrite")
```

End diff: I think we should avoid having `tempfile()` as the output path in the example, as it might point users in the wrong direction: anything saved to a tempfile disappears as soon as the R session ends.
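For reference, a sketch of the trade-off the comment raises (the explicit path below is illustrative, not from the thread):

```r
library(SparkR)
sparkR.session()  # requires a local Spark installation

df <- createDataFrame(mtcars)

# tempfile() returns a path in the session's temp directory,
# which is cleaned up when the R session ends
write.df(df, tempfile(), "parquet", "overwrite")

# an explicit path persists across sessions
write.df(df, "data/mtcars.parquet", "parquet", "overwrite")
```

The `tempfile()` form keeps doc examples from littering the working directory, at the cost of teaching a pattern that silently discards output; that tension is what the reviewer is flagging.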
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17161#discussion_r104306070

Diff: R/pkg/R/DataFrame.R

```
@@ -1123,10 +1096,9 @@ setMethod("dim",
 #' @examples
 #'\dontrun{
 #' sparkR.session()
-#' path <- "path/to/file.json"
-#' df <- read.json(path)
+#' df <- createDataFrame(mtcars)
 #' collected <- collect(df)
-#' firstName <- collected[[1]]$name
+#' collected[[1]]
```

End diff: right, that seems rather unnecessary. Any other idea on how to show it is a data.frame?
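One lightweight way to make the return type explicit, offered here as a suggestion rather than anything settled in the thread, is to inspect the class of the collected result:

```r
library(SparkR)
sparkR.session()  # requires a local Spark installation

df <- createDataFrame(mtcars)
collected <- collect(df)
class(collected)  # "data.frame": collect() brings the data back as a base R data.frame
```

This shows the local/distributed distinction directly, without indexing into a column whose value the reader cannot see.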
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17161#discussion_r104284980

Diff: R/pkg/R/DataFrame.R

```
@@ -92,8 +92,7 @@ dataFrame <- function(sdf, isCached = FALSE) {
 #' @examples
 #'\dontrun{
 #' sparkR.session()
-#' path <- "path/to/file.json"
-#' df <- read.json(path)
+#' df <- createDataFrame(mtcars)
```

End diff: `mtcars` is built into R, like `iris`
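As a quick check in plain R (no Spark needed), the built-in dataset is available without any prior definition:

```r
# mtcars ships with the base `datasets` package, which R loads by default
head(mtcars, 2)
dim(mtcars)  # 32 rows, 11 columns
```

This is why the examples can reference `mtcars` directly inside `\dontrun{}` without defining it first.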
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17161#discussion_r104283930

Diff: R/pkg/R/DataFrame.R

```
@@ -92,8 +92,7 @@ dataFrame <- function(sdf, isCached = FALSE) {
 #' @examples
 #'\dontrun{
 #' sparkR.session()
-#' path <- "path/to/file.json"
-#' df <- read.json(path)
+#' df <- createDataFrame(mtcars)
```

End diff: Should we define `mtcars` to run this example?
GitHub user actuaryzhang opened a pull request: https://github.com/apache/spark/pull/17161

[SPARK-19819][SparkR] Use concrete data in SparkR DataFrame examples

## What changes were proposed in this pull request?

Many examples in the SparkDataFrame methods use:

```
path <- "path/to/file.json"
df <- read.json(path)
```

This is not directly runnable. Replace it with concrete numerical examples so that users can execute the examples directly.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/actuaryzhang/spark sparkRDoc2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17161.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17161

commit 7e50acc4f270400aa00a728a9487ee1d5c1cc4fc
Author: actuaryzhang
Date: 2017-03-04T08:45:56Z
update dataframe doc with examples

commit c4c14adf399bb7b5d272d574113edb7e9ee26d01
Author: actuaryzhang
Date: 2017-03-04T08:53:27Z
update examples
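The replacement pattern the PR adopts can be sketched as follows (assuming a local Spark installation; `mtcars` is a dataset built into base R):

```r
library(SparkR)
sparkR.session()

# before: not runnable without a real file at that path
# path <- "path/to/file.json"
# df <- read.json(path)

# after: runnable anywhere, since mtcars requires no external data
df <- createDataFrame(mtcars)
head(df)
```

Using a built-in dataset means readers can copy any `@examples` block straight into an R session and see output immediately.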