[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-107247487 I thought I was waiting for a update from @sun-rui, but actually my comments have been addressed. There is a minor R documentation change, but I can do that with the merge. LGTM. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6183 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-107001767 @shivaram what's happening with this one? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-103401538 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-103401522 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-103401867 [Test build #33067 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33067/consoleFull) for PR 6183 at commit [`dd6f5b3`](https://github.com/apache/spark/commit/dd6f5b3cb5e9ed9cf1300f8cb7e471b996808978). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-103432177 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33067/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-103432161 [Test build #33067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33067/consoleFull) for PR 6183 at commit [`dd6f5b3`](https://github.com/apache/spark/commit/dd6f5b3cb5e9ed9cf1300f8cb7e471b996808978). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-103432175 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-102369758 [Test build #32815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32815/consoleFull) for PR 6183 at commit [`b58771c`](https://github.com/apache/spark/commit/b58771cf9dc5a4d1d70c9820b7dce0275c4edec9). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-102368971 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-102369029 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/6183 [SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sun-rui/spark SPARK-7227 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6183.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6183 commit b58771cf9dc5a4d1d70c9820b7dce0275c4edec9 Author: Sun Rui rui@intel.com Date: 2015-05-15T10:53:11Z [SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-102388855 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-102388822 [Test build #32815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32815/consoleFull) for PR 6183 at commit [`b58771c`](https://github.com/apache/spark/commit/b58771cf9dc5a4d1d70c9820b7dce0275c4edec9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-102388859 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32815/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-102542073 @sun-rui one more comment -- lets alias `na.omit` to `dropna` -- That is the method used for local R data frames https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-102532406 seems fine to me. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6183#discussion_r30455256 --- Diff: R/pkg/R/DataFrame.R --- @@ -1431,3 +1431,119 @@ setMethod(describe, sdf - callJMethod(x@sdf, describe, listToSeq(colList)) dataFrame(sdf) }) + +#' dropna +#' +#' Returns a new DataFrame omitting rows with null values. +#' +#' @param x A SparkSQL DataFrame. +#' @param how any or all. +#'if any, drop a row if it contains any nulls. +#'if all, drop a row only if all its values are null. +#'if minNonNulls is specified, how is ignored. +#' @param minNonNulls If specified, drop rows that have less than +#'minNonNulls non-null values. +#'This overwrites the how parameter. +#' @param cols Optional list of column names to consider. +#' @return A DataFrame +#' +#' @rdname nafunctions +#' @export +#' @examples +#'\dontrun{ +#' sc - sparkR.init() +#' sqlCtx - sparkRSQL.init(sc) +#' path - path/to/file.json +#' df - jsonFile(sqlCtx, path) +#' dropna(df) +#' } +setMethod(dropna, + signature(x = DataFrame), + function(x, how = c(any, all), minNonNulls = NULL, cols = NULL) { +how - match.arg(how) +if (is.null(cols)) { + cols - columns(x) +} +if (is.null(minNonNulls)) { + minNonNulls - if (how == any) { length(cols) } else { 1 } +} + +naFunctions - callJMethod(x@sdf, na) +sdf - callJMethod(naFunctions, drop, + as.integer(minNonNulls), listToSeq(as.list(cols))) +dataFrame(sdf) + }) + +#' fillna +#' +#' Replace null values. +#' +#' @param x A SparkSQL DataFrame. +#' @param value Value to replace null values with. +#' Should be an integer, numeric, character or named list. +#' If the value is a named list, then cols is ignored and +#' value must be a mapping from column name (character) to +#' replacement value. The replacement value must be an +#' integer, numeric or character. +#' @param cols optional list of column names to consider. +#' Columns specified in cols that do not have matching data +#' type are ignored. For example, if value is a character, and +#' subset contains a non-character column, then the non-character +#' column is simply ignored. +#' @return A DataFrame +#' +#' @rdname nafunctions +#' @export +#' @examples +#'\dontrun{ +#' sc - sparkR.init() +#' sqlCtx - sparkRSQL.init(sc) +#' path - path/to/file.json +#' df - jsonFile(sqlCtx, path) --- End diff -- OK I will do it --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/6183#discussion_r30456797 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -166,6 +166,24 @@ private[spark] object SerDe { } } + def readEnv(in: DataInputStream): java.util.Map[Object, Object] = { --- End diff -- My comment was only about the name for the function. It doesn't matter actually as its just an internal function. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6183#discussion_r30455447 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -166,6 +166,24 @@ private[spark] object SerDe { } } + def readEnv(in: DataInputStream): java.util.Map[Object, Object] = { --- End diff -- I just want to keep readMap() for the case where in R side an environment in which all values are of same type, the value type is serialized once. It is optimized than readEnv(). But I am OK if you think we can update readMap() and remove readEnv() for now and do such optimization in the future if necessary. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/6183#discussion_r30445786 --- Diff: R/pkg/R/DataFrame.R --- @@ -1431,3 +1431,119 @@ setMethod(describe, sdf - callJMethod(x@sdf, describe, listToSeq(colList)) dataFrame(sdf) }) + +#' dropna +#' +#' Returns a new DataFrame omitting rows with null values. +#' +#' @param x A SparkSQL DataFrame. +#' @param how any or all. +#'if any, drop a row if it contains any nulls. +#'if all, drop a row only if all its values are null. +#'if minNonNulls is specified, how is ignored. +#' @param minNonNulls If specified, drop rows that have less than +#'minNonNulls non-null values. +#'This overwrites the how parameter. +#' @param cols Optional list of column names to consider. +#' @return A DataFrame +#' +#' @rdname nafunctions +#' @export +#' @examples +#'\dontrun{ +#' sc - sparkR.init() +#' sqlCtx - sparkRSQL.init(sc) +#' path - path/to/file.json +#' df - jsonFile(sqlCtx, path) +#' dropna(df) +#' } +setMethod(dropna, + signature(x = DataFrame), + function(x, how = c(any, all), minNonNulls = NULL, cols = NULL) { +how - match.arg(how) +if (is.null(cols)) { + cols - columns(x) +} +if (is.null(minNonNulls)) { + minNonNulls - if (how == any) { length(cols) } else { 1 } +} + +naFunctions - callJMethod(x@sdf, na) +sdf - callJMethod(naFunctions, drop, + as.integer(minNonNulls), listToSeq(as.list(cols))) +dataFrame(sdf) + }) + +#' fillna +#' +#' Replace null values. +#' +#' @param x A SparkSQL DataFrame. +#' @param value Value to replace null values with. +#' Should be an integer, numeric, character or named list. +#' If the value is a named list, then cols is ignored and +#' value must be a mapping from column name (character) to +#' replacement value. The replacement value must be an +#' integer, numeric or character. +#' @param cols optional list of column names to consider. +#' Columns specified in cols that do not have matching data +#' type are ignored. For example, if value is a character, and +#' subset contains a non-character column, then the non-character +#' column is simply ignored. +#' @return A DataFrame +#' +#' @rdname nafunctions +#' @export +#' @examples +#'\dontrun{ +#' sc - sparkR.init() +#' sqlCtx - sparkRSQL.init(sc) +#' path - path/to/file.json +#' df - jsonFile(sqlCtx, path) --- End diff -- Could you also add an example here how how the named list usage looks ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/6183#discussion_r30445969 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -166,6 +166,24 @@ private[spark] object SerDe { } } + def readEnv(in: DataInputStream): java.util.Map[Object, Object] = { --- End diff -- Minor comment - R env's are restricted to have String keys, so readMap is still a good name as this is probably a bit more general than R environments ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/6183#issuecomment-102525286 Thanks @sun-rui. This is looking very good ! I had some very minor comments @rxin could you also take a look at the semantics and make sure it looks ok ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org