[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-31 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-107247487
  
I thought I was waiting for a update from @sun-rui, but actually my 
comments have been addressed. There is a minor R documentation change, but I 
can do that with the merge. 

LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/6183


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-30 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-107001767
  
@shivaram what's happening with this one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-103401538
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-103401522
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-103401867
  
  [Test build #33067 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33067/consoleFull)
 for   PR 6183 at commit 
[`dd6f5b3`](https://github.com/apache/spark/commit/dd6f5b3cb5e9ed9cf1300f8cb7e471b996808978).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-103432177
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33067/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-103432161
  
  [Test build #33067 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33067/consoleFull)
 for   PR 6183 at commit 
[`dd6f5b3`](https://github.com/apache/spark/commit/dd6f5b3cb5e9ed9cf1300f8cb7e471b996808978).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-103432175
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-102369758
  
  [Test build #32815 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32815/consoleFull)
 for   PR 6183 at commit 
[`b58771c`](https://github.com/apache/spark/commit/b58771cf9dc5a4d1d70c9820b7dce0275c4edec9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-102368971
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-102369029
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread sun-rui
GitHub user sun-rui opened a pull request:

https://github.com/apache/spark/pull/6183

[SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sun-rui/spark SPARK-7227

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/6183.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6183


commit b58771cf9dc5a4d1d70c9820b7dce0275c4edec9
Author: Sun Rui rui@intel.com
Date:   2015-05-15T10:53:11Z

[SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-102388855
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-102388822
  
  [Test build #32815 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32815/consoleFull)
 for   PR 6183 at commit 
[`b58771c`](https://github.com/apache/spark/commit/b58771cf9dc5a4d1d70c9820b7dce0275c4edec9).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-102388859
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32815/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-102542073
  
@sun-rui one more comment -- lets alias `na.omit` to `dropna` -- That is 
the method used for local R data frames 
https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-102532406
  
seems fine to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread sun-rui
Github user sun-rui commented on a diff in the pull request:

https://github.com/apache/spark/pull/6183#discussion_r30455256
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1431,3 +1431,119 @@ setMethod(describe,
 sdf - callJMethod(x@sdf, describe, listToSeq(colList))
 dataFrame(sdf)
   })
+
+#' dropna
+#'
+#' Returns a new DataFrame omitting rows with null values.
+#'
+#' @param x A SparkSQL DataFrame.
+#' @param how any or all.
+#'if any, drop a row if it contains any nulls.
+#'if all, drop a row only if all its values are null.
+#'if minNonNulls is specified, how is ignored.
+#' @param minNonNulls If specified, drop rows that have less than
+#'minNonNulls non-null values.
+#'This overwrites the how parameter.
+#' @param cols Optional list of column names to consider.
+#' @return A DataFrame
+#' 
+#' @rdname nafunctions
+#' @export
+#' @examples
+#'\dontrun{
+#' sc - sparkR.init()
+#' sqlCtx - sparkRSQL.init(sc)
+#' path - path/to/file.json
+#' df - jsonFile(sqlCtx, path)
+#' dropna(df)
+#' }
+setMethod(dropna,
+  signature(x = DataFrame),
+  function(x, how = c(any, all), minNonNulls = NULL, cols = 
NULL) {
+how - match.arg(how)
+if (is.null(cols)) {
+  cols - columns(x)
+}
+if (is.null(minNonNulls)) {
+  minNonNulls - if (how == any) { length(cols) } else { 1 }
+}
+
+naFunctions - callJMethod(x@sdf, na)
+sdf - callJMethod(naFunctions, drop,
+   as.integer(minNonNulls), 
listToSeq(as.list(cols)))
+dataFrame(sdf)
+  })
+
+#' fillna
+#'
+#' Replace null values.
+#'
+#' @param x A SparkSQL DataFrame.
+#' @param value Value to replace null values with.
+#'  Should be an integer, numeric, character or named list.
+#'  If the value is a named list, then cols is ignored and
+#'  value must be a mapping from column name (character) to 
+#'  replacement value. The replacement value must be an
+#'  integer, numeric or character.
+#' @param cols optional list of column names to consider.
+#' Columns specified in cols that do not have matching data
+#' type are ignored. For example, if value is a character, and 
+#' subset contains a non-character column, then the 
non-character
+#' column is simply ignored.
+#' @return A DataFrame
+#' 
+#' @rdname nafunctions
+#' @export
+#' @examples
+#'\dontrun{
+#' sc - sparkR.init()
+#' sqlCtx - sparkRSQL.init(sc)
+#' path - path/to/file.json
+#' df - jsonFile(sqlCtx, path)
--- End diff --

OK I will do it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/6183#discussion_r30456797
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
@@ -166,6 +166,24 @@ private[spark] object SerDe {
 }
   }
 
+  def readEnv(in: DataInputStream): java.util.Map[Object, Object] = {
--- End diff --

My comment was only about the name for the function. It doesn't matter 
actually as its just an internal function. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread sun-rui
Github user sun-rui commented on a diff in the pull request:

https://github.com/apache/spark/pull/6183#discussion_r30455447
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
@@ -166,6 +166,24 @@ private[spark] object SerDe {
 }
   }
 
+  def readEnv(in: DataInputStream): java.util.Map[Object, Object] = {
--- End diff --

I just want to keep readMap() for the case where in R side an environment 
in which all values are of same type, the value type is serialized once. It is 
optimized than readEnv(). But I am OK if you think we can update readMap() and 
remove readEnv() for now and do such optimization in the future if necessary.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/6183#discussion_r30445786
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1431,3 +1431,119 @@ setMethod(describe,
 sdf - callJMethod(x@sdf, describe, listToSeq(colList))
 dataFrame(sdf)
   })
+
+#' dropna
+#'
+#' Returns a new DataFrame omitting rows with null values.
+#'
+#' @param x A SparkSQL DataFrame.
+#' @param how any or all.
+#'if any, drop a row if it contains any nulls.
+#'if all, drop a row only if all its values are null.
+#'if minNonNulls is specified, how is ignored.
+#' @param minNonNulls If specified, drop rows that have less than
+#'minNonNulls non-null values.
+#'This overwrites the how parameter.
+#' @param cols Optional list of column names to consider.
+#' @return A DataFrame
+#' 
+#' @rdname nafunctions
+#' @export
+#' @examples
+#'\dontrun{
+#' sc - sparkR.init()
+#' sqlCtx - sparkRSQL.init(sc)
+#' path - path/to/file.json
+#' df - jsonFile(sqlCtx, path)
+#' dropna(df)
+#' }
+setMethod(dropna,
+  signature(x = DataFrame),
+  function(x, how = c(any, all), minNonNulls = NULL, cols = 
NULL) {
+how - match.arg(how)
+if (is.null(cols)) {
+  cols - columns(x)
+}
+if (is.null(minNonNulls)) {
+  minNonNulls - if (how == any) { length(cols) } else { 1 }
+}
+
+naFunctions - callJMethod(x@sdf, na)
+sdf - callJMethod(naFunctions, drop,
+   as.integer(minNonNulls), 
listToSeq(as.list(cols)))
+dataFrame(sdf)
+  })
+
+#' fillna
+#'
+#' Replace null values.
+#'
+#' @param x A SparkSQL DataFrame.
+#' @param value Value to replace null values with.
+#'  Should be an integer, numeric, character or named list.
+#'  If the value is a named list, then cols is ignored and
+#'  value must be a mapping from column name (character) to 
+#'  replacement value. The replacement value must be an
+#'  integer, numeric or character.
+#' @param cols optional list of column names to consider.
+#' Columns specified in cols that do not have matching data
+#' type are ignored. For example, if value is a character, and 
+#' subset contains a non-character column, then the 
non-character
+#' column is simply ignored.
+#' @return A DataFrame
+#' 
+#' @rdname nafunctions
+#' @export
+#' @examples
+#'\dontrun{
+#' sc - sparkR.init()
+#' sqlCtx - sparkRSQL.init(sc)
+#' path - path/to/file.json
+#' df - jsonFile(sqlCtx, path)
--- End diff --

Could you also add an example here how how the named list usage looks ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/6183#discussion_r30445969
  
--- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala ---
@@ -166,6 +166,24 @@ private[spark] object SerDe {
 }
   }
 
+  def readEnv(in: DataInputStream): java.util.Map[Object, Object] = {
--- End diff --

Minor comment - R env's are restricted to have String keys, so readMap is 
still a good name as this is probably a bit more general than R environments ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/6183#issuecomment-102525286
  
Thanks @sun-rui. This is looking very good ! I had some very minor comments

@rxin could you also take a look at the semantics and make sure it looks ok 
?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org