[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17825


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-07 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r115152814
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3745,3 +3745,26 @@ setMethod("hint",
 jdf <- callJMethod(x@sdf, "hint", name, parameters)
 dataFrame(jdf)
   })
+
+#' alias
+#'
+#' @aliases alias,SparkDataFrame-method
+#' @family SparkDataFrame functions
+#' @rdname alias
+#' @name alias
+#' @examples
--- End diff --

true, it's more for tracking it manually





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-05 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r115113723
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3745,3 +3745,26 @@ setMethod("hint",
 jdf <- callJMethod(x@sdf, "hint", name, parameters)
 dataFrame(jdf)
   })
+
+#' alias
+#'
+#' @aliases alias,SparkDataFrame-method
+#' @family SparkDataFrame functions
+#' @rdname alias
+#' @name alias
+#' @examples
--- End diff --

Done, but do we actually need this? We don't use roxygen to maintain
`NAMESPACE`, and (I believe I mentioned this before) we `@export` objects which
are not really exported. Just saying...





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-05 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r115113346
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3745,3 +3745,26 @@ setMethod("hint",
 jdf <- callJMethod(x@sdf, "hint", name, parameters)
 dataFrame(jdf)
   })
+
+#' alias
+#'
+#' @aliases alias,SparkDataFrame-method
+#' @family SparkDataFrame functions
+#' @rdname alias
+#' @name alias
+#' @examples
--- End diff --

add `@export`





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-05 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r115113331
  
--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
--- End diff --

sigh, sadly I think you have captured all the constraints we are working 
with here.

let's get the 3 lines in the same order
```
#' Returns a new SparkDataFrame or Column with an alias set. Equivalent to SQL "AS" keyword.
#' @param object x a Column or a SparkDataFrame
#' @return a Column or a SparkDataFrame
```

to
```
#' Returns a new SparkDataFrame or Column with an alias set. Equivalent to SQL "AS" keyword.
#' @param object x a SparkDataFrame or Column
#' @return a SparkDataFrame or a Column
```
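
Combined with the `@name` and `@rdname` tags from the diff above, the reordered block might look like the sketch below. It assumes the shared documentation is attached to a `NULL` object in `generics.R` (a common roxygen idiom for a topic that is documented once and shared by several methods); this thread does not show whether the PR ended up exactly like this.

```r
# Sketch only: shared documentation topic for the `alias` generic.
# The NULL placeholder is an assumption; the tags come from the diff and
# from the wording proposed in this comment.

#' alias
#'
#' Returns a new SparkDataFrame or a Column with an alias set. Equivalent to SQL "AS" keyword.
#'
#' @name alias
#' @rdname alias
#' @param object x a SparkDataFrame or a Column
#' @param data new name to use
#' @return a SparkDataFrame or a Column
NULL
```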





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-05 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r115085302
  
--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
--- End diff --

On the bright side it looks like matching `@rdname` and `@aliases` like:

```r
#' alias
#'
#' @aliases alias,SparkDataFrame-method
#' @family SparkDataFrame functions
#' @rdname alias,SparkDataFrame-method
#' @name alias
...
```
and

```r
#' alias
#'
#' @aliases alias,SparkDataFrame-method
#' @family SparkDataFrame functions
#' @rdname alias,SparkDataFrame-method
#' @name alias
...
```
(I hope this is what you mean) indeed solves SPARK-18825. But it doesn't 
generate any docs for these two and makes CRAN checker unhappy:

```
Undocumented S4 methods:
  generic 'alias' and siglist 'Column'
  generic 'alias' and siglist 'SparkDataFrame'
```
Docs for the generic are created, but that doesn't help us here. Even if we bring
`@examples` there we still have to deal with CRAN.

There is also my favorite `\name must exist and be unique in Rd files`,
which doesn't give us much room here, does it?

I am open to suggestions, but personally I am out of ideas. I've been digging
through the `roxygen` docs, but between CRAN, S4 requirements, `roxygen` limitations
and our own rules there is not much room left.





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114932485
  
--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
--- End diff --

that's true actually.
if you think it's useful we could always have them in separate rd.
I'm pretty sure `@rdname` needs to match `@aliases` to fix multiple link 
bug https://issues.apache.org/jira/browse/SPARK-18825; which means we can't 
have multiple functions in the same rd - each has to have its own.





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-04 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114931344
  
--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
--- End diff --

I still believe that AS is applicable to both. Essentially what we do is:

```
SELECT column AS new_column FROM table
```

and

```
(SELECT * FROM table) AS new_table
```
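
In SparkR terms the two cases would look roughly like this. It is a sketch only: it assumes a running Spark session and a SparkR build that already includes the SparkDataFrame method from this PR, and the data set (`mtcars`) and alias names are purely illustrative.

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(mtcars)

# Column alias, analogous to: SELECT mpg AS miles_per_gallon FROM table
head(select(df, alias(df$mpg, "miles_per_gallon")))

# SparkDataFrame alias, analogous to: (SELECT * FROM table) AS cars
cars_df <- alias(df, "cars")

# Columns can then be referenced through the table alias
head(select(cars_df, column("cars.mpg")))

sparkR.session.stop()
```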





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-04 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114931185
  
--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
--- End diff --

To be honest I find both equally confusing, so if you think that a single 
annotation is better, I am happy to oblige.





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114929845
  
--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
--- End diff --

We did that at one point. I think the feedback was that we could have one line
for the parameter (`object`), and the return value could have more, but then which
line matches which input parameter type?





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-04 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114929528
  
--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
--- End diff --

Wouldn't it be better to annotate the actual implementations, to get something
like this:


![image](https://cloud.githubusercontent.com/assets/1554276/25733425/295f465e-3159-11e7-87b7-d959c9bf3352.png)






[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114928953
  
--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
+#'
+#' @name alias
+#' @rdname alias
+#' @param object x a Column or a SparkDataFrame
+#' @param data new name to use
--- End diff --

shouldn't we have a `@return` here? perhaps to say
```
Returns a new SparkDataFrame or Column with an alias set.
For Column, equivalent to SQL "AS" keyword.

@return a new SparkDataFrame or Column
```





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114928655
  
--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
--- End diff --

I guess we don't say `return a new Column` but more generally `return a 
Column`
and in other cases we say `return a new SparkDataFrame`

so I guess it's a difference in wording.
I think what you propose is fine, though do you think it's confusing to say 
`Equivalent to SQL "AS" keyword.` because that makes sense only for Column and 
not the whole dataframe?





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-04 Thread zero323
Github user zero323 closed the pull request at:

https://github.com/apache/spark/pull/17825





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-04 Thread zero323
GitHub user zero323 reopened a pull request:

https://github.com/apache/spark/pull/17825

[SPARK-20550][SPARKR] R wrapper for Dataset.alias

## What changes were proposed in this pull request?

- Add SparkR wrapper for `Dataset.alias`.
- Adjust roxygen annotations for `functions.alias` (including example 
usage).

## How was this patch tested?

Unit tests, `check_cran.sh`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zero323/spark SPARK-20550

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17825.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17825


commit 944a3ec791a8f103093e24511e895a4ce60970d8
Author: zero323 
Date:   2017-05-01T08:59:24Z

Initial implementation

commit 5e9f8da45c432e0752e5e78556add33e0a6d0557
Author: zero323 
Date:   2017-05-01T22:27:11Z

Adjust argument annotations

- Remove param annotations from dataframe.alias
- Use generic annotations for column.alias

commit 73133f9442ad8317fb12b600221962bf47d8a95c
Author: zero323 
Date:   2017-05-01T22:31:26Z

Add usage examples to column.alias

commit 848eeefc1f18c6aabaf65e6efed259a2fa5c19c3
Author: zero323 
Date:   2017-05-01T22:34:51Z

Remove return type annotation

commit 05c0781110b42a940e06cc31650449a8715e85c9
Author: zero323 
Date:   2017-05-02T02:00:13Z

Fix typo

commit 22d7cf661bb54a8f7f9c660e1d914802f1eb4153
Author: zero323 
Date:   2017-05-02T04:25:34Z

Move dontruns to their own lines

commit 22e1292557f1a5597cde6337267a099bbcdc07aa
Author: zero323 
Date:   2017-05-02T04:27:11Z

Extend param description

commit 6bb3d914960d1cf63e582a7d732ca80ed321e9c5
Author: zero323 
Date:   2017-05-02T04:33:34Z

Add type annotations to since notes

commit b3c1a416a16a9d32649edda2b66fc9c3476358a5
Author: zero323 
Date:   2017-05-02T04:38:51Z

Attach alias test to select-with-column test case

commit 40fedcb8c41bc84deead205aad81e84c095045b5
Author: zero323 
Date:   2017-05-02T04:44:45Z

Extend description

commit 1e1ad443751fc3dc93487e5385cc934feb93f631
Author: zero323 
Date:   2017-05-03T00:25:15Z

Move alias documentation to generics

commit 2d5ace288f2443327696823c343c095f0d8d64ca
Author: zero323 
Date:   2017-05-04T01:13:45Z

Add family annotation

commit 5fe5495580eb3852ea5092a34dc2334c0e45c9b7
Author: zero323 
Date:   2017-05-04T06:32:54Z

Check that stats::alias is not masked

commit 09f9ccaf5e66a400d26b4ab6d600d951305d5fd3
Author: zero323 
Date:   2017-05-04T07:04:52Z

Fix style

commit f1c74f338b8df865a5e8b9a6e281211aa27af7d3
Author: zero323 
Date:   2017-05-04T10:17:42Z

vim







[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-04 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114925159
  
--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
--- End diff --

How about?

```
#' Return a new Column or a SparkDataFrame with a name set. Equivalent to SQL "AS" keyword.
```
Is the `Column` new?





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114924076
  
--- Diff: R/pkg/R/generics.R ---
@@ -387,6 +387,16 @@ setGeneric("value", function(bcast) { standardGeneric("value") })
 #' @export
 setGeneric("agg", function (x, ...) { standardGeneric("agg") })
 
+#' alias
+#'
+#' Set a new name for a Column or a SparkDataFrame. Equivalent to SQL "AS" keyword.
--- End diff --

right - again, I think we should emphasize that it returns a new SparkDataFrame





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114714846
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3715,3 +3715,25 @@ setMethod("rollup",
 sgd <- callJMethod(x@sdf, "rollup", jcol)
 groupedData(sgd)
   })
+
+#' alias
+#'
+#' @aliases alias,SparkDataFrame-method
+#' @rdname alias
+#' @name alias
+#' @examples
--- End diff --

yes! 





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-03 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114687096
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3715,3 +3715,25 @@ setMethod("rollup",
 sgd <- callJMethod(x@sdf, "rollup", jcol)
 groupedData(sgd)
   })
+
+#' alias
+#'
+#' @aliases alias,SparkDataFrame-method
+#' @rdname alias
+#' @name alias
+#' @examples
--- End diff --

In general it would be nice to sweep all the files to make them more consistent:
capitalization, punctuation, examples, return values and such.





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-03 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114588642
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3715,3 +3715,25 @@ setMethod("rollup",
 sgd <- callJMethod(x@sdf, "rollup", jcol)
 groupedData(sgd)
   })
+
+#' alias
+#'
+#' @aliases alias,SparkDataFrame-method
+#' @rdname alias
+#' @name alias
+#' @examples
--- End diff --

add `@family SparkDataFrame functions`
I think we should probably review all these `@family` at one point...





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-02 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114457011
  
--- Diff: R/pkg/R/column.R ---
@@ -132,17 +132,24 @@ createMethods()
 
 #' alias
 #'
-#' Set a new name for a column
+#' Set a new name for an object. Equivalent to SQL "AS" keyword.
--- End diff --

Moving to `generics.R` sounds good.  "Column or SparkDataFrame" in place of 
"object" as well.

Regarding "AS"... In SQL it can be used with both expressions and tables, so
I deliberately didn't qualify this with `Column`.

I am not sure if we really need to state that it returns a new object. 
Maybe  _Return a new Column or SparkDataFrame with an alias. Equivalent to SQL 
"AS" keyword._? But it doesn't sound great.
 





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-02 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114366868
  
--- Diff: R/pkg/R/column.R ---
@@ -132,17 +132,24 @@ createMethods()
 
 #' alias
 #'
-#' Set a new name for a column
+#' Set a new name for an object. Equivalent to SQL "AS" keyword.
--- End diff --

Also, I think this doc block (description, param list specifically) should
be moved to DataFrame.R or generics.R as mentioned before.





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-02 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114366327
  
--- Diff: R/pkg/R/column.R ---
@@ -132,17 +132,24 @@ createMethods()
 
 #' alias
 #'
-#' Set a new name for a column
+#' Set a new name for an object. Equivalent to SQL "AS" keyword.
--- End diff --

right, this is Scala doc for Column.alias `Gives the column an alias` 
(which is not very concise)
Dataset.alias `Returns a new Dataset with an alias set.`

I think we need to say `Set a new name to return as a new object` or 
similar. Actually I think we should say "Column or SparkDataFrame" in place of 
"object" - what do you think?

I think the `SQL "AS"` part is fine, but perhaps it would be clearer if we lead with
"for Column, ..."?






[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114245870
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3715,3 +3715,24 @@ setMethod("rollup",
 sgd <- callJMethod(x@sdf, "rollup", jcol)
 groupedData(sgd)
   })
+
+#' alias
+#'
+#' @aliases alias,SparkDataFrame-method
+#' @rdname alias
+#' @name alias
+#' @examples \dontrun{
--- End diff --

same here





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114245853
  
--- Diff: R/pkg/R/column.R ---
@@ -132,16 +132,23 @@ createMethods()
 
 #' alias
 #'
-#' Set a new name for a column
+#' Set a new name for an object
 #'
-#' @param object Column to rename
+#' @param object object to rename
 #' @param data new name to use
 #'
 #' @rdname alias
 #' @name alias
 #' @aliases alias,Column-method
 #' @family colum_func
 #' @export
+#' @examples \dontrun{
--- End diff --

think generally we put \dontrun on the next line
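
i.e. something like the following, with the `@examples` tag and `\dontrun{` on separate lines. The example body is only a placeholder for illustration, not the one from the PR.

```r
# Roxygen fragment: `\dontrun{` starts on the line after `@examples`.
#' @examples
#' \dontrun{
#' df <- createDataFrame(iris)
#' head(select(df, alias(df$Sepal_Length, "slength")))
#' }
```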





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114245818
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2253,6 +2253,15 @@ test_that("mutate(), transform(), rename() and names()", {
   detach(airquality)
 })
 
+test_that("alias on SparkDataFrame", {
+  df <- alias(read.df(jsonPath, "json"), "table")
--- End diff --

Because we are trying to put together a set of tests that makes sense for CRAN:
https://github.com/apache/spark/pull/17817





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114245756
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -2253,6 +2253,15 @@ test_that("mutate(), transform(), rename() and names()", {
   detach(airquality)
 })
 
+test_that("alias on SparkDataFrame", {
+  df <- alias(read.df(jsonPath, "json"), "table")
--- End diff --

Instead of adding a new test, could you add this to an existing test that
already deals with naming, so that it reuses an existing df?






[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17825#discussion_r114245780
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3715,3 +3715,24 @@ setMethod("rollup",
 sgd <- callJMethod(x@sdf, "rollup", jcol)
 groupedData(sgd)
   })
+
+#' alias
+#'
+#' @aliases alias,SparkDataFrame-method
+#' @rdname alias
+#' @name alias
+#' @examples \dontrun{
+#' df <- alias(createDataFrame(mtcars), "mtcars")
+#' avg_mpg <- alias(agg(groupBy(df, df$cyl), avg(df$mpg)), "avg_mpg")
+#'
+#' head(select(df, column("mtcars.mpg")))
+#' head(join(df, avg_mpg, column("mtcars.cyl") == column("avg_mpg.cyl")))
+#' }
+#' @note alias since 2.3.0
--- End diff --

Then we put the type in the @note for each overload:

https://github.com/apache/spark/blob/master/R/pkg/R/mllib_classification.R#L121
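
A sketch of what per-overload notes could look like for `alias`, assuming the pattern in the linked file, where the note names the method together with its dispatch type. The 2.3.0 version comes from the diff above; the version shown for the Column method is an assumption for illustration only.

```r
# In R/pkg/R/DataFrame.R (version taken from the diff above)
#' @note alias(SparkDataFrame) since 2.3.0

# In R/pkg/R/column.R (version is assumed; check the existing note there)
#' @note alias(Column) since 1.4.0
```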





[GitHub] spark pull request #17825: [SPARK-20550][SPARKR] R wrapper for Dataset.alias

2017-05-01 Thread zero323
GitHub user zero323 opened a pull request:

https://github.com/apache/spark/pull/17825

[SPARK-20550][SPARKR] R wrapper for Dataset.alias

## What changes were proposed in this pull request?

Add SparkR wrapper for `Dataset.alias`.

## How was this patch tested?

Unit tests, `check-cran.sh`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zero323/spark SPARK-20550

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17825.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17825


commit 87560ddf680b3d197cc80806bca7f8cadfe277c3
Author: zero323 
Date:   2017-05-01T08:59:24Z

Initial implementation

commit 8e3d3be3715c4e79f20cfe30da10a428a4cde600
Author: zero323 
Date:   2017-05-01T22:27:11Z

Adjust argument annotations

- Remove param annotations from dataframe.alias
- Use generic annotations for column.alias

commit e281ec4cfe0724f079ad711be46b82e06bea20de
Author: zero323 
Date:   2017-05-01T22:31:26Z

Add usage examples to column.alias

commit b7d079b3601cabb86c18108e0eff6e5692a3640c
Author: zero323 
Date:   2017-05-01T22:34:51Z

Remove return type annotation



