[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r140651399
  
--- Diff: R/pkg/R/column.R ---
@@ -238,8 +238,10 @@ setMethod("between", signature(x = "Column"),
 #' @param x a Column.
 #' @param dataType a character object describing the target data type.
 #'See
+# nolint start
--- End diff --

I just double checked the links.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-24 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r140651387
  
--- Diff: dev/lint-r.R ---
@@ -24,10 +24,16 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, 
logical.return = TRUE)) {
   stop("You should install SparkR in a local directory with 
`R/install-dev.sh`.")
 }
 
-# Installs lintr from Github in a local directory.
-# NOTE: The CRAN's version is too old to adapt to our rules.
-if ("lintr" %in% row.names(installed.packages())  == FALSE) {
-  devtools::install_github("jimhester/lintr@a769c0b")
+# Installs lintr from Github in a local directory if lintr is not 
installed already or
+# the existing lintr is not the specified version.
+#
+# Note that, the CRAN's version is too old to adapt to our rules. 
Therefore, we try to
+# install lintr from the latest commit in Github, rather than a specific 
tag or release.
+# The current latest is jimhester/lintr@5431140 (see SPARK-22063), the dev 
version,
+# '1.0.1.9000'.
+if ("lintr" %in% row.names(installed.packages()) == FALSE ||
+packageVersion("lintr") != "1.0.1.9000") {
--- End diff --

I checked this - when downgrading, upgrading and not installed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r140643692
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2650,8 +2650,9 @@ setMethod("merge",
 #' @param suffix a suffix for the column name
 #' @return list of columns
 #'
-#' @note generateAliasesForIntersectedCols since 1.6.0
-generateAliasesForIntersectedCols <- function(x, intersectedColNames, 
suffix) { # nolint
+#' @note genAliasesForIntersectedCols since 1.6.0
--- End diff --

nit: I'd remove this `@note` too


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-21 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r140244036
  
--- Diff: dev/lint-r.R ---
@@ -28,6 +28,7 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, 
logical.return = TRUE)) {
 # NOTE: The CRAN's version is too old to adapt to our rules.
 if ("lintr" %in% row.names(installed.packages())  == FALSE) {
   devtools::install_github("jimhester/lintr@a769c0b")
+  # devtools::install_github("jimhester/lintr@5431140")
--- End diff --

Oh, just to be clear, I didn't change this yet but just commented this for 
now ... Actually, `devtools::install_github("jimhester/lintr@5431140")` is new 
one but I didn't change this yet just in case.

Will add some comments around this when I add few more changes and clean up 
soon in any event.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r140242203
  
--- Diff: dev/lint-r.R ---
@@ -28,6 +28,7 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, 
logical.return = TRUE)) {
 # NOTE: The CRAN's version is too old to adapt to our rules.
 if ("lintr" %in% row.names(installed.packages())  == FALSE) {
   devtools::install_github("jimhester/lintr@a769c0b")
+  # devtools::install_github("jimhester/lintr@5431140")
--- End diff --

It looks like the commented is a old code. Maybe a comment to explain it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-21 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r140238274
  
--- Diff: dev/lint-r.R ---
@@ -28,6 +28,7 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, 
logical.return = TRUE)) {
 # NOTE: The CRAN's version is too old to adapt to our rules.
 if ("lintr" %in% row.names(installed.packages())  == FALSE) {
   devtools::install_github("jimhester/lintr@a769c0b")
+  # devtools::install_github("jimhester/lintr@5431140")
--- End diff --

Yes .. I think I should add few more changes here if we are going to update 
this package.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r140151179
  
--- Diff: dev/lint-r.R ---
@@ -28,6 +28,7 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, 
logical.return = TRUE)) {
 # NOTE: The CRAN's version is too old to adapt to our rules.
 if ("lintr" %in% row.names(installed.packages())  == FALSE) {
   devtools::install_github("jimhester/lintr@a769c0b")
+  # devtools::install_github("jimhester/lintr@5431140")
--- End diff --

because of `if ("lintr" %in% row.names(installed.packages())  == FALSE) {`
?
it might be the case the Jenkins boxes already have a version installed and 
this won't update it


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r140150499
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2649,15 +2651,15 @@ setMethod("merge",
 #' @return list of columns
 #'
 #' @note generateAliasesForIntersectedCols since 1.6.0
-generateAliasesForIntersectedCols <- function (x, intersectedColNames, 
suffix) {
+generateAliasesForIntersectedCols <- function(x, intersectedColNames, 
suffix) { # nolint
--- End diff --

wait... this isn't public though; it's not in 
https://github.com/apache/spark/blob/master/R/pkg/NAMESPACE
looks like this shouldn't be in the doc (`@noRd` or `#` instead of `#'`)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r140150229
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2594,12 +2596,12 @@ setMethod("merge",
 } else {
   # if by or both by.x and by.y have length 0, use Cartesian 
Product
   joinRes <- crossJoin(x, y)
-  return (joinRes)
+  return(joinRes)
 }
 
 # sets alias for making colnames unique in dataframes 'x' and 
'y'
-colsX <- generateAliasesForIntersectedCols(x, by, suffixes[1])
-colsY <- generateAliasesForIntersectedCols(y, by, suffixes[2])
+colsX <- generateAliasesForIntersectedCols(x, by, suffixes[1]) 
# nolint
+colsY <- generateAliasesForIntersectedCols(y, by, suffixes[2]) 
# nolint
--- End diff --

what's the problem here? name too long I think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r140150600
  
--- Diff: R/pkg/R/context.R ---
@@ -329,7 +329,7 @@ spark.addFile <- function(path, recursive = FALSE) {
 #' spark.getSparkFilesRootDirectory()
 #'}
 #' @note spark.getSparkFilesRootDirectory since 2.1.0
-spark.getSparkFilesRootDirectory <- function() {
+spark.getSparkFilesRootDirectory <- function() { # nolint
--- End diff --

this one is an API...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r139919715
  
--- Diff: R/pkg/R/mllib_tree.R ---
@@ -132,10 +132,10 @@ print.summary.decisionTree <- function(x) {
 #' Gradient Boosted Tree model, \code{predict} to make predictions on new 
data, and
 #' \code{write.ml}/\code{read.ml} to save/load fitted models.
 #' For more details, see
-#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-regression}{
-#' GBT Regression} and
-#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-classifier}{
-#' GBT Classification}
+#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-
+#' tree-regression}{GBT Regression} and
+#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-
+#' tree-classifier}{GBT Classification}
--- End diff --

links were checked


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r139919722
  
--- Diff: R/pkg/R/mllib_tree.R ---
@@ -567,10 +569,10 @@ setMethod("write.ml", signature(object = 
"RandomForestClassificationModel", path
 #' model, \code{predict} to make predictions on new data, and 
\code{write.ml}/\code{read.ml} to
 #' save/load fitted models.
 #' For more details, see
-#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-regression}{
-#' Decision Tree Regression} and
-#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-classifier}{
-#' Decision Tree Classification}
+#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-
+#' regression}{Decision Tree Regression} and
+#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-
+#' classifier}{Decision Tree Classification}
--- End diff --

links were checked


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r139919377
  
--- Diff: R/pkg/R/context.R ---
@@ -329,7 +329,7 @@ spark.addFile <- function(path, recursive = FALSE) {
 #' spark.getSparkFilesRootDirectory()
 #'}
 #' @note spark.getSparkFilesRootDirectory since 2.1.0
-spark.getSparkFilesRootDirectory <- function() {
+spark.getSparkFilesRootDirectory <- function() { # nolint
--- End diff --

Ditto: 30 length identifier limit but exposed in the doc


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r139919190
  
--- Diff: R/pkg/R/column.R ---
@@ -238,8 +238,8 @@ setMethod("between", signature(x = "Column"),
 #' @param x a Column.
 #' @param dataType a character object describing the target data type.
 #'See
-#'
\href{https://spark.apache.org/docs/latest/sparkr.html#data-type-mapping-between-r-and-spark}{
-#'Spark Data Types} for available data types.
+#'
\href{https://spark.apache.org/docs/latest/sparkr.html#data-type-mapping-between-
+#'r-and-spark}{Spark Data Types} for available data types.
--- End diff --

I checked this link is not broken for sure.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r139919707
  
--- Diff: R/pkg/R/mllib_tree.R ---
@@ -352,10 +353,10 @@ setMethod("write.ml", signature(object = 
"GBTClassificationModel", path = "chara
 #' model, \code{predict} to make predictions on new data, and 
\code{write.ml}/\code{read.ml} to
 #' save/load fitted models.
 #' For more details, see
-#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-regression}{
-#' Random Forest Regression} and
-#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier}{
-#' Random Forest Classification}
+#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-
+#' regression}{Random Forest Regression} and
+#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-
+#' classifier}{Random Forest Classification}
--- End diff --

links were checked


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r139919889
  
--- Diff: dev/lint-r.R ---
@@ -28,6 +28,7 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, 
logical.return = TRUE)) {
 # NOTE: The CRAN's version is too old to adapt to our rules.
 if ("lintr" %in% row.names(installed.packages())  == FALSE) {
   devtools::install_github("jimhester/lintr@a769c0b")
+  # devtools::install_github("jimhester/lintr@5431140")
--- End diff --

Here, I didn't yet change this. Up to my knowledge, this doesn't actually 
try to install this though.. just in case.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r139919103
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2649,15 +2651,15 @@ setMethod("merge",
 #' @return list of columns
 #'
 #' @note generateAliasesForIntersectedCols since 1.6.0
-generateAliasesForIntersectedCols <- function (x, intersectedColNames, 
suffix) {
+generateAliasesForIntersectedCols <- function(x, intersectedColNames, 
suffix) { # nolint
--- End diff --

Looks this hits 30 length identifier limit but we can't change as exposed 
in the doc.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r139918553
  
--- Diff: R/pkg/.lintr ---
@@ -1,2 +1,2 @@
-linters: with_defaults(line_length_linter(100), multiple_dots_linter = 
NULL, camel_case_linter = NULL, open_curly_linter(allow_single_line = TRUE), 
closed_curly_linter(allow_single_line = TRUE))
+linters: with_defaults(line_length_linter(100), multiple_dots_linter = 
NULL, object_name_linter = NULL, camel_case_linter = NULL, 
open_curly_linter(allow_single_line = TRUE), 
closed_curly_linter(allow_single_line = TRUE))
--- End diff --

`object_name_linter = NULL` looks required. Otherwise, it complains about 
inconsistent naming styles.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/19290

[WIP][SPARK-22063][R] Upgrades lintr to latest commit sha1 ID

## What changes were proposed in this pull request?

Currently, we set lintr to `jimhester/lintr@a769c0b` (see 
[this](https://github.com/apache/spark/commit/7d1175011c976756efcd4e4e4f70a8fd6f287026)
 and [SPARK-14074](https://issues.apache.org/jira/browse/SPARK-14074)).

I first tested and checked lintr-1.0.1 but it looks many important fixes 
are missing (for example, checking 100 length). So, I decided to propose it 
with the sha1 to 
https://github.com/jimhester/lintr/commit/5431140ffea65071f1327625d4a8de9688fa7e72.

It looks it has fixed many bugs and now finds many instances that I have 
observed and thought should be caught time to time, here I filed [the 
results](https://gist.github.com/HyukjinKwon/4f59ddcc7b6487a02da81800baca533c).

The downside looks it now takes about 7ish mins, (it was 2ish mins before).

## How was this patch tested?

Manually, `./dev/lint-r` after manually updating the lintr package.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark upgrade-r-lint

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19290.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19290


commit 6e41eff8334e0ce87841941d34fb9b88751f1705
Author: hyukjinkwon 
Date:   2017-09-19T15:51:23Z

Upgrade lintr to latest commit sha1 ID




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org