GitHub user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20464#discussion_r165271143

--- Diff: R/pkg/R/column.R ---
@@ -169,7 +169,7 @@ setMethod("alias",
 #' @note substr since 1.4.0
 setMethod("substr", signature(x = "Column"),
           function(x, start, stop) {
-            jc <- callJMethod(x@jc, "substr", as.integer(start - 1), as.integer(stop - start + 1))
+            jc <- callJMethod(x@jc, "substr", as.integer(start), as.integer(stop - start + 1))
--- End diff --

This API behavior should be considered wrong, because it is inconsistent. With a starting position of 1 we get the substring from the 1st character, but with a starting position of 2 we still get the substring from the 1st character (the old code passes `start - 1`, and Spark's `substring` treats position 0 the same as position 1). So we get the following inconsistent results:

```R
> collect(select(df, substr(df$a, 1, 5)))
  substring(a, 0, 5)
1              abcde
> collect(select(df, substr(df$a, 2, 5)))
  substring(a, 1, 4)
1               abcd
```

For such a change, we might need to add a note in the doc, as @HyukjinKwon suggested.
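A minimal pure-R sketch of the off-by-one, assuming a simplified model of Spark SQL's `substring` where positive positions are 1-based and position 0 is treated like position 1 (negative positions are left out). The helpers `sql_substring`, `old_sparkr_substr`, and `new_sparkr_substr` and the sample string are hypothetical; they only illustrate the index arithmetic and are not SparkR code.

```R
# Hypothetical, simplified model of Spark SQL substring(str, pos, len):
# pos > 0 is 1-based, and pos == 0 is treated like pos == 1.
# Negative positions (counting from the end) are omitted in this sketch.
sql_substring <- function(str, pos, len) {
  start <- if (pos > 0) pos else 1  # 1-based start index in R terms
  end <- start + len - 1            # inclusive end index
  substring(str, start, end)
}

# Hypothetical model of SparkR substr(x, start, stop) before the patch:
# it passes start - 1 as the position.
old_sparkr_substr <- function(str, start, stop) {
  sql_substring(str, start - 1, stop - start + 1)
}

# Hypothetical model after the patch: start is passed unchanged.
new_sparkr_substr <- function(str, start, stop) {
  sql_substring(str, start, stop - start + 1)
}

s <- "abcdefg"              # assumed sample value
old_sparkr_substr(s, 1, 5)  # "abcde"
old_sparkr_substr(s, 2, 5)  # "abcd"  (still starts at "a")
new_sparkr_substr(s, 1, 5)  # "abcde"
new_sparkr_substr(s, 2, 5)  # "bcde"
```

Under this model, the patched version agrees with base R's `substr("abcdefg", 2, 5)`, which also returns `"bcde"`.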