GitHub user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20464#discussion_r165271143
  
    --- Diff: R/pkg/R/column.R ---
    @@ -169,7 +169,7 @@ setMethod("alias",
     #' @note substr since 1.4.0
     setMethod("substr", signature(x = "Column"),
               function(x, start, stop) {
    -            jc <- callJMethod(x@jc, "substr", as.integer(start - 1), as.integer(stop - start + 1))
    +            jc <- callJMethod(x@jc, "substr", as.integer(start), as.integer(stop - start + 1))
    --- End diff --
    
    The current API behavior should be considered wrong because it is 
inconsistent. For start position 1 we get the substring beginning at the 1st 
character, but for start position 2 we still get a substring beginning at the 
1st character. The start passed to the JVM side is 0 in the first case and 1 in 
the second, yet both results begin at the same place:
    
    ```R
    > collect(select(df, substr(df$a, 1, 5)))
      substring(a, 0, 5)
    1              abcde
    > collect(select(df, substr(df$a, 2, 5)))
      substring(a, 1, 4)
    1               abcd
    ```
    
    For a behavior change like this, we might need to add a note in the docs, 
as @HyukjinKwon suggested.
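
To make the index arithmetic concrete, here is a minimal sketch in plain Python (not Spark or SparkR code). It assumes the JVM-side substring is 1-based and treats a start of 0 the same as 1, which is what the output above suggests; negative starts are not modeled. The function names and the sample string `"abcdef"` are hypothetical, chosen only for illustration.

```python
def jvm_substring(s: str, pos: int, length: int) -> str:
    """Simplified model of a 1-based substring (assumption: pos 0 acts like pos 1)."""
    if pos < 0:
        raise ValueError("negative start positions are not modeled in this sketch")
    start = max(pos - 1, 0)  # 1-based to 0-based; 0 is clamped to the first character
    return s[start:start + length]


def r_substr_old(s: str, start: int, stop: int) -> str:
    """Wrapper arithmetic before the change: passes start - 1."""
    return jvm_substring(s, start - 1, stop - start + 1)


def r_substr_new(s: str, start: int, stop: int) -> str:
    """Wrapper arithmetic after the change: passes start unchanged."""
    return jvm_substring(s, start, stop - start + 1)


sample = "abcdef"  # hypothetical value for column a
print(r_substr_old(sample, 1, 5))  # abcde
print(r_substr_old(sample, 2, 5))  # abcd  (still begins at the 1st character)
print(r_substr_new(sample, 1, 5))  # abcde
print(r_substr_new(sample, 2, 5))  # bcde  (begins at the 2nd character)
```

Under these assumptions, the old arithmetic maps both start 1 and start 2 to the first character, while the new arithmetic makes `substr(x, 2, 5)` begin at the second character.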

