[ https://issues.apache.org/jira/browse/SPARK-25807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663378#comment-16663378 ]

Oron Navon commented on SPARK-25807:
------------------------------------

Thanks guys. [~srowen], fair enough about matching Hive/SQL behavior, but note 
that since users code in Python/Java/Scala (where substring indexing is 
zero-based), this becomes unintuitive and can easily lead to misuse of the API. 
An explicit {{substr0}} and {{substr1}} would be unambiguous, but I agree it's 
distasteful. I'd appreciate any other ideas.
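To make the mismatch concrete, here is a minimal pure-Python model of the 1-based semantics described in this issue, set against Python's zero-based slicing. It is not Spark code: `sql_substr` is a hypothetical name, and the sketch only models non-negative start positions.

```python
def sql_substr(s: str, pos: int, length: int) -> str:
    """Hypothetical model of 1-based SQL/Hive SUBSTRING for pos >= 0.

    pos 1 is the first character; pos 0 is treated like pos 1, as the
    issue describes for Column.substr().
    """
    if pos < 0:
        raise NotImplementedError("negative positions are not modeled here")
    start = max(pos - 1, 0)  # convert the 1-based position to a 0-based index
    return s[start:start + max(length, 0)]


s = "Spark"
print(s[1:1 + 3])           # Python slicing is 0-based: 'par'
print(sql_substr(s, 1, 3))  # 1-based: 'Spa'
print(sql_substr(s, 0, 3))  # 0 behaves like 1: 'Spa'
```

The same start value (1) selects different characters depending on which convention the caller has in mind, which is exactly the source of the confusion.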

Specifically about allowing {{substr(0, ...)}}: what's the motivation for that? 
Since it behaves identically to {{substr(1, ...)}}, a call to 
{{substr(0, ...)}} almost certainly means the user expects 0-based behavior. 
Shouldn't we throw an exception in this case? That would catch most such mistakes.
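A minimal sketch of the proposed check, again as a pure-Python model rather than a patch to Spark. `strict_substr` is a hypothetical name, and `ValueError` stands in for whatever exception Spark would actually raise.

```python
def strict_substr(s: str, pos: int, length: int) -> str:
    """Hypothetical 1-based substring that rejects pos == 0.

    A start position of 0 most likely comes from a caller who expects
    0-based indexing, so fail loudly instead of silently treating it as 1.
    """
    if pos == 0:
        raise ValueError("substr() positions are 1-based; got startPos=0")
    if pos < 0:
        raise NotImplementedError("negative positions are not modeled here")
    start = pos - 1  # 1-based position -> 0-based index
    return s[start:start + max(length, 0)]


print(strict_substr("Spark", 1, 3))  # 'Spa'
try:
    strict_substr("Spark", 0, 3)
except ValueError as e:
    print(e)  # the 0-based caller is alerted instead of silently getting 'Spa'
```

The trade-off is that any existing code relying on {{startPos}} 0 being accepted would start failing, which is why this is framed as a question rather than a settled change.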

> Mitigate 1-based substr() confusion
> -----------------------------------
>
>                 Key: SPARK-25807
>                 URL: https://issues.apache.org/jira/browse/SPARK-25807
>             Project: Spark
>          Issue Type: Improvement
>          Components: Java API, PySpark
>    Affects Versions: 1.3.0, 2.3.2, 2.4.0, 2.5.0, 3.0.0
>            Reporter: Oron Navon
>            Priority: Minor
>
> The method {{Column.substr()}} is 1-based, conforming with SQL and Hive's 
> {{SUBSTRING}}, and contradicting both Python's string slicing and Java's 
> {{String.substring()}}, which are zero-based. Both PySpark users and Java API 
> users often naturally expect a 0-based {{substr()}}. Adding to the confusion, 
> {{substr()}} currently accepts a {{startPos}} value of 0, which returns the 
> same result as {{startPos==1}}.
> Since changing {{substr()}} to 0-based is probably NOT a reasonable option 
> here, I suggest making one or more of the following changes:
>  # Adding a method {{substr0}}, which would be zero-based
>  # Renaming {{substr}} to {{substr1}}
>  # Making the existing {{substr()}} throw an exception on {{startPos==0}}, 
> which should catch and alert most users who expect zero-based behavior.
> This is my first discussion on this project; apologies for any faux pas.
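Proposals 1 and 2 amount to two explicitly named entry points over the same logic. A minimal pure-Python sketch follows; `substr0` and `substr1` are the hypothetical names suggested in the issue, not existing Spark methods, and this version rejects out-of-range start positions rather than modeling Spark's handling of them.

```python
def substr1(s: str, pos: int, length: int) -> str:
    """Hypothetical 1-based substring (proposal 2): pos 1 is the first char."""
    if pos < 1:
        raise ValueError("substr1() positions start at 1")
    start = pos - 1  # 1-based position -> 0-based index
    return s[start:start + max(length, 0)]


def substr0(s: str, pos: int, length: int) -> str:
    """Hypothetical 0-based substring (proposal 1), matching Python slicing."""
    if pos < 0:
        raise ValueError("substr0() positions start at 0")
    return substr1(s, pos + 1, length)  # shift to 1-based and delegate


print(substr0("Spark", 0, 3))  # 'Spa', same as "Spark"[0:3]
print(substr1("Spark", 1, 3))  # 'Spa'
```

Because the two names differ only by an offset of one, a PySpark-side {{substr0}} wrapper could plausibly just delegate to the existing {{Column.substr(startPos + 1, length)}}; the naming, not the logic, is what removes the ambiguity.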



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)