[jira] [Created] (SPARK-49968) The split function produces incorrect results with an empty regex and a limit

Dejiu Lu (Jira) Mon, 14 Oct 2024 21:28:04 -0700

Dejiu Lu created SPARK-49968:
--------------------------------

             Summary: The split function produces incorrect results with an 
empty regex and a limit
                 Key: SPARK-49968
                 URL: https://issues.apache.org/jira/browse/SPARK-49968
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.1
            Reporter: Dejiu Lu



The current behavior of the split function is as follows:
{code:java}
select split('hello', 'h', 1); -- result is ["hello"]
select split('hello', '-', 1); -- result is ["hello"]
select split('hello', '', 1);  -- result is ["h"]

select split('1A2A3A4', 'A', 3); -- result is ["1","2","3A4"]
select split('1A2A3A4', '', 3);  -- result is ["1","A","2"]{code}
However, according to the function's description, when the limit is greater 
than zero, the last element of the split result should contain the remaining 
part of the input string.
{code:java}
Arguments:
      * str - a string expression to split.
      * regex - a string representing a regular expression. The regex string 
should be a Java regular expression.
      * limit - an integer expression which controls the number of times the 
regex is applied.
          * limit > 0: The resulting array's length will not be more than 
`limit`, and the resulting array's last entry will contain all input beyond the 
last matched regex.
          * limit <= 0: `regex` will be applied as many times as possible, and 
the resulting array can be of any size. {code}
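So the split function produces incorrect results when it is given an empty 
regex together with a positive limit: the remainder of the input is dropped 
instead of being kept in the last element. The expected results are: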
{code:java}
select split('hello', '', 1);    -- result is ["hello"]

select split('1A2A3A4', '', 3);  -- result is ["1","A","2A3A4"]{code}
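
For comparison, here is a minimal sketch of how Java's own String.split(regex, limit) treats the same inputs on Java 8 and later. The class and helper names (SplitLimitDemo, splitWithLimit) are hypothetical and exist only for this illustration; this is not Spark's implementation, only the plain JDK behavior that the expected results above line up with.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Hypothetical helper for illustration: delegates directly to
    // java.lang.String.split(regex, limit).
    static String[] splitWithLimit(String str, String regex, int limit) {
        return str.split(regex, limit);
    }

    public static void main(String[] args) {
        // Empty regex with a positive limit: the last element keeps the
        // rest of the input.
        System.out.println(Arrays.toString(splitWithLimit("hello", "", 1)));   // [hello]
        System.out.println(Arrays.toString(splitWithLimit("1A2A3A4", "", 3))); // [1, A, 2A3A4]

        // Non-empty regex with a positive limit, for reference.
        System.out.println(Arrays.toString(splitWithLimit("1A2A3A4", "A", 3))); // [1, 2, 3A4]
    }
}
```

In both empty-regex cases the JDK keeps the unsplit remainder in the final element, which matches the documented meaning of limit > 0.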



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
