[ 
https://issues.apache.org/jira/browse/SPARK-31916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-31916:
-------------------------------------
    Fix Version/s: 3.1.0

> StringConcat can overflow `length`, leads to StringIndexOutOfBoundsException
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-31916
>                 URL: https://issues.apache.org/jira/browse/SPARK-31916
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4, 3.0.0
>            Reporter: Jeffrey Stokes
>            Assignee: Dilip Biswal
>            Priority: Major
>             Fix For: 3.0.1, 3.1.0
>
>
> We have query plans that through multiple transformations can grow extremely 
> long in length. These would eventually throw OutOfMemory exceptions 
> (https://issues.apache.org/jira/browse/SPARK-26103 and related 
> https://issues.apache.org/jira/browse/SPARK-25380).
>  
> We backported the changes from [https://github.com/apache/spark/pull/23169] 
> into our distribution of Spark, based on 2.4.4, and attempted to use the 
> added `spark.sql.maxPlanStringLength`. While this works in some cases, large 
> query plans can still lead to issues stemming from `StringConcat` in 
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala.
>  
> The following unit test exhibits the issue, which continues to fail in the 
> master branch of spark:
>  
> {code:scala}
>   test("StringConcat doesn't overflow on many inputs") {
>     val concat = new StringConcat(maxLength = 100)
>     0.to(Integer.MAX_VALUE).foreach { _ =>
>       concat.append("hello world")
>     }
>     assert(concat.toString.length === 100)
>   }
> {code}
>  
> Looking at the append method here: 
> [https://github.com/apache/spark/blob/fc6af9d900ec6f6a1cbe8f987857a69e6ef600d1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala#L118-L128]
>  
> It seems that regardless of whether the string to be appended is added fully 
> to the internal buffer, added as a substring up to `maxLength`, or not added 
> at all, the internal `length` field is incremented by the length of `s`. 
> Given enough appends, this overflows the Int and turns `length` negative, so 
> the substring call at L123 is invoked with a negative index and throws 
> StringIndexOutOfBoundsException.
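> One way to avoid the overflow is to saturate the counter or widen it to a 
> Long. A minimal sketch of such a variant (hypothetical; `SafeStringConcat` 
> is not the actual Spark class or patch, just an illustration of the idea):
> {code:scala}
> class SafeStringConcat(val maxLength: Int) {
>   private val buffer = new java.lang.StringBuilder
>   // Track the would-be total length as a Long so repeated appends
>   // cannot wrap past Int.MaxValue and turn negative.
>   private var length: Long = 0L
> 
>   def append(s: String): Unit = {
>     if (s != null && s.nonEmpty) {
>       if (length < maxLength) {
>         // Only copy up to the remaining budget; safe to narrow to Int
>         // here because length < maxLength <= Int.MaxValue.
>         val remaining = maxLength - length.toInt
>         buffer.append(s.substring(0, math.min(s.length, remaining)))
>       }
>       length += s.length
>     }
>   }
> 
>   override def toString: String = buffer.toString
> }
> {code}
> Once the budget is exhausted, further appends only advance the 64-bit 
> counter and never produce a negative substring index.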



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
