[ https://issues.apache.org/jira/browse/SPARK-31916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takeshi Yamamuro updated SPARK-31916:
-------------------------------------
    Fix Version/s: 3.1.0

> StringConcat can overflow `length`, leads to StringIndexOutOfBoundsException
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-31916
>                 URL: https://issues.apache.org/jira/browse/SPARK-31916
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4, 3.0.0
>            Reporter: Jeffrey Stokes
>            Assignee: Dilip Biswal
>            Priority: Major
>             Fix For: 3.0.1, 3.1.0
>
>
> We have query plans that, through multiple transformations, can grow
> extremely long. These would eventually throw OutOfMemory exceptions
> (https://issues.apache.org/jira/browse/SPARK-26103 & related
> https://issues.apache.org/jira/browse/SPARK-25380).
>
> We backported the changes from [https://github.com/apache/spark/pull/23169]
> into our distribution of Spark, based on 2.4.4, and attempted to use the
> added `spark.sql.maxPlanStringLength`. While this works in some cases, large
> query plans can still lead to issues stemming from `StringConcat` in
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala.
>
> The following unit test exhibits the issue, and it continues to fail on the
> master branch of Spark:
>
> {code:scala}
> test("StringConcat doesn't overflow on many inputs") {
>   val concat = new StringConcat(maxLength = 100)
>   0.to(Integer.MAX_VALUE).foreach { _ =>
>     concat.append("hello world")
>   }
>   assert(concat.toString.length === 100)
> }
> {code}
>
> Looking at the append method here:
> [https://github.com/apache/spark/blob/fc6af9d900ec6f6a1cbe8f987857a69e6ef600d1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala#L118-L128]
>
> It seems that regardless of whether the string to be appended is added fully
> to the internal buffer, added as a substring to reach `maxLength`, or not
> added at all, the internal `length` field is incremented by the length of `s`.
> Eventually this will overflow an Int and cause L123 to call substring with a
> negative index.
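
The failure mode described in the report can be reproduced in isolation. The following is a minimal sketch, not Spark's `StringConcat`: `CappedConcat`, its `saturating` flag, and `OverflowDemo` are hypothetical names introduced for illustration, and the saturating counter is only one possible guard, not necessarily the fix that was merged. It appends 1 MiB chunks so that the `Int` counter wraps after 2048 appends rather than after hundreds of millions of short ones.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of a length-capped string builder.
// It is NOT Spark's StringConcat; the names and the `saturating` flag are illustrative.
class CappedConcat(maxLength: Int, saturating: Boolean) {
  private val parts = new ArrayBuffer[String]
  private var length: Int = 0 // total length of everything passed to append

  def append(s: String): Unit = {
    if (s != null) {
      if (length < maxLength) {
        // After a wrap-around `length` is negative, so this subtraction can
        // itself overflow and produce a negative `available`.
        val available = maxLength - length
        parts += (if (available >= s.length) s else s.substring(0, available))
      }
      length =
        if (saturating) math.min(length.toLong + s.length, Int.MaxValue.toLong).toInt
        else length + s.length // unchecked: wraps to a negative value on overflow
    }
  }

  override def toString: String = parts.mkString
}

object OverflowDemo {
  // 1 MiB per append: 2048 appends total 2^31 characters, one more than Int.MaxValue.
  private val chunk = "x" * (1 << 20)

  def fill(c: CappedConcat): Unit = (1 to 2049).foreach(_ => c.append(chunk))

  def main(args: Array[String]): Unit = {
    val naive = new CappedConcat(maxLength = 100, saturating = false)
    try fill(naive)
    catch { case e: StringIndexOutOfBoundsException => println(s"naive counter failed: $e") }

    val safe = new CappedConcat(maxLength = 100, saturating = true)
    fill(safe)
    println(s"saturating counter kept ${safe.toString.length} characters")
  }
}
```

Widening to `Long` before the addition and clamping at `Int.MaxValue` keeps the counter non-negative, so the "already at the limit" check stays true once the cap has been reached and `substring` is never called with a negative end index.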