[ 
https://issues.apache.org/jira/browse/SPARK-57128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-57128:
-----------------------------------
    Labels: pull-request-available  (was: )

> SQLQueryTestHelper --SET parser cannot handle config values containing commas
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-57128
>                 URL: https://issues.apache.org/jira/browse/SPARK-57128
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL, Tests
>    Affects Versions: 3.5.8, 4.1.2
>            Reporter: Norio Akagi
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 5.0.0
>
>
> {{SQLQueryTestHelper.getSparkSettings}} parses {{--SET}} directives in SQL test
>   files by splitting the value on commas:
> {noformat}
>     val settingLines = comments.filter(_.startsWith("--SET ")).map(_.substring(6))
>     settingLines.flatMap(_.split(",").map { kv =>
>       val (conf, value) = kv.span(_ != '=')
>       conf.trim -> value.substring(1).trim
>     }){noformat}
>   The doc states "you can set multiple configs in one --SET, using comma to
>   separate them", but this collides with Spark configs whose values themselves
>   contain commas. For example,
> {noformat}
> --SET spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation,org.apache.spark.sql.catalyst.optimizer.ConstantFolding{noformat}
>   is intended as a single setting with a 2-element list value, but the parser
>   splits it into two segments, the second of which has no {{=}}. For that
>   segment {{kv.span(_ != '=')}} returns an empty {{value}}, so
>   {{value.substring(1)}} throws {{StringIndexOutOfBoundsException}}.
>   A real-world hit: trying to scope down the excluded rules in Apache Gluten's
>   spark41 SQL tests (Gluten PR apache/incubator-gluten#12165). The workaround
>   there is to restrict the setting to a single excluded rule.
>   Proposed fix:
>   Split only on commas that are immediately followed by what looks like a new
>   {{key=}}:
> {noformat}
>     settingLines.flatMap(_.split(",(?=[\\w.]+=)").map { ... }){noformat}
>   This is backward compatible: {{--SET k1=v1,k2=v2}} still parses as two
>   settings, and {{--SET k=v1,v2}} correctly preserves the comma in the value.
>   The change also adds a focused SQLQueryTestHelperSuite covering a single
>   setting, multiple settings in one --SET, multiple --SET lines, a comma in the
>   value, a mix of these, and ignoring non-SET comments.
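
To make the proposed behaviour concrete, here is a minimal standalone sketch modelled on the snippets quoted in the description. The object name SetDirectiveSketch and the method name parseSettings are hypothetical and chosen only for illustration; this is not the actual SQLQueryTestHelper code and may differ from the change in the pull request.

```scala
// Illustrative sketch only: SetDirectiveSketch and parseSettings are
// hypothetical names, not part of Spark.
object SetDirectiveSketch {
  // Split only on a comma that is followed by something shaped like `key=`,
  // i.e. word characters or dots and then an equals sign.
  private val SettingSeparator = ",(?=[\\w.]+=)"

  def parseSettings(comments: Seq[String]): Seq[(String, String)] = {
    // Keep only --SET lines and strip the 6-character "--SET " prefix.
    val settingLines = comments.filter(_.startsWith("--SET ")).map(_.substring(6))
    settingLines.flatMap(_.split(SettingSeparator).toSeq.map { kv =>
      val (conf, value) = kv.span(_ != '=')
      // As in the quoted code, a segment with no '=' would still throw here.
      conf.trim -> value.substring(1).trim
    })
  }

  def main(args: Array[String]): Unit = {
    val comments = Seq(
      "-- an ordinary comment",
      "--SET k1=v1,k2=v2",
      "--SET k=v1,v2"
    )
    parseSettings(comments).foreach { case (k, v) => println(s"$k -> $v") }
  }
}
```

One consequence of this lookahead is that a value which itself contains a comma followed by key-like text and an equals sign (for example {{--SET k=a,b=c}}) is still treated as two settings, which matches the existing multi-setting behaviour.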



--
This message was sent by Atlassian Jira
(v8.20.10#820010)