[
https://issues.apache.org/jira/browse/SPARK-57128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-57128:
-----------------------------------
Labels: pull-request-available (was: )
> SQLQueryTestHelper --SET parser cannot handle config values containing commas
> -----------------------------------------------------------------------------
>
> Key: SPARK-57128
> URL: https://issues.apache.org/jira/browse/SPARK-57128
> Project: Spark
> Issue Type: Improvement
> Components: SQL, Tests
> Affects Versions: 3.5.8, 4.1.2
> Reporter: Norio Akagi
> Priority: Minor
> Labels: pull-request-available
> Fix For: 5.0.0
>
>
> {{SQLQueryTestHelper.getSparkSettings}} parses {{--SET}} directives in SQL test
> files by splitting the value on commas:
> {noformat}
> val settingLines = comments.filter(_.startsWith("--SET ")).map(_.substring(6))
> settingLines.flatMap(_.split(",").map { kv =>
>   val (conf, value) = kv.span(_ != '=')
>   conf.trim -> value.substring(1).trim
> }){noformat}
> The doc states "you can set multiple configs in one --SET, using comma to
> separate them", but this collides with Spark configs whose values themselves
> contain commas. For example,
> {noformat}
> --SET spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation,org.apache.spark.sql.catalyst.optimizer.ConstantFolding{noformat}
> is intended as a single setting with a 2-element list value, but the parser
> splits it into two segments, the second of which has no {{=}}. For that
> segment, {{value.substring(1)}} is called on an empty string and crashes with
> {{StringIndexOutOfBoundsException}}.
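The failure described above can be reproduced outside Spark with a small sketch. The object name `NaiveSetParser` and the shortened rule names are hypothetical stand-ins for illustration; only the splitting logic is copied from the snippet quoted in the issue.

```scala
import scala.util.Try

// Hypothetical stand-in for the parsing step quoted in the issue; not the Spark source itself.
object NaiveSetParser {
  def parse(comments: Seq[String]): Seq[(String, String)] = {
    val settingLines = comments.filter(_.startsWith("--SET ")).map(_.substring(6))
    settingLines.flatMap(_.split(",").toSeq.map { kv =>
      // For a segment without '=', span returns (segment, ""),
      // so substring(1) is called on an empty string and throws.
      val (conf, value) = kv.span(_ != '=')
      conf.trim -> value.substring(1).trim
    })
  }
}

object NaiveSetParserDemo {
  def main(args: Array[String]): Unit = {
    // Shortened, made-up rule names; the shape matches the example in the issue.
    val line = "--SET spark.sql.optimizer.excludedRules=a.RuleOne,b.RuleTwo"
    val result = Try(NaiveSetParser.parse(Seq(line)))
    println(result.isFailure)
  }
}
```

Wrapping the call in `Try` keeps the demo from aborting, so the failure can be inspected rather than only observed as a stack trace.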
> A real-world hit: trying to scope down the excluded rules in Apache Gluten's
> spark41 SQL tests (Gluten PR apache/incubator-gluten#12165). The workaround
> there is to restrict the setting to a single excluded rule.
> Proposed fix:
> Split only on commas that are immediately followed by what looks like a new
> {{key=}}:
> {noformat}
> settingLines.flatMap(_.split(",(?=[\\w.]+=)").map { ... }){noformat}
> This is backward compatible: {{--SET k1=v1,k2=v2}} still parses as two
> settings, and {{--SET k=v1,v2}} correctly preserves the comma in the value.
> Adds a focused SQLQueryTestHelperSuite covering a single setting, multiple
> settings in one --SET, multiple --SET lines, a comma in a value, mixed cases,
> and ignoring non-SET comments.
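The lookahead split and the listed cases can be sketched as follows. This is a minimal illustration, not the code from the pull request: the object name `LookaheadSetParser` and the sample keys are hypothetical, and only the regex is taken from the proposed fix.

```scala
// Hypothetical helper name; only the regex comes from the proposed fix.
object LookaheadSetParser {
  // Split only on a comma that is directly followed by `key=`,
  // where the key consists of word characters and dots.
  private val separator = ",(?=[\\w.]+=)"

  def parse(comments: Seq[String]): Seq[(String, String)] = {
    // Lines that do not start with "--SET " are ignored.
    val settingLines = comments.filter(_.startsWith("--SET ")).map(_.substring(6))
    settingLines.flatMap(_.split(separator).toSeq.map { kv =>
      val (conf, value) = kv.span(_ != '=')
      conf.trim -> value.substring(1).trim
    })
  }
}

object LookaheadSetParserDemo {
  def main(args: Array[String]): Unit = {
    val comments = Seq(
      "-- an ordinary comment",   // ignored
      "--SET k1=v1,k2=v2",        // two settings
      "--SET k=v1,v2",            // one setting, comma kept in the value
      "--SET a.b=x,y,c.d=z"       // mixed: comma in value, then a second key
    )
    LookaheadSetParser.parse(comments).foreach { case (k, v) => println(s"$k -> $v") }
  }
}
```

One consequence of this heuristic, which follows from the regex itself: a value that contains a comma followed by text of the form `name=` would still be split at that comma, so such values would need a different escaping scheme.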
--
This message was sent by Atlassian Jira
(v8.20.10#820010)