Norio Akagi created SPARK-57128:
-----------------------------------
Summary: SQLQueryTestHelper --SET parser cannot handle config
values containing commas
Key: SPARK-57128
URL: https://issues.apache.org/jira/browse/SPARK-57128
Project: Spark
Issue Type: Improvement
Components: SQL, Tests
Affects Versions: 4.1.2, 3.5.8
Reporter: Norio Akagi
Fix For: 5.0.0
{{SQLQueryTestHelper.getSparkSettings}} parses {{--SET}} directives in SQL test
files by splitting the value on commas:
{noformat}
val settingLines = comments.filter(_.startsWith("--SET ")).map(_.substring(6))
settingLines.flatMap(_.split(",").map { kv =>
  val (conf, value) = kv.span(_ != '=')
  conf.trim -> value.substring(1).trim
})
{noformat}
The doc states "you can set multiple configs in one --SET, using comma to
separate them", but this collides with Spark configs whose values themselves
contain commas. For example,
{noformat}
--SET spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation,org.apache.spark.sql.catalyst.optimizer.ConstantFolding
{noformat}
is intended as a single setting with a two-element list value, but the parser
splits it into two segments, the second of which has no {{=}}. For that
segment {{kv.span(_ != '=')}} returns an empty {{value}}, so
{{value.substring(1)}} throws {{StringIndexOutOfBoundsException}}.
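The failure can be reproduced outside Spark with a minimal sketch of the same splitting logic. The object name {{NaiveSetParsing}} and the helper {{parseNaive}} are hypothetical stand-ins for illustration, not part of {{SQLQueryTestHelper}}:

```scala
import scala.util.Try

object NaiveSetParsing {
  // Hypothetical stand-in that mirrors the comma split quoted in this issue.
  def parseNaive(line: String): Seq[(String, String)] =
    line.split(",").toSeq.map { kv =>
      val (conf, value) = kv.span(_ != '=')
      // When kv has no '=', value is "" and substring(1) throws.
      conf.trim -> value.substring(1).trim
    }

  // The text after "--SET " from the example in this issue.
  val line: String = "spark.sql.optimizer.excludedRules=" +
    "org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation," +
    "org.apache.spark.sql.catalyst.optimizer.ConstantFolding"

  def main(args: Array[String]): Unit = {
    val result = Try(parseNaive(line))
    println(result.isFailure) // prints "true"
  }
}
```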
A real-world case: narrowing the set of excluded rules in Apache Gluten's
spark41 SQL tests (Gluten PR apache/incubator-gluten#12165) hit this crash. The
workaround there is to restrict the setting to a single excluded rule.
Proposed fix:
Split only on commas that are immediately followed by what looks like a new
{{key=}}:
{noformat}
settingLines.flatMap(_.split(",(?=[\\w.]+=)").map { ... })
{noformat}
This is backward compatible: {{--SET k1=v1,k2=v2}} still parses as two
settings, and {{--SET k=v1,v2}} now preserves the comma in the value.
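As a rough illustration of how the lookahead split behaves, the sketch below applies the proposed pattern to a few inputs. The object name {{LookaheadSetParsing}} and the helper {{parseSettings}} are assumptions made for this example and are not the names used in the actual patch:

```scala
object LookaheadSetParsing {
  // Hypothetical helper: splits only on commas that are directly followed by
  // one or more word characters or dots and then '=', i.e. something key-like.
  def parseSettings(line: String): Seq[(String, String)] =
    line.split(",(?=[\\w.]+=)").toSeq.map { kv =>
      val (conf, value) = kv.span(_ != '=')
      conf.trim -> value.substring(1).trim
    }

  def main(args: Array[String]): Unit = {
    // Two settings in one directive are still split.
    assert(parseSettings("k1=v1,k2=v2") == Seq("k1" -> "v1", "k2" -> "v2"))
    // A comma inside a value is preserved.
    assert(parseSettings("k=v1,v2") == Seq("k" -> "v1,v2"))
    // Mixed: a list-valued setting followed by another setting.
    assert(parseSettings("k1=a,b,k2=c") == Seq("k1" -> "a,b", "k2" -> "c"))
    println("all checks passed")
  }
}
```

One consequence of this pattern worth keeping in mind: the lookahead requires the key to start right after the comma, so a comma followed by whitespace before the next key is not treated as a separator, and a value that itself contains text of the form {{,name=}} would still be split.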
The change also adds a focused {{SQLQueryTestHelperSuite}} covering a single
setting, multiple settings in one {{--SET}}, multiple {{--SET}} lines, a comma
in a value, a mix of these, and ignoring non-SET comments.