Norio Akagi created SPARK-57128:
-----------------------------------

             Summary: SQLQueryTestHelper --SET parser cannot handle config values containing commas
                 Key: SPARK-57128
                 URL: https://issues.apache.org/jira/browse/SPARK-57128
             Project: Spark
          Issue Type: Improvement
          Components: SQL, Tests
    Affects Versions: 4.1.2, 3.5.8
            Reporter: Norio Akagi
             Fix For: 5.0.0


{{SQLQueryTestHelper.getSparkSettings}} parses {{--SET}} directives in SQL test
  files by splitting the text after {{--SET}} on every comma:
{noformat}
    val settingLines = comments.filter(_.startsWith("--SET ")).map(_.substring(6))
    settingLines.flatMap(_.split(",").map { kv =>
      val (conf, value) = kv.span(_ != '=')
      conf.trim -> value.substring(1).trim
    }){noformat}
  The doc states "you can set multiple configs in one --SET, using comma to
  separate them", but this collides with Spark configs whose values themselves
  contain commas. For example,
{noformat}
--SET spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation,org.apache.spark.sql.catalyst.optimizer.ConstantFolding{noformat}
  is intended as a single setting with a 2-element list value, but the parser
  splits it into two segments, the second of which has no {{=}}. For that
  segment {{kv.span(_ != '=')}} returns an empty {{value}}, so
  {{value.substring(1)}} throws {{StringIndexOutOfBoundsException}}.
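
  The failure can be reproduced outside Spark with a minimal sketch. The object
  name {{SetParserRepro}} and the method {{parseNaive}} are hypothetical; the
  body copies the parsing logic quoted above and assumes the comment lines are
  passed in as plain strings.

```scala
import scala.util.Try

object SetParserRepro {
  // Hypothetical stand-in that copies the parsing logic quoted in this issue.
  def parseNaive(comments: Seq[String]): Seq[(String, String)] = {
    val settingLines = comments.filter(_.startsWith("--SET ")).map(_.substring(6))
    settingLines.flatMap(_.split(",").map { kv =>
      // For a segment without '=', `value` is empty and substring(1) throws.
      val (conf, value) = kv.span(_ != '=')
      conf.trim -> value.substring(1).trim
    })
  }

  def main(args: Array[String]): Unit = {
    val line = "--SET spark.sql.optimizer.excludedRules=" +
      "org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation," +
      "org.apache.spark.sql.catalyst.optimizer.ConstantFolding"
    val result = Try(parseNaive(Seq(line)))
    // The second segment has no '=', so parsing fails.
    println(result.failed.map(_.getClass.getSimpleName))
  }
}
```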

  This was hit in practice while trying to narrow the set of excluded rules in
  Apache Gluten's spark41 SQL tests (Gluten PR apache/incubator-gluten#12165).
  The workaround there is to restrict the setting to a single excluded rule.

  Proposed fix:

  Split only on commas that are immediately followed by what looks like a new
  {{key=}}:
{noformat}
    settingLines.flatMap(_.split(",(?=[\\w.]+=)").map { ... }){noformat}
  This is backward compatible: {{--SET k1=v1,k2=v2}} still parses as two
  settings, and {{--SET k=v1,v2}} now preserves the comma in the value.
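
  A small sketch of how the lookahead split behaves on a few inputs. The names
  {{LookaheadSplitDemo}}, {{Separator}} and {{segments}} are hypothetical; only
  the regular expression is taken from the proposal.

```scala
object LookaheadSplitDemo {
  // Regex from the proposal: a comma followed by word/dot characters and '='.
  // The lookahead is zero-width, so the key itself is not consumed.
  val Separator = ",(?=[\\w.]+=)"

  def segments(s: String): List[String] = s.split(Separator).toList

  def main(args: Array[String]): Unit = {
    println(segments("k1=v1,k2=v2"))           // two segments
    println(segments("k=v1,v2"))               // one segment, comma kept
    println(segments("spark.a=1,2,spark.b=x")) // two segments
    println(segments("k1=v1, k2=v2"))          // one segment: see note below
  }
}
```

  Two edge cases follow from the regex as written and may be worth covering in
  the suite: a space after the separating comma ({{k1=v1, k2=v2}}) prevents the
  split, and a value that itself contains {{,name=}} is still split. Allowing
  optional whitespace in the lookahead, for example {{,(?=\s*[\w.]+=)}}, is one
  possible way to handle the first case; that variant is a suggestion, not part
  of the proposal.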

  The change also adds a focused {{SQLQueryTestHelperSuite}} covering a single
  setting, multiple settings in one {{--SET}}, multiple {{--SET}} lines, a comma
  in a value, a mix of these, and ignoring non-SET comments.
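
  The cases listed above could be checked roughly as follows. This is a sketch
  with plain assertions rather than the actual suite: {{SetParserChecks}} and
  {{parseSettings}} are hypothetical names, and the parser body is the quoted
  logic with the proposed regex substituted in.

```scala
object SetParserChecks {
  // Hypothetical stand-in: the quoted parser with the proposed lookahead split.
  def parseSettings(comments: Seq[String]): Seq[(String, String)] = {
    val settingLines = comments.filter(_.startsWith("--SET ")).map(_.substring(6))
    settingLines.flatMap(_.split(",(?=[\\w.]+=)").toSeq.map { kv =>
      val (conf, value) = kv.span(_ != '=')
      conf.trim -> value.substring(1).trim
    })
  }

  def main(args: Array[String]): Unit = {
    // Single setting.
    assert(parseSettings(Seq("--SET a=1")) == Seq("a" -> "1"))
    // Multiple settings in one --SET.
    assert(parseSettings(Seq("--SET a=1,b=2")) == Seq("a" -> "1", "b" -> "2"))
    // Multiple --SET lines.
    assert(parseSettings(Seq("--SET a=1", "--SET b=2")) == Seq("a" -> "1", "b" -> "2"))
    // Comma inside a value.
    assert(parseSettings(Seq("--SET k=v1,v2")) == Seq("k" -> "v1,v2"))
    // Mixed: comma-valued setting followed by another setting.
    assert(parseSettings(Seq("--SET k=v1,v2,spark.x=true")) ==
      Seq("k" -> "v1,v2", "spark.x" -> "true"))
    // Non-SET comments are ignored.
    assert(parseSettings(Seq("-- just a comment", "--SET a=1")) == Seq("a" -> "1"))
    println("all checks passed")
  }
}
```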



