Norio Akagi created SPARK-57189:
-----------------------------------

             Summary: handleSqlCommand executes SQL twice and lets blocked 
Commands bypass the SDP guard for WITH_RELATIONS
                 Key: SPARK-57189
                 URL: https://issues.apache.org/jira/browse/SPARK-57189
             Project: Spark
          Issue Type: Bug
          Components: Connect
    Affects Versions: 5.0.0
            Reporter: Norio Akagi


  For requests originating from Spark Declarative Pipelines (SDP),
  SparkConnectPlanner.handleSqlCommand calls
  PipelinesHandler.blockUnsupportedSqlCommand with a queryPlan built via
  transformRelation(relation). When the relation is a WITH_RELATIONS that
  satisfies isValidSQLWithRefs, the transformation runs through this chain:

    transformRelation -> transformWithRelations -> transformSqlWithRefs
      -> executeSQLWithRefs -> executeSQL -> session.sql(...)

  executeSQLWithRefs carries the code comment "Eagerly execute commands of
  the provided SQL string", and session.sql actually executes any
  Command/DDL/DML in the root SQL. Commands embedded in reference
  SubqueryAlias inputs also execute when eagerlyExecuteCommands walks the
  resolved plan tree.

  This causes two issues:

  1. Bypassed guard. blockUnsupportedSqlCommand checks whether queryPlan
     is a Command subclass (CreateTableAsSelect, InsertIntoStatement,
     etc.). After execution, the resulting plan is wrapped as
     CommandResult, which is not in the blocklist. The guard silently
     lets through exactly the things it is supposed to block, and the
     Commands have already mutated state by the time the guard runs.

  2. Double execution. After the guard, handleSqlCommand falls through to
     the normal execution path which calls executeSQLWithRefs again. Any
     DDL/DML in the request runs twice, causing duplicate side effects.
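
  The two issues can be reproduced in a small toy model. This is only a
  sketch in Python, not Spark code: ToySession, block_unsupported and
  handle_sql_command_buggy are hypothetical stand-ins whose names loosely
  mirror the description above, and the only behavior modeled is "sql()
  runs a Command eagerly and returns it wrapped as a CommandResult".

```python
from dataclasses import dataclass, field


# Toy stand-ins, NOT Spark classes. Names mirror the report for readability only.
@dataclass
class Command:
    name: str


@dataclass
class CommandResult:
    name: str


@dataclass
class ToySession:
    executed: list = field(default_factory=list)

    def sql(self, plan):
        # Assumed behavior: a Command is executed eagerly and returned wrapped.
        if isinstance(plan, Command):
            self.executed.append(plan.name)
            return CommandResult(plan.name)
        return plan


def block_unsupported(plan):
    # The guard only recognizes Command, not the CommandResult wrapper.
    if isinstance(plan, Command):
        raise ValueError("unsupported command: " + plan.name)


def handle_sql_command_buggy(session, parsed):
    query_plan = session.sql(parsed)  # building the plan already executes it
    block_unsupported(query_plan)     # sees CommandResult, so nothing is blocked
    return session.sql(parsed)        # normal execution path runs it again


session = ToySession()
handle_sql_command_buggy(session, Command("CreateTableAsSelect"))
print(session.executed)  # ['CreateTableAsSelect', 'CreateTableAsSelect']
```

  In this model the guard never raises, and the command is recorded twice:
  once while the guard's input is being built and once on the normal path.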

  The guard should match the runtime's execution surface: inspect both
  the root SQL and each reference's input, without itself triggering any
  execution.
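
  One possible shape for such a guard, again as a toy Python sketch rather
  than a proposed patch: Plan, WithRelations, BLOCKED and guard are
  hypothetical names, and the blocklist contains only the two commands
  named above. The guard walks the parsed root and every reference input
  and never calls anything that executes SQL.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical blocklist; the real one lives in blockUnsupportedSqlCommand.
BLOCKED = {"CreateTableAsSelect", "InsertIntoStatement"}


@dataclass
class Plan:
    kind: str
    children: List["Plan"] = field(default_factory=list)


@dataclass
class WithRelations:
    root: Plan              # parsed (not executed) root SQL
    references: List[Plan]  # parsed input of each reference


def find_blocked(plan):
    # Pure tree walk: collects blocked node kinds, triggers no execution.
    found = [plan.kind] if plan.kind in BLOCKED else []
    for child in plan.children:
        found.extend(find_blocked(child))
    return found


def guard(rel):
    offending = find_blocked(rel.root)
    for ref in rel.references:
        offending.extend(find_blocked(ref))
    if offending:
        raise ValueError("unsupported in pipeline: " + ", ".join(offending))


rel = WithRelations(
    root=Plan("Project"),
    references=[Plan("SubqueryAlias", [Plan("InsertIntoStatement")])],
)
try:
    guard(rel)
    blocked, message = False, ""
except ValueError as err:
    blocked, message = True, str(err)
print(blocked, message)
```

  Here the command hidden inside a reference is rejected before anything
  runs, and a request with no blocked nodes passes through unchanged.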




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
