Norio Akagi created SPARK-57189:
-----------------------------------
Summary: handleSqlCommand executes SQL twice and lets blocked
Commands bypass the SDP guard for WITH_RELATIONS
Key: SPARK-57189
URL: https://issues.apache.org/jira/browse/SPARK-57189
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 5.0.0
Reporter: Norio Akagi
For requests originating from Spark Declarative Pipelines (SDP),
SparkConnectPlanner.handleSqlCommand calls
PipelinesHandler.blockUnsupportedSqlCommand with a queryPlan built via
transformRelation(relation). When the relation is a WITH_RELATIONS
that satisfies isValidSQLWithRefs, the transformation follows this
call chain:
transformRelation -> transformWithRelations -> transformSqlWithRefs
-> executeSQLWithRefs -> executeSQL -> session.sql(...)
executeSQLWithRefs carries the comment "Eagerly execute commands of the
provided SQL string", and session.sql actually executes any
Command/DDL/DML in the root SQL. Commands embedded in reference
SubqueryAlias inputs are executed as well, when eagerlyExecuteCommands
walks the resolved plan tree.
This causes two issues:
1. Bypassed guard. blockUnsupportedSqlCommand checks whether queryPlan
is a Command subclass (CreateTableAsSelect, InsertIntoStatement,
etc.). After execution, the resulting plan is wrapped as
CommandResult, which is not in the blocklist. The guard silently
lets through exactly the things it is supposed to block, and the
Commands have already mutated state by the time the guard runs.
2. Double execution. After the guard, handleSqlCommand falls through to
the normal execution path which calls executeSQLWithRefs again. Any
DDL/DML in the request runs twice, causing duplicate side effects.
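The two issues above can be reproduced in a small toy model. This is
only an illustrative sketch, not Spark code: ToySession, Query,
Command, CommandResult, block_unsupported and handle_sql_command_buggy
are hypothetical stand-ins, and the prefix-based "parser" is an
assumption made to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical stand-in for a side-effect-free plan."""
    sql: str


@dataclass
class Command:
    """Hypothetical stand-in for a Command plan (e.g. a CTAS)."""
    sql: str


@dataclass
class CommandResult:
    """Hypothetical stand-in for the plan returned after eager execution."""
    sql: str


@dataclass
class ToySession:
    # Records every statement whose side effects were applied.
    executed: list = field(default_factory=list)

    def parse(self, sql):
        # Assumption: a crude prefix check stands in for real parsing.
        is_cmd = sql.lstrip().upper().startswith(("CREATE", "INSERT", "DROP"))
        return Command(sql) if is_cmd else Query(sql)

    def sql(self, sql):
        # Mimics eager execution: commands run immediately and come
        # back wrapped as CommandResult.
        plan = self.parse(sql)
        if isinstance(plan, Command):
            self.executed.append(sql)
            return CommandResult(sql)
        return plan


def block_unsupported(plan):
    # The guard only recognises Command, not CommandResult.
    if isinstance(plan, Command):
        raise ValueError("unsupported command in pipeline: " + plan.sql)


def handle_sql_command_buggy(session, sql):
    plan = session.sql(sql)   # guard input is built by executing the SQL
    block_unsupported(plan)   # sees CommandResult, so nothing is blocked
    return session.sql(sql)   # normal path executes the SQL a second time


session = ToySession()
handle_sql_command_buggy(session, "CREATE TABLE t AS SELECT 1")
print(session.executed)
```

In this model the guard never raises, and session.executed ends up
holding the CREATE TABLE statement twice.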
The guard should match the runtime's execution surface: inspect both
the root SQL and each reference's input, without itself triggering any
execution.
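One possible shape for such a guard, again as a toy sketch rather than
a proposed patch: it classifies the root SQL and every reference input
by parsing only, and never calls anything that executes. The names
is_command and guard_without_execution are hypothetical, the
references are modelled as plain SQL strings for simplicity, and the
prefix check is an assumption standing in for real plan inspection.

```python
# Assumption: statement prefixes stand in for "is this plan a Command?".
COMMAND_PREFIXES = ("CREATE", "INSERT", "DROP")


def is_command(sql):
    """Classify a statement without executing it."""
    return sql.lstrip().upper().startswith(COMMAND_PREFIXES)


def guard_without_execution(root_sql, reference_sqls):
    """Reject commands in the root SQL or in any reference input.

    Only inspects the statements; nothing is executed here, so a
    rejected request leaves no side effects behind.
    """
    for sql in [root_sql, *reference_sqls]:
        if is_command(sql):
            raise ValueError("unsupported command in pipeline: " + sql)


# A plain query with a plain reference passes the guard.
guard_without_execution("SELECT * FROM ref", ["SELECT 1 AS id"])

# A command hidden in a reference input is rejected before anything runs.
try:
    guard_without_execution("SELECT * FROM ref", ["INSERT INTO t VALUES (1)"])
    blocked = False
except ValueError:
    blocked = True
print(blocked)
```

Because the guard does not execute anything, the normal execution path
that follows it remains the only place where the SQL runs.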