Re: [PR] [SPARK-56599][SQL] Add scan narrowing for column-level UPDATEs in DSv2 [spark]

via GitHub Tue, 05 May 2026 17:36:06 -0700


dongjoon-hyun commented on code in PR #55518:
URL: https://github.com/apache/spark/pull/55518#discussion_r3192343349



##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RowLevelOperation.java:
##########
@@ -105,4 +105,45 @@ default String description() {
   default NamedReference[] requiredMetadataAttributes() {
     return new NamedReference[0];
   }
+
+
+  /**
+   * Controls whether to send only the required data columns to the connector rather than the
+   * full row.
+   * <p>
+   * When true, Spark narrows the data column schema ({@link LogicalWriteInfo#schema()}) to only
+   * the columns declared via {@link #requiredDataAttributes()}. Metadata columns (from
+   * {@link #requiredMetadataAttributes()}) and row ID columns (from
+   * {@link SupportsDelta#rowId()}) are unaffected and always projected separately.
+   * <p>
+   * If {@link #requiredDataAttributes()} returns a non-empty array, the write schema is exactly
+   * those columns in declared order. The connector must include all columns it wants to receive,
+   * including the columns being updated. If {@link #requiredDataAttributes()} returns an empty
+   * array, Spark sends only the non-identity assigned columns (heuristic path).
+   *
+   * @since 4.2.0

Review Comment:
   4.2.0 -> 4.3.0

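The selection rule described in the Javadoc above can be modeled in a few lines of plain Java. This is a minimal sketch, not Spark's implementation: the class and method names are hypothetical, column references are plain strings, an assignment counts as "identity" only when its expression text equals the column name, and the heuristic path is assumed to keep table order (the Javadoc does not specify one). Metadata and row ID columns are left out because the Javadoc says they are projected separately.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the data-column narrowing rule; not a Spark API. */
public class ColumnUpdateNarrowing {

  /**
   * Returns the data columns that would make up the write schema.
   *
   * @param tableColumns all data columns of the table, in table order
   * @param supportsColumnUpdates stand-in for supportsColumnUpdates()
   * @param requiredDataAttributes stand-in for requiredDataAttributes()
   * @param assignments UPDATE assignments as column name -> expression text (assumption)
   */
  static List<String> writeDataColumns(
      List<String> tableColumns,
      boolean supportsColumnUpdates,
      List<String> requiredDataAttributes,
      Map<String, String> assignments) {
    if (!supportsColumnUpdates) {
      // Default behavior: the full row is sent and requiredDataAttributes is ignored.
      return List.copyOf(tableColumns);
    }
    if (!requiredDataAttributes.isEmpty()) {
      // Connector-declared columns become the write schema exactly, in declared order.
      return List.copyOf(requiredDataAttributes);
    }
    // Heuristic path: keep only assigned columns whose assignment is not `col = col`.
    List<String> result = new ArrayList<>();
    for (String col : tableColumns) {
      String expr = assignments.get(col);
      if (expr != null && !expr.equals(col)) {
        result.add(col);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> table = List.of("id", "name", "price", "qty");
    Map<String, String> assignments = new LinkedHashMap<>();
    assignments.put("price", "price * 2");
    assignments.put("name", "name"); // identity assignment, dropped by the heuristic

    System.out.println(writeDataColumns(table, false, List.of(), assignments)); // [id, name, price, qty]
    System.out.println(writeDataColumns(table, true, List.of(), assignments)); // [price]
    System.out.println(writeDataColumns(table, true, List.of("id", "price"), assignments)); // [id, price]
  }
}
```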


##########
sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RowLevelOperation.java:
##########
@@ -105,4 +105,45 @@ default String description() {
   default NamedReference[] requiredMetadataAttributes() {
     return new NamedReference[0];
   }
+
+
+  /**
+   * Controls whether to send only the required data columns to the connector rather than the
+   * full row.
+   * <p>
+   * When true, Spark narrows the data column schema ({@link LogicalWriteInfo#schema()}) to only
+   * the columns declared via {@link #requiredDataAttributes()}. Metadata columns (from
+   * {@link #requiredMetadataAttributes()}) and row ID columns (from
+   * {@link SupportsDelta#rowId()}) are unaffected and always projected separately.
+   * <p>
+   * If {@link #requiredDataAttributes()} returns a non-empty array, the write schema is exactly
+   * those columns in declared order. The connector must include all columns it wants to receive,
+   * including the columns being updated. If {@link #requiredDataAttributes()} returns an empty
+   * array, Spark sends only the non-identity assigned columns (heuristic path).
+   *
+   * @since 4.2.0
+   */
+  default boolean supportsColumnUpdates() {
+    return false;
+  }
+
+  /**
+   * Returns data column references required to perform this row-level operation.
+   * <p>
+   * This method is only consulted by Spark when {@link #supportsColumnUpdates()} returns
+   * {@code true}. If {@code supportsColumnUpdates()} returns {@code false}, the returned array
+   * is ignored and the full table row is sent (the default behavior).
+   * <p>
+   * When non-empty, the returned columns become the write schema in declared order.
+   * The connector must declare all columns it wants to receive, including the columns being
+   * updated. Use {@link RowLevelOperationInfo#updatedColumns()} to learn which columns are being
+   * assigned, then add any extra columns needed for row lookup or routing (e.g., primary key).
+   * <p>
+   * When empty (the default), Spark falls back to sending only the non-identity assigned columns.
+   *
+   * @since 4.2.0

Review Comment:
   ditto. 4.3.0

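On the connector side, the requiredDataAttributes() Javadoc suggests starting from RowLevelOperationInfo#updatedColumns() and adding the columns needed for row lookup or routing. A minimal sketch of that step, assuming plain strings in place of the connector's column references and a hypothetical helper name; putting key columns first is an arbitrary choice, and it matters only because the returned order becomes the write schema order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical connector-side helper; the names are illustrative, not Spark APIs. */
public class RequiredColumnsSketch {

  /**
   * Builds the list a connector might return from requiredDataAttributes():
   * lookup/routing columns first, then every column being updated, without duplicates.
   */
  static List<String> requiredDataColumns(List<String> keyColumns, List<String> updatedColumns) {
    // LinkedHashSet keeps first-insertion order and silently skips duplicates.
    Set<String> ordered = new LinkedHashSet<>(keyColumns);
    // Every updated column must be present; a key that is also updated is not repeated.
    ordered.addAll(updatedColumns);
    return new ArrayList<>(ordered);
  }

  public static void main(String[] args) {
    List<String> keys = List.of("id");
    List<String> updated = Arrays.asList("price", "id", "qty");
    System.out.println(requiredDataColumns(keys, updated)); // [id, price, qty]
  }
}
```

Because a non-empty result is used as the exact write schema, the helper always carries every updated column through; leaving one out would mean Spark no longer sends it to the connector.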


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
