Re: [PR] [SPARK-52195][PYTHON][SS] Fix initial state column dropping issue for Python TWS [spark]

via GitHub Fri, 16 May 2025 15:39:11 -0700


jingz-db commented on code in PR #50926:
URL: https://github.com/apache/spark/pull/50926#discussion_r2093745489



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/pythonLogicalOperators.scala:
##########
@@ -215,10 +216,15 @@ case class TransformWithStateInPySpark(
     left.output.take(groupingAttributesLen)
   }
 
-  def rightAttributes: Seq[Attribute] = {
+  def rightAttributes(includesInitialStateColumns: Boolean = false): Seq[Attribute] = {
     assert(resolved, "This method is expected to be called after resolution.")
     if (hasInitialState) {
-      right.output.take(initGroupingAttrsLen)
+      if (includesInitialStateColumns) {
+        // Include the initial state columns in the references to avoid being column pruned.

Review Comment:
   IIUC, the column pruning happens inside the optimizer? Do you have a code 
pointer to where in the optimizer the columns get pruned?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


