Re: [PR] [SPARK-52195][PYTHON][SS] Fix initial state column dropping issue for Python TWS [spark]

via GitHub Fri, 16 May 2025 15:39:55 -0700


jingz-db commented on code in PR #50926:
URL: https://github.com/apache/spark/pull/50926#discussion_r2093745489



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/pythonLogicalOperators.scala:
##########
@@ -215,10 +216,15 @@ case class TransformWithStateInPySpark(
     left.output.take(groupingAttributesLen)
   }
 
-  def rightAttributes: Seq[Attribute] = {
+  def rightAttributes(includesInitialStateColumns: Boolean = false): Seq[Attribute] = {
     assert(resolved, "This method is expected to be called after resolution.")
     if (hasInitialState) {
-      right.output.take(initGroupingAttrsLen)
+      if (includesInitialStateColumns) {
+        // Include the initial state columns in the references to avoid being column pruned.

Review Comment:
   If I understand your PR description correctly, the column pruning happens
inside the optimizer? Do you have a code pointer to where in the optimizer the
columns get pruned?




