baibaichen opened a new issue, #11922:
URL: https://github.com/apache/gluten/issues/11922

   ## Backend
   VL (Velox)
   
   **Gluten version**: main branch
   
   ## Description
   
   Spark 4.1 introduced memory-based shuffle spill thresholds (SPARK-49386,
   JIRA type: Improvement). The new `spillSizeThreshold` parameter enables
   spilling by data size rather than only by row count. Gluten's shuffle
   implementation does not support this threshold.
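
To make the difference between row-count and size-based spilling concrete, here is a minimal sketch of a dual-threshold check. It is an illustration only: `SpillPolicy`, its field names, and the `>=` comparisons are hypothetical and are not taken from Spark's or Gluten's code.

```python
from dataclasses import dataclass


@dataclass
class SpillPolicy:
    """Hypothetical spill policy; names are illustrative, not Spark's API."""

    num_rows_threshold: int    # spill once this many rows are buffered
    size_bytes_threshold: int  # spill once this many bytes are buffered

    def should_spill(self, buffered_rows: int, buffered_bytes: int) -> bool:
        # A row-count-only policy checks just the first condition.
        # A memory-based threshold adds the second, so a small number of
        # very wide rows can still trigger a spill.
        return (
            buffered_rows >= self.num_rows_threshold
            or buffered_bytes >= self.size_bytes_threshold
        )


# Assumed example values: 4096 rows or 64 MiB, whichever is reached first.
policy = SpillPolicy(num_rows_threshold=4096,
                     size_bytes_threshold=64 * 1024 * 1024)

# 100 rows of roughly 1 MiB each exceed the size limit long before the row limit.
wide_rows_spill = policy.should_spill(buffered_rows=100,
                                      buffered_bytes=100 * 1024 * 1024)
```

In this sketch, a backend that honors only the row-count condition would keep buffering in the wide-row case, which is the kind of behavioral gap the tests listed under Impact exercise.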
   
   Affected version: Spark 4.1 only.
   
   **Parent issue**: #11910 (`[VL] Spark 4.x: Tracking new feature support`)
   
   ### Impact
   
   | Suite | Exclude | spark40 | spark41 |
   |-------|---------|:-------:|:-------:|
   | GlutenDataFrameWindowFunctionsSuite | SPARK-49386 spill | 🟢 | 🔴 |
   | GlutenJoinSuite | SPARK-49386 SortMergeJoin spill | 🟢 | 🔴 |
   
   Note: `GlutenSQLWindowFunctionSuite` has a pre-existing spill issue
   ("low buffer spill threshold") that is unrelated to SPARK-49386 and is
   out of scope for this issue.
   
   ### References
   
   - Apache Spark JIRA: [SPARK-49386](https://issues.apache.org/jira/browse/SPARK-49386)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

