EmilyMatt opened a new pull request, #1390:
URL: https://github.com/apache/datafusion-comet/pull/1390

   ## What issue does this close?
   
   Closes #1389.
   
   ## Rationale for this change
   
   As described in the issue, we want to prevent the situation where the 
Partial aggregate and the shuffle are both supported and converted, but the 
Final aggregate is not converted because its result expressions are not 
supported.
   This leaves the plan in an unrecoverable state: Spark expects an aggregate 
buffer created by the Partial HashAggregate, and it doesn't exist.
   
   ## What changes are included in this PR?
   
   I've moved the conversion of the hash aggregate into its own function (I 
believe everything should be separated, to be honest; it's very hard to manage 
right now). The function also reports whether the result expressions were 
converted. When they were not, we create a new ProjectExec with those result 
expressions, convert the HashAggregate without them, and place a conversion 
between the two. That way the plan is in a valid state at all times.
   This fallback can be turned off by enforcing result conversion with 
"spark.comet.exec.aggregate.enforceResults=true".
   Result enforcing is disabled by default.
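
To make the intended plan shapes concrete, here is a toy model of the decision in plain Python. It is only a sketch, not the Scala code in this PR: `FinalAgg`, `Conversion`, `Project`, `SUPPORTED`, and `convert_final_agg` are hypothetical names, and treating expressions as strings is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-ins for plan nodes; these are not Comet or Spark classes.
@dataclass(frozen=True)
class FinalAgg:
    result_exprs: Tuple[str, ...]  # expressions evaluated on the aggregate output
    native: bool = False           # True once the node has been converted

@dataclass(frozen=True)
class Conversion:
    child: object                  # boundary between the native node and Spark

@dataclass(frozen=True)
class Project:
    exprs: Tuple[str, ...]         # result expressions left to run in Spark
    child: object

# Assumed set of expressions the native side can evaluate.
SUPPORTED = frozenset({"sum", "count"})

def convert_final_agg(agg: FinalAgg, enforce_results: bool = False) -> Optional[object]:
    """Return the converted plan, or None when the aggregate is left unconverted."""
    if all(e in SUPPORTED for e in agg.result_exprs):
        # Everything converts: a single native aggregate.
        return FinalAgg(agg.result_exprs, native=True)
    if enforce_results:
        # Result conversion is enforced, so no split is attempted.
        return None
    # Convert the aggregate without its result expressions and evaluate
    # those expressions in a projection on top of it.
    native_agg = FinalAgg((), native=True)
    return Project(agg.result_exprs, Conversion(native_agg))
```

With an unsupported result expression such as a hypothetical `my_udf`, the sketch returns `Project -> Conversion -> FinalAgg(native)`, so the aggregate still consumes the buffer produced by the Partial stage. With `enforce_results=True` it returns `None` instead.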
   
   ## How are these changes tested?
   Many of the plan stability tests will get a new plan in which the 
aggregate is completed natively and the ProjectExec runs in Spark, instead of 
the current situation, where the final stage of the HashAggregate runs 
entirely in Spark.
   Those tests currently fail because I have been unable to run them with 
SPARK_GENERATE_GOLDEN_FILES (might be a skill issue on my side).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

