[I] Improve performance of TPC-DS q72 [datafusion-comet]

via GitHub Tue, 02 Jul 2024 17:36:51 -0700


andygrove opened a new issue, #622:
URL: https://github.com/apache/datafusion-comet/issues/622


   ### What is the problem the feature request solves?
   
   I ran our TPC-DS-derived benchmark locally at scale factor 100 (sf=100) and 
saw that q72 shows the largest regression in absolute terms (seconds rather 
than percentage): it was 754 seconds (about 12.6 minutes) slower with Comet 
enabled. Spark took roughly 1.1 hours, and Comet took roughly 1.3 hours.
   
   This was based on a single run of all 99 queries in Spark and then again 
with Comet enabled.
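
To illustrate why ranking by seconds and ranking by percentage can pick different queries, here is a minimal sketch. The query names and timings are hypothetical placeholders, not values from this benchmark run, and `regressions` is an illustrative helper rather than part of the benchmark tooling.

```python
# Hypothetical per-query wall-clock times in seconds: (spark, comet).
# These are placeholder values, NOT measurements from the run described here.
timings = {
    "qA": (10.0, 16.0),      # short query, large relative slowdown
    "qB": (4000.0, 4700.0),  # long query, large absolute slowdown
    "qC": (120.0, 90.0),     # faster with Comet
}


def regressions(timings):
    """Return (query, delta_seconds, delta_percent) for each query.

    Positive values mean the query was slower with Comet enabled.
    """
    rows = []
    for query, (spark_s, comet_s) in timings.items():
        delta_s = comet_s - spark_s
        delta_pct = 100.0 * delta_s / spark_s
        rows.append((query, delta_s, delta_pct))
    return rows


rows = regressions(timings)

# The "largest regression" depends on which metric is used for ranking.
worst_by_seconds = max(rows, key=lambda r: r[1])[0]
worst_by_percent = max(rows, key=lambda r: r[2])[0]

for query, delta_s, delta_pct in sorted(rows, key=lambda r: r[1], reverse=True):
    print(f"{query}: {delta_s:+.1f} s ({delta_pct:+.1f}%)")
```

With these placeholder values the long-running query dominates the ranking by seconds even though a short query has the larger percentage slowdown, which is the sense in which q72 is called the largest regression above.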
   
   Comet does not currently support the many sort-merge joins in the query, so 
Comet is only performing the initial file scans, filters, and exchanges (and 
sometimes sorts) before transitioning back to Spark for the joins.
   
   This issue is for discussing possible solutions to avoid this regression.
   
   
   ### Describe the potential solution
   
   _No response_
   
   ### Additional context
   
   _No response_


