andygrove opened a new issue, #744:
URL: https://github.com/apache/datafusion-comet/issues/744

   ### What is the problem the feature request solves?
   
   The benchmarks in `CometAggregateBenchmark` show that, with Comet (Scan, Exec), `COUNT` is slower than Spark (0.6X) while `SUM` is faster than Spark (2.5X). There should not be so much difference between these two aggregates. I could not reproduce the performance difference in standalone DataFusion.
   
   ### SUM
   
   ```
   Grouped HashAgg Exec: single group key (cardinality 1048576), single aggregate SUM:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------------------------------------------------
   SQL Parquet - Spark (SUM)                                                                     1672           1698          37          6.3         159.4       1.0X
   SQL Parquet - Comet (Scan) (SUM)                                                              1913           1993         112          5.5         182.5       0.9X
   SQL Parquet - Comet (Scan, Exec) (SUM)                                                         669            798         113         15.7          63.8       2.5X
   ```
   
   ### COUNT
   
   ```
   Grouped HashAgg Exec: single group key (cardinality 1048576), single aggregate COUNT:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   --------------------------------------------------------------------------------------------------------------------------------------------------------------------
   SQL Parquet - Spark (COUNT)                                                                     1796           1827          43          5.8         171.3       1.0X
   SQL Parquet - Comet (Scan) (COUNT)                                                              1810           1853          61          5.8         172.6       1.0X
   SQL Parquet - Comet (Scan, Exec) (COUNT)                                                        2827           2867          56          3.7         269.6       0.6X
   ```
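
To make the gap concrete, here is a small sketch that recomputes the `Relative` column from the `Best Time(ms)` values in the two tables and compares `COUNT` against `SUM` for the Comet (Scan, Exec) case. It assumes `Relative` is the Spark baseline's best time divided by each case's best time, which matches the reported figures; the `best_ms` dictionary and the `relative` helper are names made up for this illustration and are not part of the benchmark code.

```python
# Best Time (ms) values copied from the SUM and COUNT benchmark tables.
best_ms = {
    "SUM": {"Spark": 1672, "Comet (Scan)": 1913, "Comet (Scan, Exec)": 669},
    "COUNT": {"Spark": 1796, "Comet (Scan)": 1810, "Comet (Scan, Exec)": 2827},
}


def relative(agg, case):
    """Speedup over Spark, assuming Relative = Spark best time / case best time."""
    return best_ms[agg]["Spark"] / best_ms[agg][case]


# Recompute the Relative column for every row of both tables.
for agg, cases in best_ms.items():
    for case in cases:
        print(f"{agg:5s} {case:20s} {relative(agg, case):.1f}X")

# How much slower COUNT is than SUM when both run natively in Comet.
gap = best_ms["COUNT"]["Comet (Scan, Exec)"] / best_ms["SUM"]["Comet (Scan, Exec)"]
print(f"COUNT vs SUM, Comet (Scan, Exec): {gap:.1f}x slower")
```

Under that assumption, native `COUNT` takes roughly four times as long as native `SUM` over the same group key, even though the Spark baselines for the two aggregates differ by less than 10%.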
   
   
   ### Describe the potential solution
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

