andygrove commented on issue #622: URL: https://github.com/apache/datafusion-comet/issues/622#issuecomment-2207244670
> Spark produces the worst possible query plan for q72

Yes, it does. I am comparing like-for-like plans between Spark and Comet without any join reordering enabled.

> Irrespective of the plan though, given the same number of input rows are the Comet operators also slower than the corresponding Spark operators?

In both cases, Spark is executing the SortMergeJoin, and the join takes longer when the inputs come from CometScan/CometFilter/Exchange than when they come from the Spark equivalents (with the same number of rows in both cases).

Things I have learned since filing this issue:

- The time reported for the WholeStageCodegen C2R is misleading. It is the duration of the operator, not the time spent in the operator. The reason it takes so long is not necessarily the C2R conversion itself but the elapsed time spent retrieving data from child operators (such as the AQEShuffleRead).
- With Comet enabled, AQEShuffleRead coalesces down to fewer partitions than it does with Spark because Comet produces smaller partitions, presumably thanks to columnar compression.
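
The coalescing observation above can be illustrated with a small sketch. This is not Spark's actual implementation; it is a simplified greedy model that assumes adjacent shuffle partitions are merged until the next one would push the group past an advisory target size (in Spark that target is configured by `spark.sql.adaptive.advisoryPartitionSizeInBytes`). The function name `coalesce` and all partition sizes below are hypothetical and chosen only to show the effect.

```python
def coalesce(partition_sizes, target_size):
    """Greedily group adjacent partitions by index.

    A group is closed when adding the next partition would push its
    total size above target_size. Simplified model, not Spark's code.
    """
    groups = []
    current, current_size = [], 0
    for idx, size in enumerate(partition_sizes):
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(idx)
        current_size += size
    if current:
        groups.append(current)
    return groups


MB = 1024 * 1024
target = 64 * MB  # assumed advisory partition size

# Hypothetical shuffle outputs: same row counts, different byte sizes.
spark_like = [32 * MB] * 8  # larger partitions (illustrative numbers)
comet_like = [8 * MB] * 8   # smaller, e.g. better-compressed, partitions

spark_groups = coalesce(spark_like, target)
comet_groups = coalesce(comet_like, target)

print(len(spark_groups), len(comet_groups))
```

Under these assumed sizes, the smaller partitions collapse into far fewer post-coalesce partitions, so each downstream task handles more rows. That would be consistent with a join running longer per task even though the total row count is unchanged.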