Iskander14yo opened a new issue, #2035: URL: https://github.com/apache/datafusion-comet/issues/2035
Hi! I just made a [PR](https://github.com/ClickHouse/ClickBench/pull/557) to add Comet to [ClickBench](https://benchmark.clickhouse.com/), one of the popular benchmarks for analytical workloads. I've decided to create an issue similar to #391. You may close it if you find it irrelevant.

I'd appreciate feedback on whether my **configuration and setup are correct**. I consider this important because Comet _failed_ on one query and showed a few curious behaviors that I outline below. Perhaps these (and other hidden issues) could be fixed with proper configuration.

My notes:

- Predictably, Comet doesn't support some expressions. This is what I got from the logs:

  ```
  >>> grep -P "\[COMET:" log.txt | sed -e 's/^[ \t]*//' | sort | uniq -c
       78 +- GlobalLimit [COMET: GlobalLimit is not supported]
       18 +- HashAggregate [COMET: Unsupported aggregation mode PartialMerge]
      123 +- HashAggregate [COMET: distinct aggregates are not supported]
       51 +- Project [COMET: Unsupported cast from LongType to TimestampType with timezone Some(...) and evalMode LEGACY]
      126 +- SortAggregate [COMET: SortAggregate is not supported]
       43 Execute CreateViewCommand [COMET: Execute CreateViewCommand is not supported]
      135 TakeOrderedAndProject [COMET: ]
  ```

  The `Unsupported cast from LongType to TimestampType...` message looks similar to #44, but in this case a different column is involved (`EventTime` instead of `EventDate`). See also [this issue](https://github.com/ClickHouse/ClickBench/issues/7) for additional info.
- Spark's local mode was used. I saw that the docs suggest using standalone mode on EC2, but I didn't want to spend extra resources on a separate driver. I looked at the Spark UI, and it seems that Comet works fine.
- Comet's cold runs are significantly slower than its hot runs, even compared to Spark.
- As I already mentioned, Comet failed on one query:

  ```sql
  SELECT TraficSourceID, SearchEngineID, AdvEngineID,
         CASE WHEN (SearchEngineID = 0 AND AdvEngineID = 0) THEN Referer ELSE '' END AS Src,
         URL AS Dst,
         COUNT(*) AS PageViews
  FROM hits
  WHERE CounterID = 62
    AND EventDate >= '2013-07-01'
    AND EventDate <= '2013-07-31'
    AND IsRefresh = 0
  GROUP BY TraficSourceID, SearchEngineID, AdvEngineID, Src, Dst
  ORDER BY PageViews DESC
  LIMIT 10 OFFSET 1000;
  ```

  with error

  ```
  QueryPlanSerde: Comet native execution is disabled due to: unsupported Spark partitioning: ArrayBuffer(PageViews#1143L DESC NULLS LAST)
  Caused by: org.apache.comet.CometNativeException: InternalError: Native cast invoked for unsupported cast from Utf8 to Dictionary(Int32, Utf8).
  ```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org