waitingkuo commented on issue #5276: URL: https://github.com/apache/arrow-datafusion/issues/5276#issuecomment-1431892334
@ozankabak Thank you, I'll submit another ticket for that.

I just submitted a PR to update it to v18: https://github.com/ClickHouse/ClickBench/pull/77

Here's the summary:

- 1 query got an almost 20x boost
- 1 query dropped to about 0.1x of its previous performance
- 1 query still doesn't work
- excluding these 3 outliers, we see an improvement of about 20% on average

The x axis of the histogram is ```old execution time / new execution time```: the larger the value, the bigger the improvement, and 1.0 means no change.

<img width="418" alt="image" src="https://user-images.githubusercontent.com/1100923/219127094-05c45364-15b0-4a5c-8de7-c7cb0133b165.png">

Here are the results, query by query:

```bash
Query 0: 0.9776802049030369
Query 1: 1.707491082045184
Query 2: 1.2640586797066016
Query 3: 0.9203807875378623
Query 4: 0.9241755388210855
Query 5: 0.94172723106135
Query 6: 1.649497487437186
Query 7: 1.8705738705738708
Query 8: 0.9532640949554897
Query 9: 0.09891081294396213
Query 10: 0.9790727043117446
Query 11: 1.0668953687821612
Query 12: 1.0256627922162727
Query 13: 0.9520421488719772
Query 14: 1.0535746389404925
Query 15: 1.1839474435875463
Query 16: 1.2163045644807655
Query 17: 1.0196377825847225
Query 18: 1.2491737143968116
Query 19: 0.9741868869385649
Query 20: 1.2916241203256522
Query 21: 0.9167798624671026
Query 22: 0.9743192470077857
Query 23: 1.0809550243337944
Query 24: 1.0715070085911764
Query 25: 1.868698910081744
Query 26: 1.8450380677343134
Query 27: 0.8860306599462013
Query 28: 18.859884645982497
Query 29: 2.9667709147771695
Query 30: 1.0043050430504306
Query 31: 0.9922544851934578
Query 32: doesn't work
Query 33: 1.2395144767868875
Query 34: 1.2316816529558063
Query 35: 1.1153739994479712
Query 36: 0.9655502392344499
Query 37: 0.9329896907216495
Query 38: 0.979089790897909
Query 39: 1.0419161676646707
Query 40: 0.9942028985507246
Query 41: 1.028301886792453
Query 42: 1.039426523297491
```

Outlier queries:

Query 9: 0.09891081294396213 -> about 10 times slower

```sql
SELECT "RegionID", SUM("AdvEngineID"), COUNT(*) AS c, AVG("ResolutionWidth"), COUNT(DISTINCT "UserID") FROM hits GROUP BY "RegionID" ORDER BY c DESC LIMIT 10;
```

I have no clue for now; perhaps it's due to `DISTINCT`. @alamb @tustvold @comphead do you have any idea about this?

Query 28: 18.859884645982497 -> almost 20 times faster

```sql
SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer") FROM hits WHERE "Referer" <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
```

I think this benefits from the improvement to `regexp_replace`.

Query 32: query doesn't work

```sql
SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;
```

When I run it, the process is killed (same as with the previous version):

```bash
willy@willybench:~/ClickBench/datafusion$ datafusion-cli -f create.sql q33.sql
DataFusion CLI v18.0.0
0 rows in set. Query took 0.039 seconds.
Killed
```
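For anyone who wants to re-check the "about 20% on average" figure, here is a minimal sketch. It assumes the average is a plain arithmetic mean of the per-query ratios with Queries 9, 28 and 32 left out; the comment does not say how the average was actually computed, so a geometric mean or a time-weighted average would give a somewhat different number. The variable names are illustrative only.

```python
from statistics import mean

# Speedup ratios (old execution time / new execution time), copied from the
# per-query list in the comment and indexed by query number.
# None marks Query 32, which did not finish.
ratios = [
    0.9776802049030369, 1.707491082045184, 1.2640586797066016,
    0.9203807875378623, 0.9241755388210855, 0.94172723106135,
    1.649497487437186, 1.8705738705738708, 0.9532640949554897,
    0.09891081294396213, 0.9790727043117446, 1.0668953687821612,
    1.0256627922162727, 0.9520421488719772, 1.0535746389404925,
    1.1839474435875463, 1.2163045644807655, 1.0196377825847225,
    1.2491737143968116, 0.9741868869385649, 1.2916241203256522,
    0.9167798624671026, 0.9743192470077857, 1.0809550243337944,
    1.0715070085911764, 1.868698910081744, 1.8450380677343134,
    0.8860306599462013, 18.859884645982497, 2.9667709147771695,
    1.0043050430504306, 0.9922544851934578, None,
    1.2395144767868875, 1.2316816529558063, 1.1153739994479712,
    0.9655502392344499, 0.9329896907216495, 0.979089790897909,
    1.0419161676646707, 0.9942028985507246, 1.028301886792453,
    1.039426523297491,
]

# The three outliers called out in the comment (assumed to be excluded).
OUTLIERS = {9, 28, 32}

# Keep only the queries that ran and are not flagged as outliers.
typical = [r for i, r in enumerate(ratios) if i not in OUTLIERS and r is not None]

avg_speedup = mean(typical)
print(f"{len(typical)} queries, mean speedup {avg_speedup:.2f}x")
```

Because a ratio above 1.0 means the new version is faster, a mean a little under 1.2 over the 40 remaining queries is consistent with the roughly 20% improvement quoted in the summary.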
