waitingkuo commented on issue #5276:
URL: 
https://github.com/apache/arrow-datafusion/issues/5276#issuecomment-1431892334

   @ozankabak thank you, I'll submit another ticket to do so.
   
   I just submitted the PR to update it to v18:
   https://github.com/ClickHouse/ClickBench/pull/77
   
   here's the summary:
   
   1 query got a roughly 20x speedup
   1 query runs at about 0.1x of its previous speed (about 10 times slower)
   1 query still doesn't work
   
   Apart from these 3 outliers, we see about a 20% improvement on average.
   
   The histogram chart:
   the x axis is `old execution time / new execution time`; the larger the value, the bigger the improvement, and 1.0 means no change.
   <img width="418" alt="image" src="https://user-images.githubusercontent.com/1100923/219127094-05c45364-15b0-4a5c-8de7-c7cb0133b165.png">
   
   
   
   Here are the results, query by query (old execution time / new execution time):
   ```text
   Query 0: 0.9776802049030369
   Query 1: 1.707491082045184
   Query 2: 1.2640586797066016
   Query 3: 0.9203807875378623
   Query 4: 0.9241755388210855
   Query 5: 0.94172723106135
   Query 6: 1.649497487437186
   Query 7: 1.8705738705738708
   Query 8: 0.9532640949554897
   Query 9: 0.09891081294396213
   Query 10: 0.9790727043117446
   Query 11: 1.0668953687821612
   Query 12: 1.0256627922162727
   Query 13: 0.9520421488719772
   Query 14: 1.0535746389404925
   Query 15: 1.1839474435875463
   Query 16: 1.2163045644807655
   Query 17: 1.0196377825847225
   Query 18: 1.2491737143968116
   Query 19: 0.9741868869385649
   Query 20: 1.2916241203256522
   Query 21: 0.9167798624671026
   Query 22: 0.9743192470077857
   Query 23: 1.0809550243337944
   Query 24: 1.0715070085911764
   Query 25: 1.868698910081744
   Query 26: 1.8450380677343134
   Query 27: 0.8860306599462013
   Query 28: 18.859884645982497
   Query 29: 2.9667709147771695
   Query 30: 1.0043050430504306
   Query 31: 0.9922544851934578
   Query 32: doesn't work
   Query 33: 1.2395144767868875
   Query 34: 1.2316816529558063
   Query 35: 1.1153739994479712
   Query 36: 0.9655502392344499
   Query 37: 0.9329896907216495
   Query 38: 0.979089790897909
   Query 39: 1.0419161676646707
   Query 40: 0.9942028985507246
   Query 41: 1.028301886792453
   Query 42: 1.039426523297491
   ```
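
For reference, a minimal sketch of how ratios like the ones above can be computed and averaged with the outliers left out. The timings are made up for illustration, and the function names are hypothetical; this is not the script actually used for the numbers in this comment.

```python
from statistics import mean


def speedup_ratios(old_times, new_times):
    """Return {query_index: old / new} for queries that ran in both versions.

    A value above 1.0 means the new version is faster; a failed run is
    represented as None and skipped.
    """
    ratios = {}
    for q, (old, new) in enumerate(zip(old_times, new_times)):
        if old is None or new is None:
            continue  # query did not finish in at least one version
        ratios[q] = old / new
    return ratios


def mean_without_outliers(ratios, outliers):
    """Arithmetic mean of the ratios, ignoring the given query indices."""
    return mean(r for q, r in ratios.items() if q not in outliers)


# Made-up timings in seconds, for illustration only
old = [2.0, 1.0, None, 3.0]
new = [1.0, 1.0, None, 30.0]

ratios = speedup_ratios(old, new)
avg = mean_without_outliers(ratios, outliers={3})
print(ratios, avg)
```

An arithmetic mean of ratios is the simplest choice here; a geometric mean would be a reasonable alternative, since it treats a 2x speedup and a 2x slowdown symmetrically.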
   
   
   outlier queries:
   
   Query 9: 0.09891081294396213 -> about 10 times slower
   ```sql
   SELECT "RegionID", SUM("AdvEngineID"), COUNT(*) AS c, AVG("ResolutionWidth"), COUNT(DISTINCT "UserID")
   FROM hits
   GROUP BY "RegionID"
   ORDER BY c DESC
   LIMIT 10;
   ```
   I have no clue for now; perhaps it's due to `DISTINCT`.
   @alamb @tustvold @comphead do you have any ideas about this?
   
   
   Query 28: 18.859884645982497 -> almost 20 times better
   ```sql
   SELECT REGEXP_REPLACE("Referer", '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k,
          AVG(length("Referer")) AS l, COUNT(*) AS c, MIN("Referer")
   FROM hits
   WHERE "Referer" <> ''
   GROUP BY k
   HAVING COUNT(*) > 100000
   ORDER BY l DESC
   LIMIT 25;
   ```
   I think this benefits from the improvements to `regexp_replace`.
   
   Query 32: doesn't work
   ```sql
   SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth")
   FROM hits
   GROUP BY "WatchID", "ClientIP"
   ORDER BY c DESC
   LIMIT 10;
   ```
   When I run it, the process gets killed (same as with the previous version):
   ```bash
   willy@willybench:~/ClickBench/datafusion$ datafusion-cli -f create.sql q33.sql
   DataFusion CLI v18.0.0
   0 rows in set. Query took 0.039 seconds.
   Killed
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
