waitingkuo commented on issue #5276:
URL: 
https://github.com/apache/arrow-datafusion/issues/5276#issuecomment-1432070491

   @comphead thank you
   the quickest way is to clone the ClickBench GitHub repo and go to this folder:
   https://github.com/ClickHouse/ClickBench/tree/main/datafusion
   
   and then
   ```bash
   # Download benchmark target data
   wget --continue https://datasets.clickhouse.com/hits_compatible/hits.parquet
   # launch datafusion-cli
   datafusion-cli
   DataFusion CLI v18.0.0
   > CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION 'hits.parquet';
   > SELECT "RegionID", SUM("AdvEngineID"), COUNT(*) AS c, AVG("ResolutionWidth"), COUNT(DISTINCT "UserID") FROM hits GROUP BY "RegionID" ORDER BY c DESC LIMIT 10;
   ```
   
   to run the full benchmark, you could run `bash benchmark.sh`, which
   1. installs the latest datafusion
   2. downloads the benchmark dataset
   3. executes each query 3 times

   generating a full benchmark takes around 20-30 minutes for me
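
   For anyone who only wants to time a few queries by hand, step 3 can be approximated with a small helper. This is a rough sketch, not the contents of `benchmark.sh`: the `CLI` variable and the `run_query` function are names made up here, and it assumes `datafusion-cli` is on the PATH and `hits.parquet` is in the current directory.

   ```bash
   # Sketch only: CLI and run_query are hypothetical names, not taken from benchmark.sh.
   CLI="${CLI:-datafusion-cli}"   # assumed to be on PATH
   TRIES=3                        # 3 executions per query, as in the steps above

   # Run one SQL statement TRIES times, each in a fresh CLI session,
   # and print the elapsed wall-clock time of every run in whole seconds.
   run_query() {
     local query="$1" script start i
     script="$(mktemp)"
     # A fresh session has no tables, so register hits before the query.
     {
       printf '%s\n' "CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION 'hits.parquet';"
       printf '%s\n' "$query"
     } > "$script"
     for i in $(seq 1 "$TRIES"); do
       start=$SECONDS
       "$CLI" -f "$script" > /dev/null
       echo "run $i: $((SECONDS - start))s"
     done
     rm -f "$script"
   }
   ```

   With the function defined, something like `run_query 'SELECT COUNT(*) FROM hits;'` should print three timing lines. Bash's `SECONDS` only has one-second resolution, so this is good for a rough comparison, not precise numbers.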

