waitingkuo commented on issue #5276: URL: https://github.com/apache/arrow-datafusion/issues/5276#issuecomment-1432070491
@comphead thank you. The quickest way is to pull the ClickBench GitHub repo, go to this folder https://github.com/ClickHouse/ClickBench/tree/main/datafusion, and then run:

```bash
# Download benchmark target data
wget --continue https://datasets.clickhouse.com/hits_compatible/hits.parquet

# launch datafusion-cli
datafusion-cli
DataFusion CLI v18.0.0
> CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION 'hits.parquet';
> SELECT "RegionID", SUM("AdvEngineID"), COUNT(*) AS c, AVG("ResolutionWidth"), COUNT(DISTINCT "UserID") FROM hits GROUP BY "RegionID" ORDER BY c DESC LIMIT 10;
```

To run the full benchmark, you could do

```bash
benchmark.sh
```

which does the following:

1. installs the latest datafusion
2. downloads the benchmark dataset
3. executes each query 3 times

A full benchmark run takes around 20-30 minutes for me.
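If you only want to time one query a few times instead of running everything, something like the following might work. This is a rough sketch and not what `benchmark.sh` actually does: the `time_runs` helper, the `TRIES` variable, and the output format are all made up for illustration, and timings are whole seconds only.

```bash
# Hypothetical helper, not part of ClickBench or DataFusion.
# Number of tries per command; defaults to 3 to mirror "3 times per query".
TRIES="${TRIES:-3}"

# time_runs <label> <command...>
# Runs the command TRIES times, discards its output, and prints one line
# per try in the form "<label> try=<n> seconds=<elapsed>".
time_runs() {
  tr_label="$1"
  shift
  tr_i=1
  while [ "$tr_i" -le "$TRIES" ]; do
    tr_start=$(date +%s)
    "$@" > /dev/null 2>&1
    tr_end=$(date +%s)
    echo "${tr_label} try=${tr_i} seconds=$((tr_end - tr_start))"
    tr_i=$((tr_i + 1))
  done
}
```

For example, assuming you saved the `CREATE EXTERNAL TABLE` and `SELECT` statements above into a file named `query.sql` (a hypothetical name), you could call `time_runs region_query datafusion-cli -f query.sql`. Note that each try then includes the table registration as well, so treat the numbers as a rough indication only.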
