alamb commented on issue #18909:
URL: https://github.com/apache/datafusion/issues/18909#issuecomment-3572854475

   # Background
   
   The DataFusion ClickBench scripts (https://github.com/ClickHouse/ClickBench/tree/main/datafusion-partitioned) build datafusion-cli like this:
   
   ```shell
   CARGO_PROFILE_RELEASE_LTO=true RUSTFLAGS="-C codegen-units=1" cargo build --release --package datafusion-cli --bin datafusion-cli
   ```
   Then it clears the filesystem cache like this:
   ```shell
   echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
   ```
   Then it runs the queries like this:
   ```shell
   ./datafusion/target/release/datafusion-cli -f create.sql -f /tmp/query.sql
   ```
   
   For example, when running Q0, the scripts look like this:
   
   `create.sql`:
   
   ```sql
   CREATE EXTERNAL TABLE hits
   STORED AS PARQUET
   LOCATION 'partitioned'
   OPTIONS ('binary_as_string' 'true');
   ```
   
   `/tmp/query.sql`:
   
   ```sql
   SELECT COUNT(*) FROM hits;
   ```
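As an illustration, the build, clear-cache, and run steps above could be applied to every query with a small bash loop. This is a minimal sketch, not the actual ClickBench `run.sh`: the one-query-per-line input file, the helper names `drop_caches` and `run_queries_cold`, and the `DATAFUSION_CLI` override are hypothetical, and the `sync` before dropping caches is an addition of this sketch.

```shell
#!/usr/bin/env bash
# Sketch only (not the real ClickBench script): run each query in a fresh,
# cold datafusion-cli process. Assumes create.sql is in the current directory.
set -euo pipefail

# Binary produced by the LTO release build; overridable (hypothetical knob).
DATAFUSION_CLI="${DATAFUSION_CLI:-./datafusion/target/release/datafusion-cli}"

# Hypothetical helper: flush dirty pages so they can be dropped too, then drop
# the page cache, dentries and inodes. Linux-only; requires sudo.
drop_caches() {
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
}

# Hypothetical helper: run_queries_cold QUERIES_FILE [QUERY_FILE]
# QUERIES_FILE is assumed to hold one SQL query per line.
run_queries_cold() {
    local queries_file="$1"
    local query_file="${2:-/tmp/query.sql}"
    local n=0 query
    # The "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS= read -r query || [ -n "$query" ]; do
        if [ -z "$query" ]; then
            continue # skip blank lines
        fi
        drop_caches
        printf '%s\n' "$query" > "$query_file"
        echo "q${n}"
        # A new process per query, so no cached metadata carries over between
        # queries. stdin is redirected so the CLI cannot consume the loop's input.
        "$DATAFUSION_CLI" -f create.sql -f "$query_file" < /dev/null
        n=$((n + 1))
    done < "$queries_file"
}
```

For example, `run_queries_cold queries.sql` would write each line of a hypothetical `queries.sql` to `/tmp/query.sql` in turn and start a fresh `datafusion-cli` process for it, which is what makes every run cold.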
   
   Implications:
   1. Each query is run "cold" in the sense that datafusion-cli is started fresh and there is no metadata cache from previous runs. See notes below for how we could improve this if we wanted.
   

