[GitHub] [arrow-datafusion] alamb opened a new pull request, #7120: Add parquet-filter and sort benchmarks to dfbench

via GitHub Thu, 27 Jul 2023 14:39:25 -0700


alamb opened a new pull request, #7120:
URL: https://github.com/apache/arrow-datafusion/pull/7120


   Marked as a draft while I finish testing.
   
   Note: this looks like a large change, but it is mostly moving code around rather than changing any logic.
   
   # Which issue does this PR close?
   
   Part of https://github.com/apache/arrow-datafusion/issues/7052
   
   # Rationale for this change
   
   see https://github.com/apache/arrow-datafusion/issues/7052
   
   TL;DR: making benchmarks easier to run means more people will find and run them :)
   
   # What changes are included in this PR?
   
   1. Consolidate the parquet filter pushdown and sort benchmarks into `dfbench`
   
   Like https://github.com/apache/arrow-datafusion/pull/7054, this PR also keeps the old entrypoint (`parquet`) working.
   
   So these two commands do the same thing (run the filter pushdown benchmark):
   ```shell
   # New
   cargo run --bin dfbench -- parquet-filter --iterations=5 --partitions=1 --scale-factor=0.01 --path=/tmp
   # Old
   cargo run --bin parquet -- filter --iterations=5 --partitions=1 --scale-factor=0.01 --path=/tmp
   ```
   Likewise for the sort benchmark:
   ```shell
   # New
   cargo run --bin dfbench -- sort --iterations=5 --partitions=1 --scale-factor=0.01 --path=/tmp
   # Old
   cargo run --bin parquet -- sort --iterations=5 --partitions=1 --scale-factor=0.01 --path=/tmp
   ```
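   The sketch below illustrates the idea of one benchmark reachable from both entrypoints. It uses only the Rust standard library and does not reproduce the actual argument parsing in `dfbench` or `parquet`. The `Benchmark` enum and both lookup functions are hypothetical names chosen for illustration. Each entrypoint maps its own subcommand spelling onto the same benchmark variant:

   ```rust
   /// Hypothetical: the benchmarks that both entrypoints can select.
   #[derive(Debug, PartialEq)]
   enum Benchmark {
       ParquetFilter,
       Sort,
   }

   /// Hypothetical lookup for the new `dfbench` entrypoint,
   /// e.g. `dfbench -- parquet-filter ...` or `dfbench -- sort ...`.
   fn dfbench_subcommand(arg: &str) -> Option<Benchmark> {
       match arg {
           "parquet-filter" => Some(Benchmark::ParquetFilter),
           "sort" => Some(Benchmark::Sort),
           _ => None,
       }
   }

   /// Hypothetical lookup for the legacy `parquet` entrypoint,
   /// e.g. `parquet -- filter ...` or `parquet -- sort ...`.
   fn legacy_parquet_subcommand(arg: &str) -> Option<Benchmark> {
       match arg {
           "filter" => Some(Benchmark::ParquetFilter),
           "sort" => Some(Benchmark::Sort),
           _ => None,
       }
   }

   fn main() {
       // Each (new, old) pair should select the same benchmark.
       for (new, old) in [("parquet-filter", "filter"), ("sort", "sort")] {
           let selected = dfbench_subcommand(new);
           assert_eq!(selected, legacy_parquet_subcommand(old));
           println!("dfbench {new} == parquet {old}: {selected:?}");
       }
   }
   ```

   In this sketch, both lookups resolve to a single shared variant. That lets the old commands keep working while the benchmark code lives in one place.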
   
   # Are these changes tested?
   I tested them manually, both on their own and via `bench.sh`.
   
   # Are there any user-facing changes?
   No, this only affects a development tool.
   

