alamb commented on issue #21165:
URL: https://github.com/apache/datafusion/issues/21165#issuecomment-4154389289

   > My proposal would be that (cost permitting) we run benchmarks on every 
merge to main and post a comment in the PR if they regressed (so 1 run per PR 
vs. every commit) or at the very least we run them on RC branches comparing to 
the previous release
   
   This would be great. My biggest concern with this approach is that the 
run-to-run variability is pretty high, so we would have to invest time in 
analyzing and reviewing the results. Otherwise we'll probably just generate 
a bunch of data we never look at.
   
   
   
   > Currently writing benchmarks requires using dataframe APIs and quite a bit 
of ceremony (recent example: https://github.com/apache/datafusion/pull/21180).
   > I would like it to be possible to write SQL benchmarks, including with 
some SQL or non-SQL setup (could be a bash script to download data), even if a 
bit of rust is required (e.g. sql_bench!("../q1/")).
   
   Yes, I 100% agree with the value of SQL-based benchmarks.
   
   I would personally love to see something with a UX similar to datafusion-cli 
(an interactive REPL plus the ability to run scripts from files). 
   
   One thought I had would be a special `datafusion-cli` build (maybe a 
`datafusion-bench`, or something like what @Omega359 is describing) that is 
mostly the same as `datafusion-cli` (it is already a crate) with extra:
   1. commands to create / generate data or set up benchmarks
   2. better / easier-to-script output options
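
   To make the `sql_bench!("../q1/")` idea concrete, here is a minimal sketch of 
what the macro might expand to: a harness that reads a `query.sql` script from a 
benchmark directory, splits it into statements, and times each one. The function 
names (`split_statements`, `bench_dir`) and the `query.sql` file convention are 
hypothetical; in a real harness `run_sql` would be backed by datafusion-cli or a 
`SessionContext` rather than a caller-supplied closure.

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, Instant};

/// Split a SQL script into individual statements.
/// Naive: splits on ';' and ignores semicolons inside string
/// literals, which is enough for a sketch.
fn split_statements(script: &str) -> Vec<String> {
    script
        .split(';')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Run every statement in `<dir>/query.sql` through `run_sql`,
/// returning (statement, elapsed) pairs. Setup (e.g. a bash script
/// that downloads data) would run before this is called.
fn bench_dir<F: FnMut(&str)>(
    dir: &Path,
    mut run_sql: F,
) -> std::io::Result<Vec<(String, Duration)>> {
    let script = fs::read_to_string(dir.join("query.sql"))?;
    let mut timings = Vec::new();
    for stmt in split_statements(&script) {
        let start = Instant::now();
        run_sql(&stmt);
        timings.push((stmt, start.elapsed()));
    }
    Ok(timings)
}
```

   The point of keeping `run_sql` pluggable is that the same benchmark 
directories could then be driven interactively from a REPL or in batch from CI.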
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

