[PR] Split clickbench query set into one file per query [datafusion]

via GitHub Fri, 20 Jun 2025 11:21:20 -0700


pepijnve opened a new pull request, #16476:
URL: https://github.com/apache/datafusion/pull/16476


   ## Which issue does this PR close?
   
   None
   
   ## Rationale for this change
   
   Clickbench query IDs are zero-based, while most editors number lines 
starting from one. This causes a little friction every time you want to look up 
the query for a particular clickbench run.
   
   The benchmark suite already has precedent for one file per query rather 
than one file with a query per line. With distinct files, the query ID can be 
reflected in the filename, making lookup trivial.
   
   ## What changes are included in this PR?
   
   - Add a script to download the upstream queries.sql file from the clickbench 
repo and split it into one file per query
   - Adapt the clickbench benchmark code to read queries from individual files
   - Adjust parameters in bench.sh
   - Adapt the sql_planner benchmark code to read queries from individual files 
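
As a rough illustration of the splitting step described in the list above, here is a minimal Python sketch. It is not the script added by this PR: the function names, the `q{N}.sql` naming scheme, and the output directory are hypothetical, and the download of the upstream queries.sql is left out. It assumes only what the description states, namely that the upstream file holds one query per line and that query IDs are zero-based.

```python
from pathlib import Path


def split_queries(queries_sql: str, out_dir: Path) -> list[Path]:
    """Write each non-empty line of `queries_sql` to its own file.

    Assumes one query per line. The q{N}.sql naming is a hypothetical
    choice so that N matches the zero-based query ID.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    # Skip blank lines so they do not shift the query IDs.
    queries = [line.strip() for line in queries_sql.splitlines() if line.strip()]
    written = []
    for query_id, query in enumerate(queries):
        path = out_dir / f"q{query_id}.sql"
        path.write_text(query + "\n", encoding="utf-8")
        written.append(path)
    return written


def read_query(out_dir: Path, query_id: int) -> str:
    """Look up a query by its zero-based ID via the filename."""
    return (out_dir / f"q{query_id}.sql").read_text(encoding="utf-8").strip()
```

With this layout, finding the query for run N means opening the file named after N, with no line-number offset to account for.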
   
   ## Are these changes tested?
   
   Manually tested
   
   ## Are there any user-facing changes?
   
   No

