andygrove opened a new pull request, #3538:
URL: https://github.com/apache/datafusion-comet/pull/3538
## Summary
- Consolidate the per-engine shell scripts (`spark-tpch.sh`,
`comet-tpcds.sh`, etc.) into a single Python runner (`benchmarks/tpc/run.py`)
driven by TOML engine configs in `engines/`
- Rename `create-iceberg-tpch.py` to `create-iceberg-tables.py` with a
`--benchmark {tpch,tpcds}` flag to support converting both TPC-H and TPC-DS
Parquet data to Iceberg tables
- Add `check_benchmark_env()` in the runner to validate benchmark-specific
env vars (`TPCH_QUERIES` / `TPCDS_QUERIES`, etc.) and to default
`ICEBERG_DATABASE` to the benchmark name (a rough sketch follows this list)
- Remove hardcoded TPC-H assumptions from `comet-iceberg.toml` so it works
for both benchmarks
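
The sketch below shows one way the env validation and TOML loading could fit together. Only `check_benchmark_env()`, the `TPCH_QUERIES`/`TPCDS_QUERIES` variables, the `ICEBERG_DATABASE` default, and the `engines/` directory come from this PR; everything else (the `*_DATA` variables, the `<engine>.toml` naming convention, the function signatures) is an illustrative assumption, not the actual `run.py` code.

```python
# Minimal sketch of benchmark env validation and per-engine config loading.
# Names beyond those in the PR summary (e.g. TPCH_DATA, engines/<engine>.toml)
# are assumptions for illustration only.
import os
import sys
import tomllib  # stdlib since Python 3.11

# Benchmark-specific env vars; the *_QUERIES names come from the PR summary,
# the *_DATA names are illustrative placeholders.
REQUIRED_ENV = {
    "tpch": ["TPCH_QUERIES", "TPCH_DATA"],
    "tpcds": ["TPCDS_QUERIES", "TPCDS_DATA"],
}


def check_benchmark_env(benchmark: str) -> None:
    """Fail fast if required env vars are unset; default ICEBERG_DATABASE
    to the benchmark name (tpch or tpcds)."""
    missing = [v for v in REQUIRED_ENV[benchmark] if not os.environ.get(v)]
    if missing:
        sys.exit(f"missing env vars for {benchmark}: {', '.join(missing)}")
    os.environ.setdefault("ICEBERG_DATABASE", benchmark)


def load_engine_config(engine: str) -> dict:
    """Read the per-engine TOML config, assumed to live at engines/<engine>.toml."""
    with open(f"engines/{engine}.toml", "rb") as f:
        return tomllib.load(f)
```

In this sketch the runner would call `check_benchmark_env(args.benchmark)` before building the engine's command from the loaded config.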
## Test plan
- [ ] `python3 run.py --engine comet-iceberg --benchmark tpch --dry-run`
produces the correct command
- [ ] `python3 run.py --engine comet-iceberg --benchmark tpcds --dry-run`
produces the correct command with `--database tpcds` and TPC-DS executor settings
- [ ] `python3 create-iceberg-tables.py --help` shows both tpch and tpcds
choices
- [ ] Other engines (`spark`, `comet`, `gluten`, `blaze`) still work for
both benchmarks
🤖 Generated with [Claude Code](https://claude.com/claude-code)