andygrove opened a new pull request, #3538:
URL: https://github.com/apache/datafusion-comet/pull/3538
## Summary
- Consolidate the per-engine shell scripts (`spark-tpch.sh`,
`comet-tpcds.sh`, etc.) into a single Python runner (`benchmarks/tpc/run.py`)
driven by TOML engine configs in `engines/`
- Rename `create-iceberg-tpch.py` to `create-iceberg-tables.py` with a
`--benchmark {tpch,tpcds}` flag to support converting both TPC-H and TPC-DS
Parquet data to Iceberg tables
- Add `check_benchmark_env()` in the runner to validate benchmark-specific
env vars (`TPCH_QUERIES` / `TPCDS_QUERIES`, etc.) and to default
`ICEBERG_DATABASE` to the benchmark name (a rough sketch follows this list)
- Remove hardcoded TPC-H assumptions from `comet-iceberg.toml` so it works
for both benchmarks
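
The sketch below shows one way the env validation and TOML loading could fit together. Only `check_benchmark_env()`, the `TPCH_QUERIES`/`TPCDS_QUERIES` variables, the `ICEBERG_DATABASE` default, and the `engines/` directory come from this PR; everything else (the `*_DATA` variables, the `<engine>.toml` naming convention, the function signatures) is an illustrative assumption, not the actual `run.py` code.

```python
# Minimal sketch of benchmark env validation and per-engine config loading.
# Names beyond those in the PR summary (e.g. TPCH_DATA, engines/<engine>.toml)
# are assumptions for illustration only.
import os
import sys
import tomllib  # stdlib since Python 3.11

# Benchmark-specific env vars; the *_QUERIES names come from the PR summary,
# the *_DATA names are illustrative placeholders.
REQUIRED_ENV = {
    "tpch": ["TPCH_QUERIES", "TPCH_DATA"],
    "tpcds": ["TPCDS_QUERIES", "TPCDS_DATA"],
}


def check_benchmark_env(benchmark: str) -> None:
    """Fail fast if required env vars are unset; default ICEBERG_DATABASE
    to the benchmark name (tpch or tpcds)."""
    missing = [v for v in REQUIRED_ENV[benchmark] if not os.environ.get(v)]
    if missing:
        sys.exit(f"missing env vars for {benchmark}: {', '.join(missing)}")
    os.environ.setdefault("ICEBERG_DATABASE", benchmark)


def load_engine_config(engine: str) -> dict:
    """Read the per-engine TOML config, assumed to live at engines/<engine>.toml."""
    with open(f"engines/{engine}.toml", "rb") as f:
        return tomllib.load(f)
```

In this sketch the runner would call `check_benchmark_env(args.benchmark)` before building the engine's command from the loaded config.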
## Test plan
- [ ] `python3 run.py --engine comet-iceberg --benchmark tpch --dry-run`
produces the correct command
- [ ] `python3 run.py --engine comet-iceberg --benchmark tpcds --dry-run`
produces the correct command with `--database tpcds` and TPC-DS executor settings
- [ ] `python3 create-iceberg-tables.py --help` shows both tpch and tpcds
choices
- [ ] Other engines (`spark`, `comet`, `gluten`, `blaze`) still work for
both benchmarks
🤖 Generated with [Claude Code](https://claude.com/claude-code)