kosiew opened a new pull request, #20016:
URL: https://github.com/apache/datafusion/pull/20016
## Which issue does this PR close?
* Closes #20007.
## Rationale for this change
ClickBench setup requirements for DataFusion were scattered across multiple
places (benchmark code constants, sqllogictest files, and brief README notes).
This made it easy for users to miss critical configuration steps, especially
`binary_as_string` for binary columns and the `EventDate` UInt16 → DATE
transformation, leading to confusing failures or incorrect results.
This PR consolidates the setup knowledge into a single, copy-pasteable
section in `benchmarks/README.md`, and adds cross-references from the code/test
locations back to that canonical documentation.
## What changes are included in this PR?
* Added a new **“Running ClickBench on DataFusion”** section to
  `benchmarks/README.md` that documents:
  * Why and when to enable `binary_as_string` when registering the
    ClickBench Parquet file.
  * Why `EventDate` must be transformed from UInt16 (days since epoch) to a
    SQL `DATE`, including a clear explanation of the failure mode when not
    transformed.
  * A canonical, end-to-end setup example (external table + view + sample
    query).
  * How to run the benchmark via `./bench.sh`.
* Added a pointer comment in `benchmarks/src/clickbench.rs` (near the
  `HITS_VIEW_DDL` / view DDL) directing readers to the README section as the
  source of truth.
* Added a pointer comment in
  `datafusion/sqllogictest/test_files/clickbench.slt` directing readers to the
  README section for full setup details.
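For reviewers unfamiliar with the `EventDate` encoding, the following is a minimal Python sketch of what "UInt16 (days since epoch)" means. It is not code from this PR or from DataFusion. The helper name `event_date_from_days` and the sample value are hypothetical, and the epoch is assumed to be 1970-01-01 (the Unix epoch).

```python
from datetime import date, timedelta

# Assumption: "epoch" means the Unix epoch, 1970-01-01.
EPOCH = date(1970, 1, 1)


def event_date_from_days(days: int) -> date:
    """Interpret a UInt16 day count as a calendar date (hypothetical helper)."""
    if not 0 <= days <= 0xFFFF:
        raise ValueError(f"{days} does not fit in a UInt16")
    return EPOCH + timedelta(days=days)


# Illustrative raw value, not taken from the ClickBench file.
raw_event_date = 15901
print(event_date_from_days(raw_event_date))  # 2013-07-15
```

Until a conversion like this is applied, the column holds plain integers such as 15901 rather than dates, which is why the view casts `EventDate` to a SQL `DATE` before queries use it.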
## Are these changes tested?
* Yes (documentation-aligned coverage):
  * The `clickbench.slt` file continues to create the `hits` view using the
    documented `EventDate` casting pattern.
  * The ClickBench benchmark runner in `benchmarks/src/clickbench.rs`
    continues to apply the same view DDL and can be exercised via:
    * `./bench.sh data clickbench`
    * `./bench.sh run clickbench`
No new automated tests were added because the change is primarily
documentation plus comments, and behavior is already validated through existing
benchmark and sqllogictest workflows.
## Are there any user-facing changes?
* Yes: improved user-facing documentation.
  * `benchmarks/README.md` now includes a consolidated, canonical
    ClickBench-on-DataFusion setup guide with rationale and a complete example.
* No API changes.
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.