kosiew opened a new pull request, #20016:
URL: https://github.com/apache/datafusion/pull/20016

   
   ## Which issue does this PR close?
   
   * Closes #20007.
   
   ## Rationale for this change
   
   ClickBench setup requirements for DataFusion were scattered across multiple 
places (benchmark code constants, sqllogictest files, and brief README notes). 
This made it easy for users to miss critical configuration steps—especially 
`binary_as_string` for binary columns and the `EventDate` UInt16 → DATE 
transformation—leading to confusing failures or incorrect results.
   
   This PR consolidates the setup knowledge into a single, copy‑pasteable 
section in `benchmarks/README.md`, and adds cross-references from the code/test 
locations back to that canonical documentation.
   
   ## What changes are included in this PR?
   
   * Added a new **“Running ClickBench on DataFusion”** section to 
`benchmarks/README.md` that documents:
   
     * Why and when to enable `binary_as_string` when registering the 
ClickBench Parquet file.
     * Why `EventDate` must be transformed from UInt16 (days since epoch) to a 
SQL `DATE`, including a clear explanation of the failure mode when not 
transformed.
     * A canonical, end-to-end setup example (external table + view + sample 
query).
     * How to run the benchmark via `./bench.sh`.
   * Added a pointer comment in `benchmarks/src/clickbench.rs` (near the 
`HITS_VIEW_DDL` / view DDL) directing readers to the README section as the 
source of truth.
   * Added a pointer comment in 
`datafusion/sqllogictest/test_files/clickbench.slt` directing readers to the 
README section for full setup details.
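
The `EventDate` item above comes down to simple date arithmetic. The following is a minimal Python sketch of that conversion, not code from this PR or from DataFusion. The helper name `event_date_from_days` and the sample value `15901` are hypothetical, and day zero is assumed to be the Unix epoch (1970-01-01).

```python
from datetime import date, timedelta

# Assumption: "days since epoch" means days since the Unix epoch.
UNIX_EPOCH = date(1970, 1, 1)


def event_date_from_days(days: int) -> date:
    """Convert a UInt16 day count (hypothetical helper) into a calendar date."""
    # A UInt16 column can only hold values in the range 0..65535.
    if not 0 <= days <= 0xFFFF:
        raise ValueError(f"{days} does not fit in a UInt16")
    return UNIX_EPOCH + timedelta(days=days)


# Illustrative raw value; before conversion it is only an integer.
raw_event_date = 15901
print(event_date_from_days(raw_event_date))  # prints 2013-07-15
```

Until a conversion of this kind is applied, the column holds bare integers such as `15901` rather than dates, which is consistent with the confusing failures or incorrect results described in the rationale.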
   
   ## Are these changes tested?
   
   * Yes, through existing coverage that exercises the documented setup:
   
     * The `clickbench.slt` file continues to create the `hits` view using the 
documented `EventDate` casting pattern.
     * The ClickBench benchmark runner in `benchmarks/src/clickbench.rs` 
continues to apply the same view DDL and can be exercised via:
   
       * `./bench.sh data clickbench`
       * `./bench.sh run clickbench`
   
   No new automated tests were added because the change is primarily 
documentation plus comments, and behavior is already validated through existing 
benchmark and sqllogictest workflows.
   
   ## Are there any user-facing changes?
   
   * Yes: improved user-facing documentation.
   
     * `benchmarks/README.md` now includes a consolidated, canonical 
ClickBench-on-DataFusion setup guide with rationale and a complete example.
     * No API changes.
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

