kosiew opened a new pull request, #22075:
URL: https://github.com/apache/datafusion/pull/22075

   ## Which issue does this PR close?
   
   * Part of #20788
   
   ## Rationale for this change
   
   This PR adds a compact, bounded reproducer for the reported high-memory query pattern, which combines:
   
   * list column expansion via `unnest`
   * row explosion
   * regrouping with `GROUP BY`
   * ordered aggregation using `array_agg(... ORDER BY ...)`
   
   The goal is to isolate and document the execution shape before optimizer or 
executor fixes are introduced. The reproducer is intentionally bounded so it 
can run reliably in local and CI environments while still demonstrating the 
problematic expansion pattern.
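
The execution shape described above can be modeled outside the engine. The following is a minimal Python sketch, not DataFusion code and not the query added by this PR. The helper names (`make_rows`, `unnest`, `ordered_array_agg`) and the `val = idx * 10` payload are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def make_rows(num_rows, list_len):
    """Build one list column per input row (a stand-in for a range-built list)."""
    return [(row_id, [i * 10 for i in range(list_len)]) for row_id in range(num_rows)]


def unnest(rows):
    """Row explosion: emit one (row_id, idx, val) tuple per list element."""
    for row_id, values in rows:
        for idx, val in enumerate(values):
            yield row_id, idx, val


def ordered_array_agg(exploded):
    """Model GROUP BY row_id with array_agg(val ORDER BY idx).

    In this toy model every (idx, val) pair is buffered per group and
    sorted only once all input has been consumed.
    """
    groups = defaultdict(list)
    for row_id, idx, val in exploded:
        groups[row_id].append((idx, val))
    return {
        row_id: [val for _, val in sorted(pairs, key=lambda p: p[0])]
        for row_id, pairs in groups.items()
    }


if __name__ == "__main__":
    rows = make_rows(num_rows=3, list_len=4)
    exploded = list(unnest(rows))
    print(len(exploded))  # intermediate row count: num_rows * list_len
    # Feed the rows in reverse to show that the ORDER BY restores list order.
    print(ordered_array_agg(reversed(exploded)))
```

In this model the aggregation has to hold every exploded pair until its group is complete, so the buffered state grows with `num_rows * list_len`. Whether the engine behaves the same way is what the reproducer is meant to document.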
   
   ## What changes are included in this PR?
   
   * Added a new benchmark:
   
     * `benchmarks/sql_benchmarks/unnest_array_agg/benchmarks/q01.benchmark`
   * Added SQLLogicTest coverage:
   
     * `datafusion/sqllogictest/test_files/unnest_array_agg_repro.slt`
   * Added a bounded synthetic workload that:
   
     * creates list columns using `range`
     * expands them with `unnest`
     * regroups rows using `array_agg(val ORDER BY idx)`
   * Added validation of the intermediate row expansion count.
   * Captured `EXPLAIN VERBOSE` output for the reproducer, including:
   
     * logical plan
     * initial physical plan
     * physical execution plan
     * schema details for ordered aggregate state
   * Added configurable benchmark scaling via:
   
     * `UNNEST_ARRAY_AGG_ROWS`
     * `UNNEST_ARRAY_AGG_LIST_LEN`
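
As a rough illustration of how the two scaling variables relate to the validated intermediate row count, here is a small Python sketch. It assumes every input row carries a list of exactly `UNNEST_ARRAY_AGG_LIST_LEN` elements. The function name `workload_size` and the default values are hypothetical, and the way the benchmark itself reads these variables is not shown in this description.

```python
import os


def workload_size(env=None, default_rows=1000, default_list_len=100):
    """Return (rows, list_len, expanded_rows) for the synthetic workload.

    The defaults are placeholders for illustration only; they are not the
    values used by the benchmark in this PR.
    """
    env = os.environ if env is None else env
    rows = int(env.get("UNNEST_ARRAY_AGG_ROWS", default_rows))
    list_len = int(env.get("UNNEST_ARRAY_AGG_LIST_LEN", default_list_len))
    # Assumption: each row's list has exactly list_len elements, so the
    # unnest step produces rows * list_len intermediate rows.
    return rows, list_len, rows * list_len


if __name__ == "__main__":
    print(workload_size({"UNNEST_ARRAY_AGG_ROWS": "10", "UNNEST_ARRAY_AGG_LIST_LEN": "5"}))
```

Under that assumption, doubling either variable doubles the number of rows flowing from the unnest step into the aggregation.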
   
   ## Are these changes tested?
   
   Yes.
   
   This PR adds:
   
   * SQLLogicTest coverage in:
   
     * `datafusion/sqllogictest/test_files/unnest_array_agg_repro.slt`
   * A benchmark reproducer in:
   
     * `benchmarks/sql_benchmarks/unnest_array_agg/benchmarks/q01.benchmark`
   
   The SLT verifies:
   
   * row expansion counts
   * ordered `array_agg` results
   * `EXPLAIN VERBOSE` plan shape including `UnnestExec` and `AggregateExec`
   
   ## Are there any user-facing changes?
   
   No user-facing changes. This PR only adds regression coverage and 
benchmarking infrastructure for a specific query shape.
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

