andygrove commented on PR #1906:
URL: 
https://github.com/apache/datafusion-ballista/pull/1906#issuecomment-4822959065

   Following up on my notes above (the perf regression / "parquet scans reading 
too many rows") — root-caused and fixed, and it turned out to be a correctness 
bug, not just performance.
   
   **What was happening:** DataFusion 54's `DataSourceExec` distributes file 
groups across partitions via a shared work-queue that is only divided when 
*all* partitions of one plan instance are polled concurrently. Ballista runs 
one partition per task on its own decoded plan instance, so a task polling a 
single partition in isolation drained the whole queue and **scanned the entire 
table**. That is the "too many rows" I was seeing — every task dispatched after 
the first wave re-read the whole table, which inflated aggregates by N× (e.g. 
q1 `count_order` was exactly 8× too high on a 2-wave run) and burned N× the 
CPU. The benchmark harness only validates results at SF1, so it slipped through 
as a "perf" symptom. Details in #1907.
   
   **Fix:** restrict each task's `DataSourceExec` to the file group for its 
`partition_id` before execution. q1 on `-c8` is correct again (A/F 
`count_order` 14,804,077) and back to ~0.5 s.
   
   While verifying the full suite I hit a second, independent DF54 regression: 
uncorrelated scalar subqueries (q11/q15/q22) failed to deserialize once the 
plan is split into stages, because DF54 plans them as a `ScalarSubqueryExec` 
whose inner `ScalarSubqueryExpr` can't be decoded on its own. Filed as #1909 
and fixed here by decorrelating them to joins 
(`enable_physical_uncorrelated_scalar_subquery = false` in Ballista's 
restricted config).
   
   Both fixes are in this PR with unit tests, and the **full 22-query TPC-H 
SF10 suite now runs end to end with correct results and no regression**. I've 
updated the PR description accordingly.
   
   One related robustness issue I noticed but left out of this PR (it's 
pre-existing and orthogonal): a deterministic task *decode* failure makes the 
scheduler mark the executor dead, which with a single executor hangs the job 
instead of failing the query. Filed separately as #1908.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to