nathanb9 opened a new pull request, #23214:
URL: https://github.com/apache/datafusion/pull/23214

   ## Which issue does this PR close?
   
   - Closes #23213.
   
   ## Rationale for this change
   
   When a query contains several uncorrelated scalar-aggregate subqueries that read the same source, DataFusion scans that source once per subquery. If the sources are structurally identical and the subqueries differ only in their aggregates and `WHERE` predicates, all of them can be computed in a single scan by moving each predicate into a `FILTER (WHERE ...)` clause on the corresponding aggregate.
   
   ## What changes are included in this PR?
   
   A new logical optimizer rule, `FuseScalarSubqueries`, gated by 
`datafusion.optimizer.enable_fuse_scalar_subqueries` (default off). When a 
projection contains 2 or more uncorrelated scalar-aggregate subqueries over a 
structurally identical source, the rule fuses them into a single aggregate:
   
   ```sql
   -- Before: two scans of t
   SELECT (SELECT count(*) FROM t WHERE a < 10),
          (SELECT avg(x)   FROM t WHERE a >= 10);
   
   -- After: one scan of t
   SELECT count(*) FILTER (WHERE a < 10),
          avg(x)   FILTER (WHERE a >= 10)
   FROM t
   WHERE a < 10 OR a >= 10;
   ```
   
   The fused aggregate's input is filtered by the OR of the subqueries' predicates, and each scalar subquery is replaced by a reference to its column in the merged aggregate. The rule runs before subquery decorrelation and is conservative: it skips correlated subqueries, `DISTINCT`, ordered, or volatile aggregates, and predicates that themselves contain subqueries.
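
As a rough sanity check of the rewrite's semantics (not of the optimizer rule itself), the sketch below runs the unfused and fused queries from the example against a small in-memory table and compares the results. It uses Python's standard `sqlite3` module with SQLite as a stand-in engine, assuming SQLite 3.30 or newer for aggregate `FILTER` support; the helper name `run_both_forms` and the sample rows are hypothetical.

```python
import sqlite3


def run_both_forms(rows):
    """Run the unfused and fused queries from the example on `rows`.

    Hypothetical helper: SQLite stands in for DataFusion here, so this only
    checks SQL semantics, not the FuseScalarSubqueries rule itself.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, x REAL)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

    # Before: one scalar subquery (and one scan of t) per aggregate.
    unfused = conn.execute(
        "SELECT (SELECT count(*) FROM t WHERE a < 10), "
        "       (SELECT avg(x)   FROM t WHERE a >= 10)"
    ).fetchone()

    # After: a single aggregate over t. Each predicate moves into a FILTER
    # clause, and the scan keeps only rows matching at least one predicate.
    fused = conn.execute(
        "SELECT count(*) FILTER (WHERE a < 10), "
        "       avg(x)   FILTER (WHERE a >= 10) "
        "FROM t WHERE a < 10 OR a >= 10"
    ).fetchone()

    conn.close()
    return unfused, fused


before, after = run_both_forms([(1, 2.0), (5, 4.0), (20, 10.0), (None, 7.0)])
print(before, after)  # prints: (2, 10.0) (2, 10.0)

# No rows at all: both forms still return exactly one row.
print(run_both_forms([]))  # prints: ((0, None), (0, None))
```

The empty-input case is the one worth checking: an aggregate without `GROUP BY` always returns exactly one row, so after fusion `count` still yields 0 and `avg` yields NULL even when the OR filter removes every row, matching the separate subqueries.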
   
   This is an initial, opt-in version, gated behind a config flag in the same way as the existing `enable_unions_to_filter` rule. Follow-ups could add a guard based on source size or predicate selectivity.
   
   ## Are these changes tested?
   
   Yes. Unit snapshot tests cover the fusion and each negative case (different 
sources, single subquery, correlated, `DISTINCT`, volatile, flag off), plus an 
end-to-end sqllogictest in `subquery.slt` verifying both the result and the 
fused plan.
   
   ## Are there any user-facing changes?
   
   One new opt-in config flag, 
`datafusion.optimizer.enable_fuse_scalar_subqueries` (default false), 
documented in `configs.md`. No changes to default behavior.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]