dang-stripe opened a new pull request, #18538:
URL: https://github.com/apache/pinot/pull/18538

   Related to broker pruning: https://github.com/apache/pinot/issues/18201
   
   ## Summary
   
   This short-circuits multi-stage engine (MSE) queries at the broker, avoiding 
server dispatch, when all leaf stage segments are pruned or the table has no 
data. Without this, when using `useLeafServerForIntermediateStage`, fully 
pruned queries fan out to 
[every enabled server](https://github.com/apache/pinot/blob/ca0dfc10dae24a5f567209c990ffc94081100630/pinot-query-planner/src/main/java/org/apache/pinot/query/routing/WorkerManager.java#L382)
 on the cluster.
   
   ## Approach
   
   When the planner detects that every non-replicated leaf stage has zero 
assigned workers, it sets a flag on `DispatchableSubPlan`. The broker checks 
this flag before dispatch and, when it is set, skips dispatch: it rewrites the 
reduce stage by inlining leaf stages with empty `ValueNode` inputs, then runs 
the standard reducer locally.
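
   The rewrite step can be pictured on a toy plan tree. This is a minimal 
sketch, not the planner code in this PR: `PlanNode`, `LeafScan`, `EmptyValues`, 
`IntermediateOp`, and `EmptyLeafInliner` are hypothetical names standing in for 
the real plan nodes and for `ValueNode`. It only shows leaves being swapped 
for empty inputs so that nothing is left to dispatch to servers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified plan tree; these are not Pinot's plan node classes.
abstract class PlanNode {
  final List<PlanNode> inputs = new ArrayList<>();
}

// Stands in for a leaf stage that would normally scan segments on servers.
final class LeafScan extends PlanNode {
  final String table;

  LeafScan(String table) {
    this.table = table;
  }
}

// Stands in for an empty ValueNode: a literal input with zero rows.
final class EmptyValues extends PlanNode {
}

// Stands in for any intermediate operator (aggregate, join, sort, ...).
final class IntermediateOp extends PlanNode {
  final String name;

  IntermediateOp(String name, PlanNode... children) {
    this.name = name;
    for (PlanNode child : children) {
      inputs.add(child);
    }
  }
}

final class EmptyLeafInliner {
  // Returns a copy of the tree in which every LeafScan is replaced by EmptyValues.
  static PlanNode inlineEmptyLeaves(PlanNode node) {
    if (node instanceof LeafScan) {
      return new EmptyValues();
    }
    if (node instanceof IntermediateOp) {
      IntermediateOp op = (IntermediateOp) node;
      IntermediateOp copy = new IntermediateOp(op.name);
      for (PlanNode input : op.inputs) {
        copy.inputs.add(inlineEmptyLeaves(input));
      }
      return copy;
    }
    // EmptyValues (or any other node type) is left unchanged in this sketch.
    return node;
  }

  // Counts remaining LeafScan nodes; zero means nothing needs server dispatch.
  static int countLeafScans(PlanNode node) {
    int count = node instanceof LeafScan ? 1 : 0;
    for (PlanNode input : node.inputs) {
      count += countLeafScans(input);
    }
    return count;
  }
}
```

   After the rewrite, the remaining operators see only zero-row inputs, which 
is what lets an ordinary reducer produce the result without contacting servers.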
   
   We reuse the existing reducer rather than synthesizing result rows directly 
(as the single-stage engine, SSE, does) because:
   * It handles complex patterns (HAVING, LIMIT/OFFSET, expressions, GROUP BY, 
window functions, JOINs)
   * Aggregate empty-input semantics (COUNT -> 0, SUM -> `null`) are handled by 
the standard LEAF to FINAL aggregation path
   * Future SQL features should automatically work with no additional code.
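
   The empty-input semantics in the second bullet can be illustrated with a 
small final-merge sketch. `EmptyInputAggregates` and its methods are 
hypothetical and are not Pinot's aggregation functions; the sketch assumes 
that SQL NULL is modeled as a Java `null` and that the final stage merges a 
list of per-leaf partial results, which is empty when every leaf is pruned.

```java
import java.util.List;

// Hypothetical final-merge helpers; not Pinot's aggregation classes.
final class EmptyInputAggregates {
  // COUNT over zero rows is 0: the running total starts at 0 and never increments.
  static long finalCount(List<Long> partialCounts) {
    long total = 0L;
    for (long partial : partialCounts) {
      total += partial;
    }
    return total;
  }

  // SUM over zero rows is SQL NULL, modeled here as a null Double.
  static Double finalSum(List<Double> partialSums) {
    Double total = null;
    for (Double partial : partialSums) {
      if (partial == null) {
        continue; // a partial computed over no rows contributes nothing
      }
      total = (total == null) ? partial : total + partial;
    }
    return total;
  }

  // COALESCE(SUM(x), 0): substitutes 0 when the sum is NULL.
  static double coalesceSum(List<Double> partialSums) {
    Double sum = finalSum(partialSums);
    return sum == null ? 0.0 : sum;
  }
}
```

   With an empty list of partials, `finalCount` yields 0, `finalSum` yields 
`null`, and `coalesceSum` yields 0, matching the COUNT, SUM, and 
COALESCE(SUM, 0) cases listed under Testing.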
   
   If any replicated leaf (dim table) has segments, the short-circuit is 
disabled and the query falls through to normal dispatch.
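
   Taken together, the flag condition and the dim table guard amount to a 
single predicate over the leaf stages. The sketch below is illustrative only: 
`LeafStageInfo` and `ShortCircuitCheck` are hypothetical names, not classes 
from this PR, and requiring at least one non-replicated leaf is an assumption 
of the sketch rather than something the description states.

```java
import java.util.List;

// Hypothetical summary of one leaf stage after routing and pruning.
final class LeafStageInfo {
  final boolean replicated;  // true for dim-table style leaves
  final int assignedWorkers; // workers the planner assigned after pruning
  final int segmentCount;    // segments the leaf would read

  LeafStageInfo(boolean replicated, int assignedWorkers, int segmentCount) {
    this.replicated = replicated;
    this.assignedWorkers = assignedWorkers;
    this.segmentCount = segmentCount;
  }
}

final class ShortCircuitCheck {
  // True only when no leaf can contribute rows:
  //  - every non-replicated leaf has zero assigned workers, and
  //  - no replicated leaf has any segments (the dim table guard).
  static boolean canShortCircuit(List<LeafStageInfo> leaves) {
    boolean sawNonReplicated = false;
    for (LeafStageInfo leaf : leaves) {
      if (leaf.replicated) {
        if (leaf.segmentCount > 0) {
          return false; // dim table has data: fall through to normal dispatch
        }
      } else {
        if (leaf.assignedWorkers > 0) {
          return false; // some segments survived pruning
        }
        sawNonReplicated = true;
      }
    }
    // Assumption of this sketch: at least one non-replicated leaf must exist.
    return sawNonReplicated;
  }
}
```

   For example, a pruned fact table joined to a date-spine dim table that 
still has segments returns `false` here, which corresponds to the dim table 
guard case described under Testing.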
   
   ## Testing
   
   We've run this on a production cluster and tested with `useBrokerPruning` 
both off and on to verify that the results are identical. The test cases 
included:
   
   * Short-circuit path: COUNT(*)→0, SUM→null, COALESCE(SUM,0)→0, GROUP BY→0 
rows, multi-table CROSS JOIN→[0, null] (some of this depends on an internal 
variant of https://github.com/apache/pinot/pull/18471 that I'll upstream 
shortly)
   * Dim table guard: date spine (dim table) CROSS JOIN with empty fact table 
correctly does not short-circuit and returns date rows with all zeros (same 
with window functions)
   * No regression: identical results for real data queries (grouped aggs, 
window functions, multi-dim JOINs) with pruning on vs. off
   * Null correctness: SQL-standard semantics confirmed for all aggregate 
functions over empty input
   
   cc @Jackie-Jiang @yashmayya @gortiz @timothy-e 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

