Re: [PR] Add probe-side runtime filters for inner joins in the multi-stage engine [pinot]

via GitHub Thu, 25 Jun 2026 06:07:50 -0700


gortiz commented on PR #18848:
URL: https://github.com/apache/pinot/pull/18848#issuecomment-4799572221


   Really nice piece of work — the additive design is clean, the 
no-false-negative reasoning is carefully laid out, and the correctness coverage 
(planner/leaf cap sharing a constant, null-key exclusion on both sides, NaN 
range omission, graceful mixed-version proto fallback, visitor completeness 
enforced by the non-default interface method) is thorough. A few things I'd 
like to discuss before merge, all around sizing/footprint and validation rather 
than correctness:
   
   **1. The exact-`IN` tier has no footprint cap.** `maxBytes` (16 MB) only 
guards the bloom path, and `maxInSize` only chooses bloom-vs-IN for *single-key 
AUTO*. For `runtime_filter='in'`, and for **any multi-key** join (including 
AUTO multi-key above threshold), we fall to `attachDynamicFilter` and emit an 
exact `IN` of up to `maxBuildRows` (≈1M) literals **per key**, AND'd across 
keys — a multi-MB filter expression embedded in the leaf `PinotQuery` with no 
abandon guard. Could we have the exact-`IN` path honor the same `maxBytes` 
abandon as bloom, so there's one consistent footprint ceiling regardless of 
tier? That keeps the "no new config in v1" stance while closing the gap.
   
   **2. Bloom sizing knobs aren't tunable.** `MAX_IN_SIZE`, `MAX_BUILD_ROWS`, 
`FPP`, and `MAX_BYTES` are fixed constants with no `CONFIG_OF_*` keys — the 
only runtime-reachable `attachRuntimeFilter` hardcodes the defaults (the 
parameterized overload is test-only). I understand the intent (they affect 
selectivity/size, not correctness) and I'm fine deferring real config to a 
follow-up. Just flagging it so it's a conscious decision, and tying back to 
(1): `MAX_BYTES` is the one knob that actually bounds footprint, and right now 
it doesn't apply to the tier that can get largest.
   
   **3. No benchmark.** This is fundamentally a perf optimization (the 
motivation is all scan/serialization/network savings), but there's no JMH or 
cluster measurement — all tests assert result parity, not speedup. Since 
there's no selectivity gate yet and the feature can regress when the build side 
is large or the probe is cheap, a benchmark on the canonical large-fact ⋈ 
small-dim shape — showing the win at high selectivity *and* the cost at low 
selectivity / large build — would both justify the feature and tell us how real 
the hazard in (1) is. `pinot-perf` already has MSE/join scaffolding to build 
on. Reasonable to defer for an opt-in v1, but worth at least one data point.
   
   **4. Minor / please confirm:** NaN handling is correct for the bloom 
(membership kept, range dropped), but the exact-`IN` tier just emits 
`IN(probeCol, …, NaN, …)` via the reused `computeInOperands`. Since 
float/double keys are now first-class here, can you confirm the leaf `IN` 
predicate and the MSE hash join agree on `NaN = NaN` (i.e. either both match it 
or both drop it)? If they disagree, exact-`IN` could drop a probe NaN row the 
join would keep. Likely consistent with the existing SEMI path, but worth a 
sentence given how carefully NaN is handled elsewhere.
   
   None of these block correctness — the join remaining the source of truth 
makes the feature safe. (1) and (4) are the ones I'd most like resolved.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Add probe-side runtime filters for inner joins in the multi-stage engine [pinot]

Reply via email to