gortiz commented on PR #18848: URL: https://github.com/apache/pinot/pull/18848#issuecomment-4799572221
Really nice piece of work — the additive design is clean, the no-false-negative reasoning is carefully laid out, and the correctness coverage (planner/leaf cap sharing a constant, null-key exclusion on both sides, NaN range omission, graceful mixed-version proto fallback, visitor completeness enforced by the non-default interface method) is thorough. A few things I'd like to discuss before merge, all around sizing/footprint and validation rather than correctness: **1. The exact-`IN` tier has no footprint cap.** `maxBytes` (16 MB) only guards the bloom path, and `maxInSize` only chooses bloom-vs-IN for *single-key AUTO*. For `runtime_filter='in'`, and for **any multi-key** join (including AUTO multi-key above threshold), we fall to `attachDynamicFilter` and emit an exact `IN` of up to `maxBuildRows` (≈1M) literals **per key**, AND'd across keys — a multi-MB filter expression embedded in the leaf `PinotQuery` with no abandon guard. Could we have the exact-`IN` path honor the same `maxBytes` abandon as bloom, so there's one consistent footprint ceiling regardless of tier? That keeps the "no new config in v1" stance while closing the gap. **2. Bloom sizing knobs aren't tunable.** `MAX_IN_SIZE`, `MAX_BUILD_ROWS`, `FPP`, and `MAX_BYTES` are fixed constants with no `CONFIG_OF_*` keys — the only runtime-reachable `attachRuntimeFilter` hardcodes the defaults (the parameterized overload is test-only). I understand the intent (they affect selectivity/size, not correctness) and I'm fine deferring real config to a follow-up. Just flagging it so it's a conscious decision, and tying back to (1): `MAX_BYTES` is the one knob that actually bounds footprint, and right now it doesn't apply to the tier that can get largest. **3. No benchmark.** This is fundamentally a perf optimization (the motivation is all scan/serialization/network savings), but there's no JMH or cluster measurement — all tests assert result parity, not speedup. Since there's no selectivity gate yet and the feature can regress when the build side is large or the probe is cheap, a benchmark on the canonical large-fact ⋈ small-dim shape — showing the win at high selectivity *and* the cost at low selectivity / large build — would both justify the feature and tell us how real the hazard in (1) is. `pinot-perf` already has MSE/join scaffolding to build on. Reasonable to defer for an opt-in v1, but worth at least one data point. **4. Minor / please confirm:** NaN handling is correct for the bloom (membership kept, range dropped), but the exact-`IN` tier just emits `IN(probeCol, …, NaN, …)` via the reused `computeInOperands`. Since float/double keys are now first-class here, can you confirm the leaf `IN` predicate and the MSE hash join agree on `NaN = NaN` (i.e. either both match it or both drop it)? If they disagree, exact-`IN` could drop a probe NaN row the join would keep. Likely consistent with the existing SEMI path, but worth a sentence given how carefully NaN is handled elsewhere. None of these block correctness — the join remaining the source of truth makes the feature safe. (1) and (4) are the ones I'd most like resolved. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
