Akanksha-kedia opened a new pull request, #18829:
URL: https://github.com/apache/pinot/pull/18829
## Summary
Adds a `joinOptions(join_strategy='broadcast_right')` query hint to the
multi-stage query engine (MSE). When set, the entire right-side table is
broadcast to every join worker while the left side is hash/random-distributed.
This eliminates the right-side network shuffle for star-schema patterns where
the right table is small enough to fit in memory but is **not** pre-replicated
as a dimension table.
```sql
SELECT /*+ joinOptions(join_strategy='broadcast_right') */
o.order_id, p.product_name
FROM orders o
JOIN products p ON o.product_id = p.id
```
### Why this is useful
| Strategy | When to use |
|---|---|
| `lookup` | Right table is a dimension table replicated to all workers |
| `broadcast_right` (new) | Right table is small but **not** replicated —
avoids right-side shuffle |
| default hash | General case |
## Changes
| File | Change |
|---|---|
| `PinotHintOptions` | Add `BROADCAST_RIGHT_JOIN_STRATEGY` constant +
`useBroadcastRightJoinStrategy()` helper |
| `PinotJoinExchangeNodeInsertRule` (V1 planner) | Detect hint → BROADCAST
exchange on right, hash/random on left |
| `TraitAssignment` (V2 planner) | Detect hint → assign
`BROADCAST_DISTRIBUTED` trait to right input |
| `RelToPlanNodeConverter` (V1) | Set `JoinStrategy.BROADCAST_RIGHT` on the
`JoinNode` |
| `PRelToPlanNodeConverter` (V2) | Same |
| `JoinNode` | Add `BROADCAST_RIGHT` to `JoinStrategy` enum |
| `plan.proto` | Add `BROADCAST_RIGHT = 3` to `JoinStrategy` proto enum |
| `PlanNodeSerializer` / `PlanNodeDeserializer` | Round-trip
`BROADCAST_RIGHT` across broker↔server boundary |
| `DefaultJoinOperatorFactory` | Execute `BROADCAST_RIGHT` as
`HashJoinOperator` (distribution handled at plan time) |
| `InStageStatsTreeBuilder` | Replace fragile `assert LOOKUP` with explicit
switch — handles `BROADCAST_RIGHT` and `ASOF` cleanly |
## Why RIGHT/FULL OUTER joins are blocked
`HashJoinOperator` tracks unmatched right rows **locally per worker**.
Broadcasting the right table to N workers means each worker independently emits
null-extended rows for the same unmatched right row → N duplicate rows in the
output. The planner rejects `RIGHT`/`FULL OUTER` with a clear error message at
query compile time.
## Tests
- `testBroadcastRightJoinHintEquiJoin` — equi-join: verifies BROADCAST
distribution on right, HASH on left, `BROADCAST_RIGHT` strategy on the
`JoinNode`
- `testBroadcastRightJoinHintNonEquiJoin` — non-equi-join: verifies
BROADCAST on right, RANDOM on left
`mvn test -pl pinot-query-planner -Dtest=QueryCompilationTest` — **201
tests, 0 failures**
Closes #14518
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]