gortiz commented on PR #14524:
URL: https://github.com/apache/pinot/pull/14524#issuecomment-2497666363
This approach is interesting, but I can see why you said it requires a lot
of changes.
IIUC, here you propose generating `PinotLogicalJoins` instead of joins,
which means we need to change (and, even worse, copy) a lot of code from Calcite.
My suggestion is different. I would keep the code as it is in master right
now. We keep using `PinotJoin`, so, for example, a filter will still be
pushed down into the right-hand side. In parallel, we create our own
`LookupPinotJoin` that extends RelNode, plus a rule that transforms `LogicalJoin`
into `LookupPinotJoin` under specific conditions (i.e., the hint is enabled, one
of the sides is a dim table, etc.).
This rule should be applied in the last phases of the rule pipeline and
could also transform a `LogicalJoin` + `LogicalFilter` into a single
`LookupPinotJoin` that keeps the optional filter. This node
therefore would not be logical but closer to physical, given that it would be
semantically more complex.
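To make the shape of that rewrite concrete, here is a minimal sketch in plain Java. It does not use Calcite or Pinot classes: `Scan`, `Join`, `Filter`, `LookupJoin`, `eligible` and `rewrite` are all hypothetical stand-ins, and it assumes the dim table is on the right side of the join. A real rule would be written against Calcite's rule API instead.

```java
import java.util.Optional;

// Hypothetical plan nodes for illustration only; not Calcite or Pinot classes.
class LookupRewriteSketch {
  interface Node {}

  static final class Scan implements Node {
    final String table;
    final boolean dimTable;
    Scan(String table, boolean dimTable) { this.table = table; this.dimTable = dimTable; }
  }

  static final class Join implements Node {
    final Node left;
    final Node right;
    final boolean lookupHint; // stands in for "hint enabled"
    Join(Node left, Node right, boolean lookupHint) {
      this.left = left; this.right = right; this.lookupHint = lookupHint;
    }
  }

  static final class Filter implements Node {
    final Node input;
    final String condition;
    Filter(Node input, String condition) { this.input = input; this.condition = condition; }
  }

  // Closer to a physical node: a join plus an optional filter in one node.
  static final class LookupJoin implements Node {
    final Node left;
    final Scan dim;
    final Optional<String> filter;
    LookupJoin(Node left, Scan dim, Optional<String> filter) {
      this.left = left; this.dim = dim; this.filter = filter;
    }
  }

  // Assumption: the dim table must be the right input and the hint must be set.
  static boolean eligible(Join join) {
    return join.lookupHint && join.right instanceof Scan && ((Scan) join.right).dimTable;
  }

  // Rewrites Filter(Join) or a bare Join into a LookupJoin when eligible;
  // returns the node unchanged otherwise.
  static Node rewrite(Node node) {
    if (node instanceof Filter && ((Filter) node).input instanceof Join) {
      Filter filter = (Filter) node;
      Join join = (Join) filter.input;
      if (eligible(join)) {
        return new LookupJoin(join.left, (Scan) join.right, Optional.of(filter.condition));
      }
      return node;
    }
    if (node instanceof Join && eligible((Join) node)) {
      Join join = (Join) node;
      return new LookupJoin(join.left, (Scan) join.right, Optional.empty());
    }
    return node;
  }
}
```

Because the rewrite only fires on an already optimized plan, a join that does not meet the conditions is left exactly as the standard rules produced it.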
Finally, we could have a `LookupJoinOperator` whose `nextBlock` would be
something like:
```
leftBlock = read block from left
if (isEos(leftBlock))
  return ...
end if
resultBlock = new empty block
for each leftRow in leftBlock
  row = execute the lookup for leftRow
  if (filter == null or filter.accept(row))
    resultBlock += row
  end if
end for
return resultBlock
```
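As a rough illustration of that loop, here is a self-contained Java sketch. It is not Pinot's operator API: `LookupJoinSketch`, the use of `List<Object[]>` as a block, the in-memory `Map` standing in for the dim table, and returning `null` for end of stream are all assumptions made for the example. It also assumes inner-join semantics (left rows without a match are dropped), which the pseudocode does not specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for the proposed LookupJoinOperator.
class LookupJoinSketch {
  private final Iterator<List<Object[]>> leftBlocks; // upstream blocks of left rows
  private final Map<Object, Object[]> dimTable;      // dim table keyed by join key
  private final int leftKeyIndex;                    // left column holding the join key
  private final Predicate<Object[]> filter;          // optional post-join filter, may be null

  LookupJoinSketch(Iterator<List<Object[]>> leftBlocks, Map<Object, Object[]> dimTable,
      int leftKeyIndex, Predicate<Object[]> filter) {
    this.leftBlocks = leftBlocks;
    this.dimTable = dimTable;
    this.leftKeyIndex = leftKeyIndex;
    this.filter = filter;
  }

  // Returns the next joined block, or null to signal end of stream (an assumption).
  List<Object[]> nextBlock() {
    if (!leftBlocks.hasNext()) {
      return null;
    }
    List<Object[]> leftBlock = leftBlocks.next();
    List<Object[]> resultBlock = new ArrayList<>();
    for (Object[] leftRow : leftBlock) {
      Object[] dimRow = dimTable.get(leftRow[leftKeyIndex]);
      if (dimRow == null) {
        continue; // inner-join semantics assumed: drop unmatched rows
      }
      // Joined row = left columns followed by dim columns.
      Object[] row = new Object[leftRow.length + dimRow.length];
      System.arraycopy(leftRow, 0, row, 0, leftRow.length);
      System.arraycopy(dimRow, 0, row, leftRow.length, dimRow.length);
      if (filter == null || filter.test(row)) {
        resultBlock.add(row);
      }
    }
    return resultBlock;
  }

  public static void main(String[] args) {
    Map<Object, Object[]> dim = new HashMap<>();
    dim.put(1, new Object[] {"a"});
    List<Object[]> block = new ArrayList<>();
    block.add(new Object[] {1, 10});
    block.add(new Object[] {2, 20}); // no match in the dim table
    List<List<Object[]>> blocks = new ArrayList<>();
    blocks.add(block);
    LookupJoinSketch op = new LookupJoinSketch(blocks.iterator(), dim, 0, null);
    System.out.println(op.nextBlock().size());
  }
}
```

Applying the filter to the joined row inside the operator is what lets a single node absorb the `LogicalFilter` that sat on top of the join.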
By doing that, we can keep using all the standard Calcite rules but end up
producing our own nodes at the end of the pipeline, after all standard
relational optimizations have been applied. We may still need to modify
some Calcite rules (for example, we may not want to push a group by into a
logical join, so that the lookup join can still be used), but that would be
something closer to what we have in https://github.com/apache/pinot/pull/14523,
which would be even simpler if we could contribute to Calcite to make it easier
to implement (without copying code).