gortiz commented on PR #14524:
URL: https://github.com/apache/pinot/pull/14524#issuecomment-2497666363

   This approach is interesting, but I can see why you said it requires a lot 
of changes.
   
   IIUC, here you propose generating `PinotLogicalJoins` instead of joins, 
which means we need to change (and even worse, copy) a lot of code from Calcite.
   
   My suggestion is different. I would keep the code as it is in master right 
now. We keep using `PinotJoin`, so that, for example, a filter will still be 
pushed down into the right-hand side. In parallel, we create our own 
`LookupPinotJoin` that extends `RelNode` and a rule that transforms `LogicalJoin` 
into `LookupPinotJoin` under specific conditions (i.e., hint enabled, one of the 
sides is a dim table, etc.).
   
   This rule should be applied in the last phases of the rule pipeline and 
could also transform a `LogicalJoin` + `LogicalFilter` into a single 
`LookupPinotJoin`, which would keep the optional filter. This node would 
therefore not be logical but closer to physical, given that it would be 
semantically more complex.
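
   To make the shape of that rule concrete, here is a rough sketch on a toy 
plan model. None of these types are Calcite or Pinot classes: `Scan`, `Join`, 
`Filter`, `LookupJoin`, `isLookupCandidate` and `rewrite` are hypothetical names 
for illustration, and I am assuming the dim table is on the right side and that 
the filter is just an opaque condition string. A real rule would be written 
against Calcite's rule API instead.

```java
import java.util.Optional;

// Toy plan model: hypothetical types, NOT Calcite or Pinot classes.
public class LookupJoinRuleSketch {
  interface Node {}

  static final class Scan implements Node {
    final String table;
    final boolean dimTable;
    Scan(String table, boolean dimTable) { this.table = table; this.dimTable = dimTable; }
  }

  static final class Join implements Node {
    final Node left;
    final Node right;
    final boolean lookupHint; // stands in for "hint enabled"
    Join(Node left, Node right, boolean lookupHint) {
      this.left = left; this.right = right; this.lookupHint = lookupHint;
    }
  }

  static final class Filter implements Node {
    final Node input;
    final String condition;
    Filter(Node input, String condition) { this.input = input; this.condition = condition; }
  }

  // Closer to a physical node: a join plus an optional filter on its output.
  static final class LookupJoin implements Node {
    final Node left;
    final Scan dim;
    final Optional<String> filter;
    LookupJoin(Node left, Scan dim, Optional<String> filter) {
      this.left = left; this.dim = dim; this.filter = filter;
    }
  }

  // Assumption: the rule only fires when hinted and the right input is a dim table scan.
  static boolean isLookupCandidate(Node node) {
    if (!(node instanceof Join)) {
      return false;
    }
    Join join = (Join) node;
    return join.lookupHint && join.right instanceof Scan && ((Scan) join.right).dimTable;
  }

  // Rewrites only the node passed in; it does not recurse into the inputs.
  static Node rewrite(Node node) {
    if (node instanceof Filter && isLookupCandidate(((Filter) node).input)) {
      Filter filter = (Filter) node;
      Join join = (Join) filter.input;
      return new LookupJoin(join.left, (Scan) join.right, Optional.of(filter.condition));
    }
    if (isLookupCandidate(node)) {
      Join join = (Join) node;
      return new LookupJoin(join.left, (Scan) join.right, Optional.empty());
    }
    return node; // anything else is left untouched
  }
}
```

   Joins that do not meet the conditions come back unchanged, so they would 
still go through the regular path.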
   
   Finally, we could have a `LookupJoinOperator` whose `nextBlock` would be 
something like:
   ```
   leftBlock = read block from left
   if (isEos(leftBlock))
     return ...
   end if
   resultBlock = new empty block
   for each leftRow in leftBlock
     row = execute the lookup for leftRow
     if (filter == null or filter.accept(row))
       resultBlock += row
     end if
   end for
   return resultBlock
   ```
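
   In Java this could look roughly like the sketch below. It is only an 
illustration of the loop above: `Block`, the in-memory `dimTable` map and 
`leftKeyIndex` are simplified, hypothetical stand-ins and not the actual Pinot 
block or dim table APIs. I am also assuming inner join semantics (left rows 
without a match are dropped) and that the end-of-stream block is simply passed 
along.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical, simplified stand-ins; none of these are Pinot classes.
public class LookupJoinSketch {
  /** A block is either a batch of rows or an end-of-stream marker. */
  static final class Block {
    final List<Object[]> rows;
    final boolean eos;
    Block(List<Object[]> rows, boolean eos) { this.rows = rows; this.eos = eos; }
    static Block of(List<Object[]> rows) { return new Block(rows, false); }
    static Block endOfStream() { return new Block(new ArrayList<>(), true); }
  }

  static final class LookupJoinOperator {
    private final Iterator<Block> left;
    private final Map<Object, Object[]> dimTable; // dim rows keyed by join key
    private final int leftKeyIndex;               // position of the join key in a left row
    private final Predicate<Object[]> filter;     // optional, may be null

    LookupJoinOperator(Iterator<Block> left, Map<Object, Object[]> dimTable,
        int leftKeyIndex, Predicate<Object[]> filter) {
      this.left = left;
      this.dimTable = dimTable;
      this.leftKeyIndex = leftKeyIndex;
      this.filter = filter;
    }

    Block nextBlock() {
      Block leftBlock = left.hasNext() ? left.next() : Block.endOfStream();
      if (leftBlock.eos) {
        return leftBlock; // assumption: end of stream is propagated as-is
      }
      List<Object[]> result = new ArrayList<>();
      for (Object[] leftRow : leftBlock.rows) {
        Object[] dimRow = dimTable.get(leftRow[leftKeyIndex]); // the lookup
        if (dimRow == null) {
          continue; // assumption: inner join, unmatched left rows are dropped
        }
        Object[] joined = Arrays.copyOf(leftRow, leftRow.length + dimRow.length);
        System.arraycopy(dimRow, 0, joined, leftRow.length, dimRow.length);
        if (filter == null || filter.test(joined)) {
          result.add(joined);
        }
      }
      return Block.of(result);
    }
  }
}
```

   The filter is evaluated on the joined row, which is what would let a single 
`LookupPinotJoin` absorb a `LogicalFilter` sitting on top of the join.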
   
   By doing that, we can keep using all the standard Calcite rules but end up 
producing our own nodes at the end of the pipeline, once all standard 
relational optimizations have been applied. We may still need to modify 
some Calcite rules (for example, we may not want to push a group by into a 
logical join, so that the lookup join can still be used), but that would be 
closer to what we have in https://github.com/apache/pinot/pull/14523, which 
would be even simpler if we can contribute to Calcite to make it easier to 
implement (without copying code).

