When you implement an equi-join as a nested loops join, the right-hand side always has a filter combining the variable set by the left-hand side and the join column on the right-hand side.
You will need to do a full scan of the right-hand table every time, unless your table is organized so that you can access a subset that matches the filter. In your example select emp.name from emp join dept on emp.deptid = dept.id where dept.name = ‘Sales' The simple plan would be Correlate (v := deptno) Filter name = ’Sales' Scan dept Filter deptno = $v Scan emp If “emp” was physically ordered by deptno, the plan could be optimized to Correlate (v := deptno) Filter name = ’Sales' Scan dept RangeScan emp deptno = $v > On Aug 7, 2019, at 11:41 PM, Gabriel Reid <gabriel.r...@gmail.com> wrote: > > Hi Stamatis, > > Thank you so much, this is exactly the kind of info I was looking for! I > think I can figure out how to go forward based on this, thanks again. > > - Gabriel > > On Thu, Aug 8, 2019 at 8:19 AM Stamatis Zampetakis <zabe...@gmail.com> > wrote: > >> Hi Gabriel, >> >> What you want indeed seems to be a nested loop join; there has been a >> relevant discussion in the dev list [1]. >> You may also find relevant the ongoing discussion on CALCITE-2979 [2]. >> >> Best, >> Stamatis >> >> [1] >> >> https://lists.apache.org/thread.html/d9f95683e66009872a53e7e617295158b98746b550d2bf68230b3096@%3Cdev.calcite.apache.org%3E >> >> [2] https://issues.apache.org/jira/browse/CALCITE-2979 >> >> On Wed, Aug 7, 2019 at 2:56 PM Gabriel Reid <gabriel.r...@gmail.com> >> wrote: >> >>> Hi, >>> >>> I'm currently working on a custom Calcite adapter, and I've got a >> situation >>> where I want to join a small table with a large table, basically just >> using >>> the small table to filter the large table. Conceptually, I'm talking >> about >>> something that is equivalent to the following query: >>> >>> select emp.name >>> from emp join dept on emp.deptid = dept.id >>> where dept.name = 'Sales' >>> >>> I've got converter rules which to push down filtering on all tables, >> which >>> make a big difference in performance. The above query current results in >> an >>> EnumerableHashJoin over a filtered scan over the 'dept' table, and an >>> unfiltered scan over the 'emp' table. >>> >>> What I would like to accomplish is that this is converted into a (I >> think) >>> a nested loop join between 'dept' and emp, so that the filtered scan is >>> done once over 'dept', and then a filtered scan is done for each entry of >>> the 'dept' table using the 'id' value from that entry as a filter value >> on >>> the scan on the 'emp' table. >>> >>> I've been able to get a nested loop join to be used, but I haven't >> managed >>> to have the 'id' values from the 'dept' table to be used to filter the >>> 'emp' table. Instead, the full 'emp' table is scanned for each iteration >> of >>> the 'dept' table. >>> >>> And now my questions: are my hopes/expectations here realistic, and/or is >>> there a much better/easier way of accomplishing the same thing? Is there >>> prior art somewhere within the Calcite code base or elsewhere? Or should >>> this just be working by default? >>> >>> I would assume that this isn't that unusual of a situation, which is why >> I >>> was expecting that there would already be something like this somewhere >> (or >>> I'm doing the wrong thing), but I haven't managed to find any clear >>> pointers in any one direction yet. >>> >>> Thanks in advance for any advice! >>> >>> - Gabriel >>> >>