Deciding join order is a basic function of cost-based query optimizers. It
occurs not just with ‘dynamic filters’ or external sources, but with pretty
much every join query. Consider this query:
SELECT *
FROM Emp AS e
JOIN Dept AS d ON e.deptno = d.deptno
WHERE e.job = ‘CLERK’
Emp is a smaller table than Dept, and it is made even smaller when we apply the
filter condition on job. A cost-based optimizer will know the sizes of the
tables, and estimate the selectivity of the ‘job = ‘CLERK’’ condition, and
therefore estimate the size of the filtered Emp table.
Based on those size estimates the optimizer can choose the correct join
algorithm, and also what should be the inputs to the algorithm. In this case, a
hash join would make sense, with Emp on the ‘build’ side (it is read first, and
goes into a hash table) and Dept on the ‘probe’ side (read second, trying to
pair each row with a row in the hash table).
Julian
> On Aug 16, 2022, at 7:53 AM, Jess Balint <[email protected]> wrote:
>
> Hi Yiyun,
>
> The first requirement is that you will need to make some table cardinality
> estimates available to the query planner (eg. via
> RelMetadataQuery.getRowCount()). This will allow you to implement some
> threshold for deciding when to use the described join algorithm. The
> algorithm is known as "bind join". Some details can be found here:
> https://www.vldb.org/conf/1997/P276.PDF
>
> Hope this helps.
> Jess
>
> On Sun, Aug 14, 2022 at 11:00 PM Yiyun Yin <[email protected]> wrote:
>
>> Dear,
>> Hello, I am Yiyun Yin. I am trying to develop a data federation query
>> service using calcite, and I have a question I would like to ask.
>> <https://stackoverflow.com/posts/73352781/timeline>
>>
>> When I use join query with calcite, how can I optimize the query plan
>> so that it can query the small table first, and then use the result of the
>> small table as the filter condition of the large table?
>>
>> I am looking forward for your answers. Thank you.
>>
>>
>> Best Regards,
>> Yours sincerely,
>> Yiyun Yin
>>