[ https://issues.apache.org/jira/browse/CALCITE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839542#comment-16839542 ]
Stamatis Zampetakis commented on CALCITE-2979:
----------------------------------------------

Thanks for the analysis [~rubenql]! I haven't figured out all the details of the best way to do it, and I guess there is more than one choice. It would be nice if [~khawlamhb], who is working on it right now, could outline some possible alternatives with their advantages and disadvantages.

Just a quick thought (that I guess could work) would be to generate a plan like the following:
{noformat}
Filter(A.id > B.id)
  Correlate(blockSize=3)
    Scan(A)
    Filter(OR(>(cor0_0,B.id), >(cor0_1,B.id), >(cor0_2,B.id)))
      Scan(B)
{noformat}
so the implementation of correlate basically does a cartesian product and the filter on top eliminates the tuples that shouldn't be there.

> Add a block-based nested loop join algorithm
> --------------------------------------------
>
>                 Key: CALCITE-2979
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2979
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Khawla Mouhoubi
>            Priority: Major
>              Labels: performance
>
> Currently, Calcite provides a tuple-based nested loop join algorithm
> implemented through EnumerableCorrelate and EnumerableDefaults.correlateJoin.
> This means that for each tuple of the outer relation we probe (set variables)
> in the inner relation.
> The goal of this issue is to add a new algorithm (or extend the correlateJoin
> method) which first gathers blocks (batches) of tuples from the outer
> relation and then probes the inner relation once per block.
> There are cases (e.g., indexes) where the inner relation can be accessed by
> more than one value, which can greatly improve performance, in particular
> when the outer relation is big.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
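
For illustration only, the block-based probing described in the issue and the plan suggested in the comment might be sketched in plain Java as follows. This is not Calcite code: the class `BlockNestedLoopJoinSketch`, the method `join`, and the `innerScans` counter are hypothetical names, relations are reduced to lists of integer ids, and the join condition `A.id > B.id` is hard-coded. The inner loop stands in for the `OR(...)` filter over the block's correlation variables, and the final check stands in for the filter on top of the correlate.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not Calcite's EnumerableCorrelate or correlateJoin. */
class BlockNestedLoopJoinSketch {
  /** Counts how often the inner relation is scanned (once per outer block). */
  static int innerScans = 0;

  /** Joins outer and inner ids on a > b, probing the inner side once per block. */
  static List<int[]> join(List<Integer> outer, List<Integer> inner, int blockSize) {
    List<int[]> result = new ArrayList<>();
    for (int start = 0; start < outer.size(); start += blockSize) {
      // Gather the next block of outer ids (cor0_0, cor0_1, ... in the plan).
      int end = Math.min(start + blockSize, outer.size());
      List<Integer> block = outer.subList(start, end);

      innerScans++; // one probe of the inner relation for the whole block
      for (int b : inner) {
        // Inner filter: OR(>(cor0_0, B.id), >(cor0_1, B.id), ...).
        boolean matchesAny = false;
        for (int a : block) {
          if (a > b) {
            matchesAny = true;
            break;
          }
        }
        if (!matchesAny) {
          continue;
        }
        // The correlate pairs every block tuple with the surviving inner row;
        // the top filter (A.id > B.id) drops the pairs that should not be there.
        for (int a : block) {
          if (a > b) {
            result.add(new int[] {a, b});
          }
        }
      }
    }
    return result;
  }
}
```

With four outer ids and `blockSize` set to 3, this sketch scans the inner list twice instead of four times. An in-memory list gains little from that; the saving the issue has in mind would come from inner relations, such as indexes, that can be accessed by several values in a single probe.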