[ https://issues.apache.org/jira/browse/CALCITE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844641#comment-16844641 ]
Khawla Mouhoubi commented on CALCITE-2979:
------------------------------------------

After discussing the matter with [~zabetak] and [~rubenql], a good way to start would be to implement a block-based version of EnumerableDefaults.correlateJoin or EnumerableDefaults.thetaJoin.

This needs a new operator whose implementation will be close to EnumerableCorrelate, but with blocks of correlation variables and Bloom filters applied to the inner table.

A new rule will do the following:
{code:java}
Join(A.id = B.id)
  Scan(A)
  Scan(B)
{code}
Will be turned into:
{code:java}
NestedLoop(blockSize=3)
  Scan(A)
  Filter(OR(=(cor0[0],B.id), =(cor0[1],B.id), =(cor0[2],B.id)))
    Scan(B)
{code}

> Add a block-based nested loop join algorithm
> --------------------------------------------
>
>                 Key: CALCITE-2979
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2979
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Khawla Mouhoubi
>            Priority: Major
>              Labels: performance
>
> Currently, Calcite provides a tuple-based nested loop join algorithm
> implemented through EnumerableCorrelate and EnumerableDefaults.correlateJoin.
> This means that for each tuple of the outer relation we probe (set variables)
> in the inner relation.
> The goal of this issue is to add a new algorithm (or extend the correlateJoin
> method) which first gathers blocks (batches) of tuples from the outer
> relation and then probes the inner relation once per block.
> There are cases (e.g., indexes) where the inner relation can be accessed by
> more than one value, which can greatly improve performance, in particular
> when the outer relation is big.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
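
The block-based probing described in the issue can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not Calcite code: the class BlockNestedLoopSketch, its join method, and the innerProbe callback are hypothetical names, and the sketch does not use EnumerableDefaults or EnumerableCorrelate. The set of keys handed to innerProbe plays the role of the OR-ed correlation variables (cor0[0], cor0[1], ...) in the rewritten plan, and the sketch covers only an inner equi-join.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical sketch of a block-based nested loop (inner equi-) join. */
public class BlockNestedLoopSketch {

  /**
   * Gathers up to blockSize outer rows, then probes the inner relation once
   * per block. innerProbe is an assumed callback that returns the inner rows
   * whose key is in the given set (for example, via an index lookup).
   */
  public static <L, R, K, T> List<T> join(
      Iterable<L> outer,
      Function<Set<K>, Iterable<R>> innerProbe,
      Function<L, K> outerKey,
      Function<R, K> innerKey,
      BiFunction<L, R, T> resultSelector,
      int blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    List<T> result = new ArrayList<>();
    List<L> block = new ArrayList<>(blockSize);
    for (L row : outer) {
      block.add(row);
      if (block.size() == blockSize) {
        probeBlock(block, innerProbe, outerKey, innerKey, resultSelector, result);
        block.clear();
      }
    }
    if (!block.isEmpty()) {
      // Last, possibly partial, block.
      probeBlock(block, innerProbe, outerKey, innerKey, resultSelector, result);
    }
    return result;
  }

  private static <L, R, K, T> void probeBlock(
      List<L> block,
      Function<Set<K>, Iterable<R>> innerProbe,
      Function<L, K> outerKey,
      Function<R, K> innerKey,
      BiFunction<L, R, T> resultSelector,
      List<T> out) {
    // Group the outer rows of this block by join key; null keys never match.
    Map<K, List<L>> byKey = new LinkedHashMap<>();
    for (L left : block) {
      K key = outerKey.apply(left);
      if (key != null) {
        byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(left);
      }
    }
    if (byKey.isEmpty()) {
      return;
    }
    // Single probe of the inner relation for the whole block.
    for (R right : innerProbe.apply(byKey.keySet())) {
      List<L> matches = byKey.get(innerKey.apply(right));
      if (matches != null) {
        for (L left : matches) {
          out.add(resultSelector.apply(left, right));
        }
      }
    }
  }
}
```

With five outer rows and blockSize=3, this sketch calls innerProbe twice instead of five times. Note that results come out in inner-row order within each block, so the overall output order can differ from that of a tuple-at-a-time nested loop.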