[ 
https://issues.apache.org/jira/browse/CALCITE-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848204#comment-16848204
 ] 

Stamatis Zampetakis commented on CALCITE-2979:
----------------------------------------------

The techniques/algorithms discussed here go by many names and come in many 
variations. Common names include [selective 
join pushdown|http://www.vldb.org/pvldb/vol12/p502-lang.pdf], Batched Key 
Access Joins, Bloom Joins, Bind Joins, and Magic Sets; they all fall under the 
more general problem usually referred to as [sideways information 
passing|https://repository.upenn.edu/cgi/viewcontent.cgi?article=1045&context=db_research].
 

The join inputs do not need to be TableScan operators. The filter that is 
generated on the left/right side can be pushed down (like any other filter) 
through other operators (joins, etc.) and eventually be combined with 
the TableScan. Even if it cannot be combined with the TableScan, it can still 
help improve the performance of the query. 
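
To make the idea concrete, here is a minimal sketch of deriving a filter from one join input and applying it to the other input before the join. It is only an illustration under assumptions: rows are int arrays whose first element is the join key, and the class and method names (SidewaysFilterSketch, keyFilterFrom, scan) are hypothetical and not part of Calcite.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration; none of these names exist in Calcite.
class SidewaysFilterSketch {

  // Collects the distinct join keys (row[0]) of one join input and turns
  // them into a predicate that can be applied to the other input.
  static Predicate<int[]> keyFilterFrom(List<int[]> left) {
    Set<Integer> keys = new HashSet<>();
    for (int[] row : left) {
      keys.add(row[0]);
    }
    return row -> keys.contains(row[0]);
  }

  // Stand-in for a scan (or any operator below the join) that accepts a
  // pushed-down filter and drops non-matching rows early.
  static List<int[]> scan(List<int[]> table, Predicate<int[]> pushedFilter) {
    return table.stream().filter(pushedFilter).collect(Collectors.toList());
  }
}
```

The predicate is an ordinary filter, so in this sketch it could just as well be applied at an intermediate operator instead of the scan; rows that cannot join are still discarded before they reach the join.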

> Add a block-based nested loop join algorithm
> --------------------------------------------
>
>                 Key: CALCITE-2979
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2979
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Khawla Mouhoubi
>            Priority: Major
>              Labels: performance
>
> Currently, Calcite provides a tuple-based nested loop join algorithm 
> implemented through EnumerableCorrelate and EnumerableDefaults.correlateJoin. 
> This means that for each tuple of the outer relation we probe (set variables) 
> in the inner relation.
> The goal of this issue is to add a new algorithm (or extend the correlateJoin 
> method) which first gathers blocks (batches) of tuples from the outer 
> relation and then probes the inner relation once per block.
> There are cases (e.g., indexes) where the inner relation can be accessed by 
> more than one value, which can greatly improve performance, in particular 
> when the outer relation is big.
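
The block-based probing described in the issue can be sketched roughly as follows. This is not the EnumerableCorrelate or correlateJoin code; it is a simplified illustration that assumes rows are int arrays keyed on their first element and that the inner relation offers a multi-key access path. All names (BlockNestedLoopJoinSketch, multiKeyLookup, probeCount) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration; none of these names exist in Calcite.
class BlockNestedLoopJoinSketch {

  // Number of probes issued against the inner relation (illustration only).
  static int probeCount = 0;

  // Assumed multi-key access path: returns every inner row whose key
  // (row[0]) is contained in the given key set, e.g. an index IN-lookup.
  static Function<Set<Integer>, List<int[]>> multiKeyLookup(List<int[]> inner) {
    return keys -> {
      List<int[]> hits = new ArrayList<>();
      for (int[] row : inner) {
        if (keys.contains(row[0])) {
          hits.add(row);
        }
      }
      return hits;
    };
  }

  // Equi-join on row[0]; emits {key, outerValue, innerValue} triples.
  static List<int[]> join(List<int[]> outer,
      Function<Set<Integer>, List<int[]>> innerLookup, int blockSize) {
    List<int[]> result = new ArrayList<>();
    for (int start = 0; start < outer.size(); start += blockSize) {
      // 1. Gather one block of outer tuples.
      int end = Math.min(start + blockSize, outer.size());
      List<int[]> block = outer.subList(start, end);
      Set<Integer> keys = new LinkedHashSet<>();
      for (int[] o : block) {
        keys.add(o[0]);
      }
      // 2. Probe the inner relation once for the whole block.
      probeCount++;
      List<int[]> matches = innerLookup.apply(keys);
      // 3. Pair each outer tuple of the block with its matching inner rows.
      for (int[] o : block) {
        for (int[] i : matches) {
          if (o[0] == i[0]) {
            result.add(new int[] {o[0], o[1], i[1]});
          }
        }
      }
    }
    return result;
  }
}
```

With five outer rows and a block size of 2, this sketch probes the inner relation three times, whereas a tuple-at-a-time loop would probe it five times. The saving grows with the block size as long as the inner access path can serve several keys in one call.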



