[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398207#comment-15398207
 ] 

Taewoo Kim commented on ASTERIXDB-1556:
---------------------------------------

I agree that the join cost is more important. However, the complexity of the 
query plan should also stay affordable. It's clear that the current multi-way 
fuzzy join is not feasible in real time in AsterixDB. We need to find a way to 
resolve this, and we need to think about the join cost, too. 

So, as you said, how can we make the prefix-join runnable? I think a multi-way 
join should be optimized as I mentioned. If we have three datasets: fuzzy-join 
A and B first -> materialize the result -> then fuzzy-join (AB) with C. 
 

> Prefix-based multi-way Fuzzy-join generates an exception.
> ---------------------------------------------------------
>
>                 Key: ASTERIXDB-1556
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>
> When we enable the prefix-based fuzzy-join and apply a multi-way fuzzy-join 
> (more than 2 datasets), the system generates an out-of-memory exception. 
> Since a fuzzy-join is written using 30-40 lines of AQL code and this AQL is 
> translated into a massive number of operators (more than 200 operators in the 
> plan for a 3-way fuzzy join), it could generate an out-of-memory exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
