[jira] [Commented] (PHOENIX-1179) Support many-to-many joins

Maryann Xue (JIRA) Thu, 13 Nov 2014 19:42:06 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211790#comment-14211790
 ]


Maryann Xue commented on PHOENIX-1179:
--------------------------------------

Basically, hash-join is found limited for 3 patterns in terms of functionality:
1) Full joins, where both sides are the "OUTER" side
2) "A RIGHT JOIN B LEFT JOIN C"
3) "A RIGHT JOIN (complicated subquery or sub-join)" or "(complicated subquery 
or sub-join) LEFT JOIN A" or "(complicated subquery or sub-join) INNER JOIN A"
complicated subquery means those subqueries that cannot be flattened into a 
simple scan plan and thus cannot be the LHS table (main table), such as a 
group-by, a limit, a sub-join, etc. 

Currently hash-join has the priority. So if no hint is provided, the compiler 
see if the top-level join can be done using hash-join, and recursively go 
through its sub-joins, and whenever we hit the above three patterns, the 
sort-merge-join plans are used. Again take the example (pattern 2) I wrote in 
my first comment:
"A right join B left join C"
First, it does not qualify for a star join (which should not contain any right 
join), so it should be compiled in a sequence like (A right join B) left join 
C, and now we find that the LHS table is not a flat scan, so we have to compile 
the left join as sort-merge join but we can still do hash-join for (A right 
join B).

So far, we don't have a way to hint using certain join algorithms for a 
specific join in a query. But I think it might be worth adding such facilities.

The orderby is pushed down to join tables in a query rewrite way, so order-by 
optimizations can be automatically used and of course will have server-side 
sorting. For sub-queries used as join tables that has order-by itself, the 
original order-by will be overridden by the sort-merge order by. The outmost 
order-by will just remain intact. 

bq. would be interesting to get your feedback on what we need in terms of 
statistics to drive the optimizer decisions
Sure, definitely. Already had some discussions with [~ram_krish].


> Support many-to-many joins
> --------------------------
>
>                 Key: PHOENIX-1179
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1179
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>             Fix For: 3.0.0, 4.0.0, 5.0.0
>
>         Attachments: 1179.patch
>
>
> Enhance our join capabilities to support many-to-many joins where the size of 
> both sides of the join are too big to fit into memory (and thus cannot use 
> our hash join mechanism). One technique would be to order both sides of the 
> join by their join key and merge sort the results on the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PHOENIX-1179) Support many-to-many joins

Reply via email to