[
https://issues.apache.org/jira/browse/PHOENIX-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211790#comment-14211790
]
Maryann Xue commented on PHOENIX-1179:
--------------------------------------
Basically, hash-join is found limited for 3 patterns in terms of functionality:
1) Full joins, where both sides are the "OUTER" side
2) "A RIGHT JOIN B LEFT JOIN C"
3) "A RIGHT JOIN (complicated subquery or sub-join)" or "(complicated subquery
or sub-join) LEFT JOIN A" or "(complicated subquery or sub-join) INNER JOIN A"
complicated subquery means those subqueries that cannot be flattened into a
simple scan plan and thus cannot be the LHS table (main table), such as a
group-by, a limit, a sub-join, etc.
Currently hash-join has the priority. So if no hint is provided, the compiler
see if the top-level join can be done using hash-join, and recursively go
through its sub-joins, and whenever we hit the above three patterns, the
sort-merge-join plans are used. Again take the example (pattern 2) I wrote in
my first comment:
"A right join B left join C"
First, it does not qualify for a star join (which should not contain any right
join), so it should be compiled in a sequence like (A right join B) left join
C, and now we find that the LHS table is not a flat scan, so we have to compile
the left join as sort-merge join but we can still do hash-join for (A right
join B).
So far, we don't have a way to hint using certain join algorithms for a
specific join in a query. But I think it might be worth adding such facilities.
The orderby is pushed down to join tables in a query rewrite way, so order-by
optimizations can be automatically used and of course will have server-side
sorting. For sub-queries used as join tables that has order-by itself, the
original order-by will be overridden by the sort-merge order by. The outmost
order-by will just remain intact.
bq. would be interesting to get your feedback on what we need in terms of
statistics to drive the optimizer decisions
Sure, definitely. Already had some discussions with [~ram_krish].
> Support many-to-many joins
> --------------------------
>
> Key: PHOENIX-1179
> URL: https://issues.apache.org/jira/browse/PHOENIX-1179
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: Maryann Xue
> Fix For: 3.0.0, 4.0.0, 5.0.0
>
> Attachments: 1179.patch
>
>
> Enhance our join capabilities to support many-to-many joins where the size of
> both sides of the join are too big to fit into memory (and thus cannot use
> our hash join mechanism). One technique would be to order both sides of the
> join by their join key and merge sort the results on the client.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)