[ https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092850#comment-14092850 ]
Maryann Xue commented on PHOENIX-852:
-------------------------------------
bq. We could come in at a slightly lower level for this, bypassing the parser,
and just generating an InListExpression directly.
Sure, we are going to do that. It's just that re-compiling the other WHERE
clauses (aside from this IN clause) doesn't seem very neat to me. But I'll do it
this way for now, since the overhead is not really going to be a problem.
There is no doubt that the IN construct can handle the key mapping, but what
I'm saying is that it is not sufficient in some cases. Suppose we have left
table tuples (a, 1), (c, 2) and right table tuples (a, 3), (c, 4), and we
perform a join on the first column but select only columns from the left
table. In this case, we can simply use the IN construct and we don't need the
hash cache. But imagine we have another right table tuple (a, 5): the result
should now be (a, 1), (a, 1), (c, 2), because two tuples from the right table
match "a". In this latter case, we still have to keep the hash cache.
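
The difference can be made concrete with a minimal sketch in plain Python. This
is not Phoenix code: the function names are hypothetical, and the in-memory
lists only stand in for the left and right tables from the example above. The
IN-style filter behaves like a semi-join, while the hash join repeats a left
row once per matching right row.

```python
from collections import defaultdict

def in_filter_join(left, right_keys):
    """Semi-join: keep each left row once if its key appears among the right keys."""
    keys = set(right_keys)
    return [row for row in left if row[0] in keys]

def hash_join_left_columns(left, right):
    """Inner hash join that projects only left columns.

    Emits one copy of the left row for every matching right row.
    """
    cache = defaultdict(list)          # stand-in for the hash cache
    for key, value in right:
        cache[key].append(value)
    result = []
    for row in left:
        for _ in cache.get(row[0], ()):
            result.append(row)
    return result

left = [("a", 1), ("c", 2)]
right_unique = [("a", 3), ("c", 4)]
right_dup = [("a", 3), ("c", 4), ("a", 5)]

# Unique right keys: both strategies agree.
same = in_filter_join(left, [k for k, _ in right_unique])
# Duplicate right key "a": only the hash join repeats (a, 1).
semi = in_filter_join(left, [k for k, _ in right_dup])
full = hash_join_left_columns(left, right_dup)
```

With right_dup, the IN-style filter returns each left row only once, so it
cannot replace the hash cache on its own when right-side keys repeat.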
bq. The nice thing about this approach is you'll be leveraging the way we
optimize these IN expressions. The skip scan will just skip from row key to row
key and be so much faster than a full table scan. It'll be a huge speedup for a
relatively common case.
100% agree. PHOENIX-889 is a very good example.
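
As a rough illustration of why skipping from key to key beats a full scan,
here is a small Python sketch. It is an assumption-laden toy, not Phoenix's
skip scan filter: a sorted in-memory list stands in for rows ordered by row
key, binary search stands in for a seek, and the function names are
hypothetical.

```python
import bisect

def full_scan(sorted_rows, wanted_keys):
    """Visit every row and keep those whose key is wanted."""
    wanted = set(wanted_keys)
    hits, visited = [], 0
    for row in sorted_rows:
        visited += 1
        if row[0] in wanted:
            hits.append(row)
    return hits, visited

def skip_scan(sorted_rows, wanted_keys):
    """Seek directly to each wanted key in sorted order, reading only matching rows."""
    # The key list stands in for the store's sorted row-key order;
    # a real store would not need to materialize it.
    row_keys = [row[0] for row in sorted_rows]
    hits, visited, pos = [], 0, 0
    for key in sorted(set(wanted_keys)):
        # Binary search forward from the current position (the "skip").
        pos = bisect.bisect_left(row_keys, key, pos)
        while pos < len(sorted_rows) and row_keys[pos] == key:
            hits.append(sorted_rows[pos])
            visited += 1
            pos += 1
    return hits, visited

rows = [(i, f"v{i}") for i in range(1000)]   # rows sorted by key
wanted = [500, 3, 998]                        # keys taken from the other side of the join

full_hits, full_visited = full_scan(rows, wanted)
skip_hits, skip_visited = skip_scan(rows, wanted)
```

Both functions return the same rows here, but the full scan reads all 1000 rows
while the skip version reads only the 3 matching ones.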
> Optimize child/parent foreign key joins
> ---------------------------------------
>
> Key: PHOENIX-852
> URL: https://issues.apache.org/jira/browse/PHOENIX-852
> Project: Phoenix
> Issue Type: Improvement
> Reporter: James Taylor
> Assignee: Maryann Xue
>
> Oftentimes a join will occur from a child to a parent. Our current algorithm
> would do a full scan of one side or the other. We can do much better than
> that if the HashCache contains the PK (or even part of the PK) from the table
> being joined to. In these cases, we should drive the second scan through a
> skip scan on the server side.