[ 
https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092850#comment-14092850
 ] 

Maryann Xue commented on PHOENIX-852:
-------------------------------------

bq. We could come in at a slightly lower level for this, bypassing the parser, 
and just generating an InListExpression directly.

Sure, we are going to do that. It's just that re-compiling the other WHERE 
clauses (aside from this IN clause) does not seem very neat to me. But I'll do 
it this way for now, since the overhead should not really be a problem.

There is no doubt that the IN construct can handle the key mapping, but what 
I'm saying is that it is not sufficient in some cases. Suppose we have left 
table tuples (a, 1), (c, 2) and right table tuples (a, 3), (c, 4), and we 
perform a join on the first column but select only columns from the left 
table. In this case, we can simply use the IN construct and we don't need the 
hash cache. But imagine we have another right table tuple (a, 5). The result 
should now be (a, 1), (a, 1), (c, 2), because two tuples from the right table 
match "a". In this latter case, we still have to keep the hash cache.
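
To make the difference concrete, here is a minimal Python sketch of the example above. It is only an illustration, not Phoenix code: the function names `in_filter` and `hash_join_left_columns` are hypothetical, and the tables are plain in-memory lists.

```python
from collections import defaultdict

left = [("a", 1), ("c", 2)]
right_unique = [("a", 3), ("c", 4)]
right_dup = right_unique + [("a", 5)]


def in_filter(left_rows, right_rows):
    """Semi-join: keep each left row at most once if its key occurs on the right.

    This is what a bare `key IN (...)` filter gives you.
    """
    keys = {key for key, _ in right_rows}
    return [row for row in left_rows if row[0] in keys]


def hash_join_left_columns(left_rows, right_rows):
    """Inner join through a hash table on the right side, projecting left columns.

    Each left row is emitted once per matching right row, so the
    multiplicity of right-side matches is preserved.
    """
    cache = defaultdict(list)
    for key, value in right_rows:
        cache[key].append(value)
    return [row for row in left_rows for _ in cache.get(row[0], ())]


# Unique keys on the right: both approaches agree.
print(in_filter(left, right_unique))
print(hash_join_left_columns(left, right_unique))

# Duplicate key "a" on the right: only the hash join repeats (a, 1).
print(in_filter(left, right_dup))
print(hash_join_left_columns(left, right_dup))
```

The IN filter alone is enough only when the join keys on the right side are unique (or when duplicates do not matter); otherwise it can narrow the scan, but the hash cache is still needed to produce the correct number of rows.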

bq. The nice thing about this approach is you'll be leveraging the way we 
optimize these IN expressions. The skip scan will just skip from row key to row 
key and be so much faster than a full table scan. It'll be a huge speedup for a 
relatively common case.

100% agree. PHOENIX-889 is a very good example.
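
As a rough illustration of why seeking from row key to row key beats a full scan, here is a small Python sketch. It is not how Phoenix implements its skip scan; `skip_scan` and `full_scan` are hypothetical helpers over an in-memory list sorted by row key, and the counters only approximate the work done.

```python
from bisect import bisect_left


def full_scan(sorted_rows, wanted_keys):
    """Visit every row and test it against the wanted keys."""
    wanted = set(wanted_keys)
    hits = [row for row in sorted_rows if row[0] in wanted]
    return hits, len(sorted_rows)  # rows visited


def skip_scan(sorted_rows, wanted_keys):
    """Seek directly to each wanted key in a table sorted by row key."""
    # A real store already has an index over its keys; here we build a
    # plain key list once so that bisect can binary-search it.
    row_keys = [key for key, _ in sorted_rows]
    hits, seeks = [], 0
    for key in sorted(set(wanted_keys)):
        i = bisect_left(row_keys, key)  # jump to the first candidate row
        seeks += 1
        while i < len(sorted_rows) and sorted_rows[i][0] == key:
            hits.append(sorted_rows[i])
            i += 1
    return hits, seeks


table = [(k, k * 10) for k in range(1000)]
print(full_scan(table, [5, 700]))
print(skip_scan(table, [5, 700]))
```

Both functions return the same rows, but the full scan touches every row while the skip scan performs one seek per distinct key from the IN list.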

> Optimize child/parent foreign key joins
> ---------------------------------------
>
>                 Key: PHOENIX-852
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-852
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>
> Often a join will occur from a child to a parent. Our current algorithm 
> would do a full scan of one side or the other. We can do much better than 
> that if the HashCache contains the PK (or even part of the PK) from the table 
> being joined to. In these cases, we should drive the second scan through a 
> skip scan on the server side.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
