[ 
https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105499#comment-14105499
 ] 

James Taylor commented on PHOENIX-852:
--------------------------------------

If it's ok with you, let's hold off on committing this change until we figure 
this out completely. We can always get it in a patch release shortly afterwards.

The perf difference between a scan over fully qualified keys using an IN clause 
and a range scan is large. For the scenario you mentioned the difference is 
smaller; it likely depends on what percentage of the data is filtered out. 
[~mujtaba] would be the best person to answer that. We should graph it, though.

bq. So is it good enough that by default we do BETWEEN-AND for full key match 
(e.g. c1,c2,c3 matched in c1,c2,c3), but only IN clause if the 
SKIP_SCAN_HASH_JOIN hint is on.

What about the case where the RHS has been filtered down a lot and you have a 
fully qualified key? Then a full scan over the LHS will be much worse than a 
skip scan driven by the keys formed through the RHS rows. I think this may be 
the most common case.
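
To make the trade-off concrete, here is a rough toy model, not Phoenix code. It
assumes the LHS is just a sorted list of (c1, c2, c3) row keys and counts how
many rows each strategy would touch. The names range_scan, skip_scan, lhs_keys
and rhs_keys are made up for illustration, and real cost also depends on seek
overhead, which this ignores.

```python
from bisect import bisect_left, bisect_right


def range_scan(sorted_keys, lo, hi):
    """BETWEEN-style scan: touch every row whose key falls in [lo, hi]."""
    start = bisect_left(sorted_keys, lo)
    end = bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]


def skip_scan(sorted_keys, wanted):
    """IN-style scan: seek directly to each fully qualified key."""
    hits = []
    for key in sorted(wanted):
        i = bisect_left(sorted_keys, key)
        if i < len(sorted_keys) and sorted_keys[i] == key:
            hits.append(sorted_keys[i])
    return hits


# Hypothetical LHS table: 100,000 rows keyed by (c1, c2, c3), already sorted.
lhs_keys = [(a, b, c) for a in range(100) for b in range(100) for c in range(10)]

# Hypothetical RHS after heavy filtering: only three join keys survive.
rhs_keys = [(3, 7, 1), (42, 0, 9), (97, 55, 4)]

touched_by_range = range_scan(lhs_keys, min(rhs_keys), max(rhs_keys))
touched_by_skip = skip_scan(lhs_keys, rhs_keys)

print(len(touched_by_range), len(touched_by_skip))
```

In this model the range scan walks nearly the whole table because the surviving
keys are spread out, while the skip scan touches only the matching rows. The gap
would shrink as the RHS keys become denser or more clustered.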

> Optimize child/parent foreign key joins
> ---------------------------------------
>
>                 Key: PHOENIX-852
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-852
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>         Attachments: 852.patch, PHOENIX-852.patch
>
>
> Often a join will occur from a child to a parent. Our current algorithm 
> would do a full scan of one side or the other. We can do much better than 
> that if the HashCache contains the PK (or even part of the PK) from the table 
> being joined to. In these cases, we should drive the second scan through a 
> skip scan on the server side.



