[ 
https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105533#comment-14105533
 ] 

James Taylor commented on PHOENIX-852:
--------------------------------------

I think we should do some perf testing to figure it out first. I do think the 
common case is the one I mentioned, as it's a classic case for a hash join: two 
big tables A and B. You have a query that joins them, but filters B 
dramatically. Then you join back to A to get some info through a parent or 
child foreign key. The perf difference between a full scan over A and a skip 
scan over the full PK, when you're only matching say 1-5% of the table, will 
be dramatic. I don't think even a skip scan over 50% of the table will be 
much slower than a full scan over it.
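
To make the comparison concrete, here is a toy sketch in Python. It is not 
Phoenix code: the in-memory sorted list standing in for table A, the names 
full_scan and skip_scan, and the 1% selectivity are all assumptions made for 
illustration. It only counts rows touched, so it ignores the per-seek cost a 
real store would pay, which is exactly what the perf testing would need to 
measure.

```python
from bisect import bisect_left


def full_scan(rows, wanted):
    """Visit every row of A and keep those whose PK is in the wanted set."""
    touched = 0
    matches = []
    for pk, value in rows:
        touched += 1
        if pk in wanted:
            matches.append((pk, value))
    return matches, touched


def skip_scan(rows, pks, wanted):
    """Seek straight to each wanted PK in the PK-sorted rows of A."""
    touched = 0
    matches = []
    for key in sorted(wanted):
        i = bisect_left(pks, key)  # stand-in for a server-side seek
        if i < len(rows):
            touched += 1
            if rows[i][0] == key:
                matches.append(rows[i])
    return matches, touched


# Hypothetical table A: 100,000 rows sorted by an integer PK.
rows = [(pk, f"row-{pk}") for pk in range(100_000)]
pks = [pk for pk, _ in rows]

# Hypothetical keys that survive the filter on B: 1% of A.
wanted = set(range(0, 100_000, 100))

full_matches, full_touched = full_scan(rows, wanted)
skip_matches, skip_touched = skip_scan(rows, pks, wanted)
print(full_touched, skip_touched, full_matches == skip_matches)
```

Both functions return the same rows; the difference is only in how many rows 
each one has to look at. Raising the selectivity toward 50% narrows that gap, 
and with real seek costs added it is an open question where the break-even 
point sits.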

Let's get 3.1/4.1 out - the RC is being built now. Then perhaps [~mujtaba] can 
do some perf testing. I'd like to start doing monthly point releases.

> Optimize child/parent foreign key joins
> ---------------------------------------
>
>                 Key: PHOENIX-852
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-852
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>         Attachments: 852.patch, PHOENIX-852.patch
>
>
> Oftentimes a join will occur from a child to a parent. Our current algorithm 
> would do a full scan of one side or the other. We can do much better than 
> that if the HashCache contains the PK (or even part of the PK) from the table 
> being joined to. In these cases, we should drive the second scan through a 
> skip scan on the server side.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
