[
https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105533#comment-14105533
]
James Taylor commented on PHOENIX-852:
--------------------------------------
I think we should do some perf testing to figure it out first. I do think the
common case is the one I mentioned, as it's a classic case for a hash join: two
big tables A and B. You have a query that joins them, but filters B
dramatically. Then you join back to A to get some info through a parent or
child foreign key. The perf difference between a full scan over A and a skip
scan over the full PK, when you're only matching say 1-5% of the table, will be
dramatic. Even when the skip scan covers 50% of the table, I don't think it
will be much slower than a full scan over it.
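
To make the comparison concrete, here is a toy sketch of the two strategies. It is not Phoenix code: the function names, the in-memory sorted list standing in for table A, and the Python set standing in for the HashCache are all assumptions made for illustration. It only counts rows touched and ignores the real cost of a seek versus a sequential read, which is exactly what the perf testing would need to measure.

```python
from bisect import bisect_left

# Toy model (assumption): table A is a list of (pk, value) rows sorted by pk,
# and the hash cache is a set of PK values that survived the filter on B.

def full_scan_join(table_a, hash_cache):
    """Visit every row of A and probe the hash cache for its PK."""
    rows_visited = 0
    matches = []
    for pk, value in table_a:
        rows_visited += 1
        if pk in hash_cache:
            matches.append((pk, value))
    return matches, rows_visited

def skip_scan_join(table_a, hash_cache):
    """Seek directly to each cached PK in the sorted table, skipping the rest."""
    keys = [pk for pk, _ in table_a]
    rows_visited = 0
    matches = []
    for wanted in sorted(hash_cache):
        i = bisect_left(keys, wanted)  # binary search stands in for a seek
        if i < len(keys):
            rows_visited += 1  # the seek landed on a row, matching or not
            if keys[i] == wanted:
                matches.append(table_a[i])
    return matches, rows_visited

# 10,000 rows in A; the filtered side contributes 200 keys (2% of A).
table_a = [(pk, f"row-{pk}") for pk in range(10_000)]
hash_cache = set(range(0, 10_000, 50))

full_matches, full_visited = full_scan_join(table_a, hash_cache)
skip_matches, skip_visited = skip_scan_join(table_a, hash_cache)
```

Both functions return the same joined rows here; the full scan touches all 10,000 rows of A while the skip scan touches only the 200 it seeks to. Whether that gap carries over as the match rate climbs toward 50% depends on seek cost, which this model leaves out.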
Let's get 3.1/4.1 out - the RC is being built now. Then perhaps [~mujtaba] can
do some perf testing. I'd like to start doing monthly point releases.
> Optimize child/parent foreign key joins
> ---------------------------------------
>
> Key: PHOENIX-852
> URL: https://issues.apache.org/jira/browse/PHOENIX-852
> Project: Phoenix
> Issue Type: Improvement
> Reporter: James Taylor
> Assignee: Maryann Xue
> Attachments: 852.patch, PHOENIX-852.patch
>
>
> Oftentimes a join will occur from a child to a parent. Our current algorithm
> would do a full scan of one side or the other. We can do much better than
> that if the HashCache contains the PK (or even part of the PK) from the table
> being joined to. In these cases, we should drive the second scan through a
> skip scan on the server side.