[jira] [Commented] (PHOENIX-852) Optimize child/parent foreign key joins

Maryann Xue (JIRA) Wed, 20 Aug 2014 12:57:28 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104447#comment-14104447 ]


Maryann Xue commented on PHOENIX-852:
-------------------------------------

bq. I see - so without a filter, there's no need for using the skip scan 
because the matching rows on the LHS will be contiguous, right?
Yeah, that's just an assumption based on some common cases. But I do think there 
are a lot of exceptions, and we can't really make good guesses without stats. 
So for now it might be better for users to enable this optimization 
explicitly (with a hint).
Exceptions to my assumption include, for example: 1) even with a filter, there 
might still be a lot of key values from the RHS table; 2) sometimes, even 
without a filter, the two tables differ significantly in size, and this 
optimization should be turned on.

With the two cases you mentioned above, currently no. But that's a good 
reminder. How about we do this: 

By default, we construct a BETWEEN-AND clause instead of an IN clause, so that 
in the "too many key values" situation this would do no harm, while in some 
other cases it would help. We would only construct IN clauses when the hint is 
present or, in the future, when stats are available. What do you think?
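
To make the proposal concrete, here is a minimal sketch of the choice between the two clause types. It is not Phoenix code: the function name build_key_predicate, the skip_scan_hint flag, and the column name parent_id are all hypothetical, and it assumes integer key values and that an empty key set adds no predicate.

```python
from typing import Optional, Sequence


def build_key_predicate(
    column: str,
    rhs_keys: Sequence[int],
    skip_scan_hint: bool = False,
) -> Optional[str]:
    """Build a predicate on the LHS key column from the RHS key values.

    Hypothetical helper for illustration only. Without the hint it emits a
    single BETWEEN-AND range covering all keys, whose size does not grow with
    the number of keys. With the hint it emits an IN list of the distinct keys.
    """
    if not rhs_keys:
        # Assumption: with no RHS keys we add no predicate at all.
        return None

    distinct = sorted(set(rhs_keys))

    if skip_scan_hint:
        # One entry per distinct key; this is the form that can grow large.
        values = ", ".join(str(k) for k in distinct)
        return f"{column} IN ({values})"

    # Default: one range from the smallest to the largest key.
    return f"{column} BETWEEN {distinct[0]} AND {distinct[-1]}"


if __name__ == "__main__":
    keys = [7, 3, 9, 3]
    print(build_key_predicate("parent_id", keys))
    print(build_key_predicate("parent_id", keys, skip_scan_hint=True))
```

The range form may match rows whose keys are not in the RHS set, so in this sketch it only narrows the scan; the join itself still has to discard the non-matching rows.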

> Optimize child/parent foreign key joins
> ---------------------------------------
>
>                 Key: PHOENIX-852
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-852
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>         Attachments: PHOENIX-852.patch
>
>
> Oftentimes a join will occur from a child to a parent. Our current algorithm 
> would do a full scan of one side or the other. We can do much better than 
> that if the HashCache contains the PK (or even part of the PK) from the table 
> being joined to. In these cases, we should drive the second scan through a 
> skip scan on the server side.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
