[
https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104447#comment-14104447
]
Maryann Xue commented on PHOENIX-852:
-------------------------------------
bq. I see - so without a filter, there's no need for using the skip scan
because the matching rows on the LHS will be contiguous, right?
Yeah, that's just an assumption based on some common cases. But I do think there
are a lot of exceptions, and we can't really make any good guesses without stats.
So I think for now it might be better for users to enable this optimization
explicitly (with a hint).
Exceptions to my assumption include, for example: 1) even with a filter, there
might still be a lot of key values from the RHS table; 2) sometimes, even without
a filter, the two tables differ significantly in size and this optimization
should be turned on.
With the two cases you mentioned above, currently no. But that's a good
reminder. How about we do this:
By default, we construct a BETWEEN-AND clause instead of an IN clause, so that
in the "too many key values" situation it does no harm, while in some other
cases it helps. We only construct IN clauses when the hint is present or, in
the future, when stats are available. What do you think?
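
To make the proposal concrete, here is a minimal sketch of the choice between the
two clauses. It is an illustration only, not Phoenix code: the function name
build_key_predicate, the use_in_hint flag, and the use of integer keys are all
assumptions made for the example.

```python
from typing import Optional, Sequence


def build_key_predicate(
    column: str, keys: Sequence[int], use_in_hint: bool = False
) -> Optional[str]:
    """Build a predicate on `column` from the join key values seen on the RHS.

    Hypothetical helper: by default it emits a single range (BETWEEN-AND),
    which stays cheap even when there are very many key values. It emits an
    IN list only when the caller opts in via `use_in_hint`.
    """
    if not keys:
        # No key values collected, so there is nothing to restrict on.
        return None

    distinct = sorted(set(keys))

    if use_in_hint:
        # One point lookup per distinct key: precise, but the list can be large.
        in_list = ", ".join(str(k) for k in distinct)
        return f"{column} IN ({in_list})"

    # A single range from the smallest to the largest key. It may cover rows
    # whose keys are not in `keys`, so the join itself must still filter them.
    return f"{column} BETWEEN {distinct[0]} AND {distinct[-1]}"


if __name__ == "__main__":
    rhs_keys = [7, 3, 9, 3]
    print(build_key_predicate("parent_id", rhs_keys))
    print(build_key_predicate("parent_id", rhs_keys, use_in_hint=True))
```

The range form only narrows the scan to the span between the smallest and largest
key, so it never costs more than a predicate with one bound on each side. The IN
form is the one that would benefit from a skip scan, which is why the sketch
keeps it behind the explicit flag.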
> Optimize child/parent foreign key joins
> ---------------------------------------
>
> Key: PHOENIX-852
> URL: https://issues.apache.org/jira/browse/PHOENIX-852
> Project: Phoenix
> Issue Type: Improvement
> Reporter: James Taylor
> Assignee: Maryann Xue
> Attachments: PHOENIX-852.patch
>
>
> Oftentimes a join will occur from a child to a parent. Our current algorithm
> would do a full scan of one side or the other. We can do much better than
> that if the HashCache contains the PK (or even part of the PK) from the table
> being joined to. In these cases, we should drive the second scan through a
> skip scan on the server side.