[jira] [Commented] (PHOENIX-852) Optimize child/parent foreign key joins

Maryann Xue (JIRA) Wed, 27 Aug 2014 12:02:46 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112653#comment-14112653 ]


Maryann Xue commented on PHOENIX-852:
-------------------------------------

Yes, I will sort the qualified join keys in pk position order and create a new 
RowValueConstructor to include them (if more than one join key qualifies).
Since we don't actually have ranges until runtime (when the right-hand-side 
operand is evaluated), what I'm trying to do at compile time (with 
extractKeyRangeExpressions()) is to make sure there are meaningful join keys 
we can optimize with at runtime and, if so, to pick out those one or 
*more* join keys and organize them in a way that WhereOptimizer can use 
to generate ranges at runtime. With only one join key, the problem is simple 
and straightforward, but with multiple join keys we have to carefully select 
the ones that help us most.
There are three steps here:
1) Pick out the potentially helpful join key expressions, i.e. those for which 
I assume WhereOptimizer.KeyExpressionVisitor would return non-null KeySlots.
2) Combine the join key expressions selected in 1) so that they hopefully 
generate the most meaningful key ranges.
3) Verify that the final expression we have constructed will be useful to 
WhereOptimizer at runtime.
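A minimal sketch of step 1 in Java, under a loud simplification: the hypothetical keySlotFor() below stands in for WhereOptimizer.KeyExpressionVisitor and only recognizes bare pk column references, whereas the real visitor may accept other expression shapes too. None of these names are Phoenix API; the sketch only illustrates filtering join keys down to those with a non-null key slot.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QualifyJoinKeys {
    // Hypothetical stand-in for WhereOptimizer.KeyExpressionVisitor: returns the
    // pk position of a bare pk column reference, or null if no key slot applies.
    static Integer keySlotFor(String expr, List<String> pkColumns) {
        int pos = pkColumns.indexOf(expr);
        return pos >= 0 ? pos : null;
    }

    // Step 1: keep only join key expressions that yield a non-null key slot,
    // remembering the pk position each one maps to (insertion order preserved).
    static Map<String, Integer> qualify(List<String> joinKeys, List<String> pkColumns) {
        Map<String, Integer> qualified = new LinkedHashMap<>();
        for (String key : joinKeys) {
            Integer slot = keySlotFor(key, pkColumns);
            if (slot != null) {
                qualified.put(key, slot);
            }
        }
        return qualified;
    }

    public static void main(String[] args) {
        List<String> pk = List.of("pkCol0", "pkCol1", "pkCol2", "pkCol3");
        List<String> joinKeys = List.of("pkCol3", "nonPkCol", "pkCol0", "pkCol1");
        System.out.println(qualify(joinKeys, pk)); // {pkCol3=3, pkCol0=0, pkCol1=1}
    }
}
```

The non-pk key is dropped; only if more than one key survives does step 2 need to build a RowValueConstructor.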

Step 2 is trickier, and it really depends on which cases we can handle with 
RowValueConstructor now. What I'm thinking now is:
a. use the KeySlots returned by WhereOptimizer.KeyExpressionVisitor to get 
each key's pk position and span
b. order the keys by pk position
c. try constructing a RowValueConstructor with the first N keys and find the 
largest N for which KeyExpressionVisitor would return a KeySlots
d. the final expression we'll potentially use at runtime is 
"RowValueConstructor(sortedJoinKey1, sortedJoinKey2,... sortedJoinKeyN)"
For example, if the join keys are pkCol0, pkCol1 and pkCol3, the largest N will 
be 2 (pkCol2 is not among the join keys) and the final expression will be 
(pkCol0, pkCol1).
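Steps a to d can be sketched as follows. Everything here is hypothetical: JoinKey models a join key together with the pk position and span its KeySlots would report, and acceptedByVisitor() is an assumed rule (the keys must cover consecutive pk slots with no gap) standing in for asking KeyExpressionVisitor whether it returns a KeySlots for the RowValueConstructor. The main method reproduces the pkCol0, pkCol1, pkCol3 example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class JoinKeyPrefix {
    // Hypothetical: a join key plus the pk position and span its KeySlots report (step a).
    record JoinKey(String name, int pkPosition, int span) {}

    // Assumed stand-in for KeyExpressionVisitor: accept the candidate
    // RowValueConstructor only if its keys cover consecutive pk slots.
    static boolean acceptedByVisitor(List<JoinKey> keys) {
        for (int i = 1; i < keys.size(); i++) {
            JoinKey prev = keys.get(i - 1);
            if (keys.get(i).pkPosition() != prev.pkPosition() + prev.span()) {
                return false;
            }
        }
        return !keys.isEmpty();
    }

    // Returns the expression to use at runtime, or null if no prefix is accepted.
    static String buildRuntimeExpression(List<JoinKey> qualified) {
        List<JoinKey> sorted = new ArrayList<>(qualified);
        sorted.sort(Comparator.comparingInt(JoinKey::pkPosition)); // step b
        int largestN = 0;
        for (int n = 1; n <= sorted.size(); n++) { // step c: try the first N keys
            if (acceptedByVisitor(sorted.subList(0, n))) {
                largestN = n;
            }
        }
        if (largestN == 0) {
            return null;
        }
        List<String> names = sorted.subList(0, largestN).stream()
                .map(JoinKey::name)
                .collect(Collectors.toList());
        // step d: a single key stays bare, several become a row value constructor
        return largestN == 1 ? names.get(0) : "(" + String.join(", ", names) + ")";
    }

    public static void main(String[] args) {
        List<JoinKey> keys = List.of(
                new JoinKey("pkCol3", 3, 1),
                new JoinKey("pkCol0", 0, 1),
                new JoinKey("pkCol1", 1, 1));
        System.out.println(buildRuntimeExpression(keys)); // (pkCol0, pkCol1)
    }
}
```

If the real visitor accepts other shapes (for instance a prefix that does not start at pk position 0), only acceptedByVisitor() would need to change; the sort-then-grow-N loop stays the same.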
With better knowledge of which RowValueConstructor cases WhereOptimizer can 
handle, I could use a more sophisticated algorithm, such as one that removes 
duplicated key slots.
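One possible reading of "removing duplicated key slots", sketched under that assumption: if two join conditions map to the same pk position (for example pkCol0 = b.x and pkCol0 = b.y), keep only the first. The keep-first policy and the JoinKey record are illustrative choices, not something the comment specifies.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupKeySlots {
    // Hypothetical: a join condition and the pk position its key slot covers.
    record JoinKey(String condition, int pkPosition) {}

    // Keep the first join key seen for each pk position; later keys that map to an
    // already-covered slot are dropped (assumed policy, for illustration only).
    static List<JoinKey> dropDuplicateSlots(List<JoinKey> keys) {
        Set<Integer> seen = new HashSet<>();
        List<JoinKey> result = new ArrayList<>();
        for (JoinKey k : keys) {
            if (seen.add(k.pkPosition())) { // add() returns false if already present
                result.add(k);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<JoinKey> keys = List.of(
                new JoinKey("pkCol0 = b.x", 0),
                new JoinKey("pkCol0 = b.y", 0),
                new JoinKey("pkCol1 = b.z", 1));
        for (JoinKey k : dropDuplicateSlots(keys)) {
            System.out.println(k.condition());
        }
    }
}
```

Which duplicate is worth keeping would in practice depend on what WhereOptimizer can do with each one; keep-first is just the simplest placeholder.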

Step 3 is similar to part of the logic in pushKeyExpressionsToScan(), but a 
little different. As I said, since we don't actually have ranges at compile 
time, we'll have to extract the "verification" part of that function.

> Optimize child/parent foreign key joins
> ---------------------------------------
>
>                 Key: PHOENIX-852
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-852
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>         Attachments: 852-2.patch, 852.patch, PHOENIX-852.patch
>
>
> Often times a join will occur from a child to a parent. Our current algorithm 
> would do a full scan of one side or the other. We can do much better than 
> that if the HashCache contains the PK (or even part of the PK) from the table 
> being joined to. In these cases, we should drive the second scan through a 
> skip scan on the server side.
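To illustrate why driving the second scan from the HashCache keys beats a full scan, here is an in-memory analogy: a TreeMap stands in for the sorted parent table, and a sorted set of leading-pk values stands in for the keys collected in the HashCache. Per the issue, the real skip scan would run on the server side; this sketch only shows the seek-per-key access pattern, including the partial-PK case where only the leading pk column is known. The "org|id" key layout is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class SkipScanSketch {
    // For each key prefix in sorted order, seek to it and read only the rows that
    // share it, instead of reading every row of the table (a full scan).
    static List<String> skipScan(NavigableMap<String, String> table, SortedSet<String> keyPrefixes) {
        List<String> hits = new ArrayList<>();
        for (String prefix : keyPrefixes) {
            for (Map.Entry<String, String> row : table.tailMap(prefix, true).entrySet()) {
                if (!row.getKey().startsWith(prefix)) {
                    break; // past this prefix: seek to the next one
                }
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Parent table keyed by a hypothetical composite pk "org|id".
        NavigableMap<String, String> parent = new TreeMap<>();
        for (String k : List.of("org1|p1", "org1|p2", "org2|p1", "org3|p1", "org3|p2")) {
            parent.put(k, "row " + k);
        }
        // Leading-pk values gathered from the child side (stand-in for the HashCache).
        SortedSet<String> fromHashCache = new TreeSet<>(List.of("org3|", "org1|"));
        System.out.println(skipScan(parent, fromHashCache)); // [org1|p1, org1|p2, org3|p1, org3|p2]
    }
}
```

The org2 rows are never read; with many parents and few distinct join keys, that skipped work is the saving the issue is after.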



--
This message was sent by Atlassian JIRA
(v6.2#6252)
