[
https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112653#comment-14112653
]
Maryann Xue commented on PHOENIX-852:
-------------------------------------
Yes, I will sort the qualified join keys in pk position order and create a new
RowValueConstructor to include them (if more than one join key qualifies).
Since we don't actually have ranges until runtime (when the right-hand-side
operand is evaluated), what I'm trying to do at compile time (with
extractKeyRangeExpressions()) is to make sure there are meaningful join keys
that we can optimize with at runtime and, if so, to pick out those one or
*more* join keys and organize them in a way that WhereOptimizer can use to
generate ranges at runtime. With only one join key, the problem is simple
and straightforward. But with multiple join keys, we have to carefully select
those that can help us most.
There are three steps here:
1) to pick out the potentially helpful join key expressions, i.e. those for
which I assume WhereOptimizer.KeyExpressionVisitor would return non-null
KeySlots.
2) to combine the join key expressions selected in 1) in a way that can
hopefully generate the most meaningful key ranges.
3) to verify that the final expression we have constructed will be useful to
WhereOptimizer at runtime.
Step 2 is trickier, and it really depends on what cases we can handle with
RowValueConstructor now. What I'm thinking now is:
a. make use of the KeySlots returned by WhereOptimizer.KeyExpressionVisitor to
get pk position and span
b. order them by pk position
c. try constructing a RowValueConstructor with the first N keys, and find the
largest N for which KeyExpressionVisitor would return a KeySlots.
d. the final expression we'll potentially use at runtime is
"RowValueConstructor(sortedJoinKey1, sortedJoinKey2,... sortedJoinKeyN)"
For example, if we have join keys pkCol0, pkCol1 and pkCol3, the largest N will
be 2 and the final expression will be (pkCol0, pkCol1).
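
Steps a-d can be sketched roughly as follows. This is only an illustration, not
Phoenix code: JoinKey, JoinKeySelection and the isContiguous rule are
hypothetical stand-ins. In particular, the real test for "KeyExpressionVisitor
would return a KeySlots" is modelled here as a pluggable predicate, and the
no-gap rule is just an assumption that happens to reproduce the example above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for a join key plus the pk position/span taken from its KeySlots. */
final class JoinKey {
    final String name;
    final int pkPosition; // position of the first pk column this key covers
    final int pkSpan;     // number of pk columns this key covers

    JoinKey(String name, int pkPosition, int pkSpan) {
        this.name = name;
        this.pkPosition = pkPosition;
        this.pkSpan = pkSpan;
    }
}

final class JoinKeySelection {
    /**
     * Steps a-c: order the keys by pk position, then return the longest
     * prefix (largest N) that the supplied predicate accepts.
     */
    static List<JoinKey> longestUsablePrefix(List<JoinKey> keys,
                                             Predicate<List<JoinKey>> accepts) {
        List<JoinKey> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparingInt((JoinKey k) -> k.pkPosition));
        for (int n = sorted.size(); n >= 1; n--) {
            List<JoinKey> prefix = sorted.subList(0, n);
            if (accepts.test(prefix)) {
                return new ArrayList<>(prefix);
            }
        }
        return new ArrayList<>(); // no usable join key at all
    }

    /** Assumed acceptance rule: each key starts where the previous one ends (no gap). */
    static boolean isContiguous(List<JoinKey> prefix) {
        for (int i = 1; i < prefix.size(); i++) {
            JoinKey prev = prefix.get(i - 1);
            if (prefix.get(i).pkPosition != prev.pkPosition + prev.pkSpan) {
                return false;
            }
        }
        return true;
    }

    /** Step d: render the chosen keys as a row value constructor expression. */
    static String toRowValueConstructor(List<JoinKey> keys) {
        List<String> names = new ArrayList<>();
        for (JoinKey k : keys) {
            names.add(k.name);
        }
        return "(" + String.join(", ", names) + ")";
    }
}
```

Called with pkCol3, pkCol0 and pkCol1 (each with span 1) and
JoinKeySelection::isContiguous as the predicate, longestUsablePrefix keeps
pkCol0 and pkCol1, and toRowValueConstructor renders them as (pkCol0, pkCol1).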
With better knowledge of which RowValueConstructor cases WhereOptimizer can
handle, I can come up with a more sophisticated algorithm, such as removing
duplicated key slots.
Step 3 resembles part of the logic in pushKeyExpressionsToScan(), but is a
little different. As I said, we don't actually have ranges at compile time, so
we'll have to extract the "verification" part of that function.
> Optimize child/parent foreign key joins
> ---------------------------------------
>
> Key: PHOENIX-852
> URL: https://issues.apache.org/jira/browse/PHOENIX-852
> Project: Phoenix
> Issue Type: Improvement
> Reporter: James Taylor
> Assignee: Maryann Xue
> Attachments: 852-2.patch, 852.patch, PHOENIX-852.patch
>
>
> Oftentimes a join will occur from a child to a parent. Our current algorithm
> would do a full scan of one side or the other. We can do much better than
> that if the HashCache contains the PK (or even part of the PK) from the table
> being joined to. In these cases, we should drive the second scan through a
> skip scan on the server side.