[jira] [Commented] (PHOENIX-852) Optimize child/parent foreign key joins

James Taylor (JIRA) Sun, 10 Aug 2014 18:51:00 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092329#comment-14092329
 ]


James Taylor commented on PHOENIX-852:
--------------------------------------

bq. I am thinking if it is possible to merge scan ranges so that we can avoid 
compiling those static where conditions but only compile those dynamic ones?

Eventually we should support caching query plans and being able to merge them 
together, but I don't think compiling these conditions is a very expensive 
operation. Let's wait on this and see where the cost is. We could come in at a 
slightly lower level for this, bypassing the parser and generating an 
InListExpression directly.

bq. Do we need a size limit for such optimizations?

I don't think this will be necessary, as an IN clause compiles down to a set of 
keys. We've tested it with 250K keys and it was very fast.

bq. However the hash cache thing in join queries is inevitable I think, for 
there could be 1-to-many mappings between two tables,

The IN construct can handle partial key matches, so I think it'd work fine. 
There's a little bit of work to pass the number of PK slots a key encompasses 
over to the skip scan (for example, if the RHS ends up contributing to the 
leading 3 of 4 PK columns), but the underlying engine will handle this fine.
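
To make the partial key case concrete, here is a rough Python sketch of the idea, not Phoenix code. It assumes a hypothetical table with a 4-column PK whose rows are kept sorted, and an RHS that contributes only the leading 3 columns. The function names and sample data are made up for illustration.

```python
from bisect import bisect_left

def key_prefixes(rhs_rows, slots):
    """Collect the distinct leading `slots` PK values contributed by the RHS."""
    return sorted({tuple(row[:slots]) for row in rhs_rows})

def prefix_matches(sorted_rows, prefix):
    """Return every row in `sorted_rows` whose leading columns equal `prefix`.

    A shorter tuple sorts before any longer tuple it is a prefix of, so
    bisect_left lands on the first candidate row for that prefix.
    """
    start = bisect_left(sorted_rows, prefix)
    matches = []
    for row in sorted_rows[start:]:
        if row[:len(prefix)] != prefix:
            break  # rows are sorted, so no later row can match
        matches.append(row)
    return matches

# Hypothetical 4-column PK: (tenant, region, order_id, line_no)
rows = sorted([
    ("t1", "us", 1, 1),
    ("t1", "us", 1, 2),
    ("t1", "us", 2, 1),
    ("t2", "eu", 5, 1),
    ("t2", "eu", 6, 1),
])

# The RHS only knows the leading 3 of 4 PK columns, with duplicates.
rhs = [("t1", "us", 1), ("t2", "eu", 6), ("t1", "us", 1)]

for prefix in key_prefixes(rhs, 3):
    print(prefix, "->", prefix_matches(rows, prefix))
```

One prefix can match several rows, which is how the 1-to-many mapping is covered without needing the full key.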

The nice thing about this approach is you'll be leveraging the way we optimize 
these IN expressions. The skip scan will just skip from row key to row key and 
be *so much faster* than a full table scan. It'll be a huge speedup for a 
relatively common case.
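
As a toy illustration of why skipping from key to key beats a full scan, the Python sketch below compares the two over a sorted list of row keys. It is a simplified model under my own assumptions (integer keys, an in-memory list, a binary-search "seek"), not how the Phoenix skip scan filter is actually implemented.

```python
from bisect import bisect_left

def full_scan(sorted_keys, wanted):
    """Examine every row key and keep the ones in the wanted set."""
    wanted = set(wanted)
    hits, examined = [], 0
    for key in sorted_keys:
        examined += 1
        if key in wanted:
            hits.append(key)
    return hits, examined

def skip_scan(sorted_keys, wanted):
    """Seek directly to each wanted key in order, never moving backwards."""
    hits, seeks, pos = [], 0, 0
    for key in sorted(set(wanted)):
        pos = bisect_left(sorted_keys, key, pos)  # jump ahead to the next candidate
        seeks += 1
        if pos == len(sorted_keys):
            break  # ran past the end of the table
        if sorted_keys[pos] == key:
            hits.append(key)
    return hits, seeks

table = list(range(100_000))           # stand-in for sorted row keys
keys = [10, 50_000, 99_999, 200_000]   # stand-in for the IN list

print(full_scan(table, keys)[1])  # row keys examined by the full scan
print(skip_scan(table, keys)[1])  # seeks performed by the skip scan
```

The full scan's work grows with the table size, while the skip scan's work grows with the number of keys in the IN list.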




> Optimize child/parent foreign key joins
> ---------------------------------------
>
>                 Key: PHOENIX-852
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-852
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>
> Oftentimes a join will occur from a child to a parent. Our current algorithm 
> would do a full scan of one side or the other. We can do much better than 
> that if the HashCache contains the PK (or even part of the PK) from the table 
> being joined to. In these cases, we should drive the second scan through a 
> skip scan on the server side.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
