[ https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092329#comment-14092329 ]
James Taylor commented on PHOENIX-852:
--------------------------------------
bq. I am wondering if it is possible to merge scan ranges so that we can avoid
compiling the static WHERE conditions and only compile the dynamic ones?
Eventually we should support caching query plans and being able to merge them
together, but I don't think this is a very expensive operation. Let's wait on
this and measure where the cost actually is. We could also come in at a slightly
lower level for this, bypassing the parser and generating an InListExpression
directly.
bq. Do we need a size limit for such optimizations?
I don't think this will be necessary, as an IN clause compiles down to a set of
keys. We've tested it with 250K keys and it was very fast.
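A minimal sketch of the two points above, in Python for brevity: instead of rendering the RHS join keys into `col IN (...)` SQL text and parsing it again, the compiled form is built straight from the keys, and what comes out is simply a sorted, de-duplicated set of keys. `CompiledInList` and `in_list_from_join_keys` are hypothetical stand-ins, not Phoenix's actual `InListExpression` API.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class CompiledInList:
    """Hypothetical stand-in for a compiled IN list (NOT Phoenix's InListExpression API)."""
    pk_column: str
    keys: Tuple[bytes, ...]  # sorted, de-duplicated row-key values


def in_list_from_join_keys(pk_column: str, join_keys: Iterable[bytes]) -> CompiledInList:
    """Build the compiled IN list directly from RHS join keys, skipping SQL text and the parser."""
    # Duplicate join keys (e.g. many children pointing at one parent) collapse
    # to a single key; sorting puts the keys in row-key order for a skip scan.
    return CompiledInList(pk_column, tuple(sorted(set(join_keys))))


expr = in_list_from_join_keys("PARENT_ID", [b"p2", b"p1", b"p2"])
print(expr.keys)  # (b'p1', b'p2')
```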
bq. However, the hash cache in join queries is inevitable, I think, since
there could be 1-to-many mappings between two tables,
The IN construct can handle partial key matches, so I think it'd work fine.
There's a small amount of work needed to pass the number of PK slots a key
encompasses over to the skip scan (for example, if the RHS ends up contributing
to the leading 3 of 4 PK columns), but the underlying engine will handle this
fine.
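The partial-key case can be modeled with a small Python sketch, assuming row keys are tuples of PK column values kept in sorted order (a simplification of byte-ordered HBase row keys). The hypothetical `pk_slots` parameter plays the role of the number of PK slots each RHS key covers: with a 4-column PK and keys covering the leading 3 columns, each key seeks to the first matching row and then reads every row sharing that prefix.

```python
import bisect
from typing import List, Sequence, Tuple

RowKey = Tuple[int, ...]  # hypothetical: a row key modeled as a tuple of PK column values


def prefix_skip_scan(rows: Sequence[RowKey], prefixes: Sequence[RowKey], pk_slots: int) -> List[RowKey]:
    """Return the rows whose leading `pk_slots` PK columns match one of `prefixes`.

    `rows` must be sorted. Each prefix costs one forward seek plus the rows it
    matches, instead of a pass over the whole table.
    """
    if any(len(p) != pk_slots for p in prefixes):
        raise ValueError("every prefix must cover exactly pk_slots PK columns")
    out: List[RowKey] = []
    lo = 0  # forward-only cursor: seeks never move backwards
    for prefix in sorted(set(prefixes)):
        # A tuple prefix sorts before every longer tuple that starts with it,
        # so bisect_left lands on the first row carrying this prefix (if any).
        i = bisect.bisect_left(rows, prefix, lo)
        while i < len(rows) and rows[i][:pk_slots] == prefix:
            out.append(rows[i])
            i += 1
        lo = i
    return out


# A 4-column PK (a, b, c, d); the RHS contributes only the leading 3 columns.
table = sorted((a, b, c, d) for a in range(3) for b in range(3) for c in range(3) for d in range(4))
hits = prefix_skip_scan(table, [(1, 2, 0), (0, 0, 0)], pk_slots=3)
print(len(hits))  # 8
```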
The nice thing about this approach is that you'll be leveraging the way we optimize
these IN expressions. The skip scan will just skip from row key to row key and
be *so much faster* than a full table scan. It'll be a huge speedup for a
relatively common case.
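To make the "row key to row key" point concrete, here is a simplified Python model, assuming a sorted list of byte-string row keys stands in for a region. It only counts how many rows each strategy touches; it is not Phoenix's actual skip scan filter, and it says nothing about real timings.

```python
import bisect
from typing import List, Sequence, Tuple


def full_scan(rows: Sequence[bytes], keys: Sequence[bytes]) -> Tuple[List[bytes], int]:
    """Visit every row and keep those in `keys`; returns (matches, rows_touched)."""
    wanted = set(keys)
    return [r for r in rows if r in wanted], len(rows)


def skip_scan(rows: Sequence[bytes], keys: Sequence[bytes]) -> Tuple[List[bytes], int]:
    """Seek from key to key through sorted `rows`; returns (matches, rows_touched)."""
    matches: List[bytes] = []
    touched = 0
    lo = 0
    for key in sorted(set(keys)):
        i = bisect.bisect_left(rows, key, lo)  # "seek" to the first row >= key
        if i == len(rows):
            break  # past the last row: no later key can match
        touched += 1
        if rows[i] == key:
            matches.append(rows[i])
        lo = i
    return matches, touched


rows = [b"%05d" % n for n in range(10000)]  # zero-padded, so already in row-key order
keys = [b"09999", b"00042", b"12345"]
print(full_scan(rows, keys)[1], skip_scan(rows, keys)[1])  # 10000 2
```

Both strategies return the same matches; the difference is that the skip scan's work grows with the number of keys rather than with the size of the table.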
> Optimize child/parent foreign key joins
> ---------------------------------------
>
> Key: PHOENIX-852
> URL: https://issues.apache.org/jira/browse/PHOENIX-852
> Project: Phoenix
> Issue Type: Improvement
> Reporter: James Taylor
> Assignee: Maryann Xue
>
> Often, a join will occur from a child to a parent. Our current algorithm
> would do a full scan of one side or the other. We can do much better than
> that if the HashCache contains the PK (or even part of the PK) from the table
> being joined to. In these cases, we should drive the second scan through a
> skip scan on the server side.
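The optimization proposed in the issue description can be sketched end to end in Python, assuming a parent table sorted by its PK and a hash cache built from child rows keyed by their foreign key (the parent PK). The names `build_hash_cache`, `skip_scan_parents`, and `child_parent_join` are hypothetical, and the in-memory lists only model server-side scans; the point is that the keys already sitting in the hash cache decide which parent rows are read, so the parent side is never fully scanned.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Child = Tuple[int, int]   # (child_id, parent_id): parent_id is a foreign key
Parent = Tuple[int, str]  # (parent_id, name): parent_id is the PK


def build_hash_cache(children: Sequence[Child]) -> Dict[int, List[int]]:
    """Group child ids by parent PK, modeling what the join's hash cache holds."""
    cache: Dict[int, List[int]] = defaultdict(list)
    for child_id, parent_id in children:
        cache[parent_id].append(child_id)
    return dict(cache)


def skip_scan_parents(parents: Sequence[Parent], pks: Sequence[int]) -> List[Parent]:
    """Seek to each PK in the PK-sorted parent table instead of scanning all of it."""
    found: List[Parent] = []
    lo = 0
    for pk in sorted(set(pks)):
        # (pk,) sorts just before (pk, name), so this lands on the row with that PK if present.
        i = bisect.bisect_left(parents, (pk,), lo)
        if i < len(parents) and parents[i][0] == pk:
            found.append(parents[i])
        lo = i
    return found


def child_parent_join(children: Sequence[Child], parents: Sequence[Parent]) -> List[Tuple[int, int, str]]:
    """Join children to parents, driving the parent scan from the hash-cache keys."""
    cache = build_hash_cache(children)
    joined = [
        (child_id, pk, name)
        for pk, name in skip_scan_parents(parents, list(cache))
        for child_id in cache[pk]
    ]
    return sorted(joined)


parents = [(pk, f"p{pk}") for pk in range(1000)]
children = [(1, 7), (2, 7), (3, 500), (4, 2000)]  # child 4 has no matching parent
print(child_parent_join(children, parents))
# [(1, 7, 'p7'), (2, 7, 'p7'), (3, 500, 'p500')]
```

Because several children can point at the same parent, the cache maps each PK to a list of child ids, so the 1-to-many case still produces one output row per child.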
--
This message was sent by Atlassian JIRA
(v6.2#6252)