[ https://issues.apache.org/jira/browse/PHOENIX-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Taylor updated PHOENIX-1556:
----------------------------------
Description:
At compile time, we know how many guideposts (i.e. how many bytes) will be
scanned for the RHS table. We should, by default, base the choice between
the hash join and the many-to-many (sort merge) join on this information.
Another criterion (as we've seen in PHOENIX-4508) is whether or not the tables
being joined are already ordered by the join key. In that case, it's better to
always use the sort merge join.
was: At compile time, we know how many guideposts (i.e. how many bytes) will
be scanned for the RHS table. We should, by default, base the decision of using
the hash-join versus many-to-many join on this information.
> Base hash versus sort merge join decision on how much data will be scanned
> --------------------------------------------------------------------------
>
> Key: PHOENIX-1556
> URL: https://issues.apache.org/jira/browse/PHOENIX-1556
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: Maryann Xue
> Priority: Major
> Labels: CostBasedOptimization
>
> At compile time, we know how many guideposts (i.e. how many bytes) will be
> scanned for the RHS table. We should, by default, base the choice between
> the hash join and the many-to-many (sort merge) join on this information.
> Another criterion (as we've seen in PHOENIX-4508) is whether or not the tables
> being joined are already ordered by the join key. In that case, it's better
> to always use the sort merge join.
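
The two criteria in the description could be combined roughly as in the
sketch below. This is only an illustration of the proposed decision rule, not
Phoenix code: the class, enum, method, and parameter names are all
hypothetical, and the byte threshold is assumed to be supplied by the caller
(for example from whatever limit applies to the hash-join cache).

```java
// Hypothetical sketch of the decision described in this issue.
// None of these names exist in Phoenix; they are for illustration only.
class JoinStrategyChooser {

    enum JoinStrategy { HASH_JOIN, SORT_MERGE_JOIN }

    /**
     * @param rhsBytesToScan     estimated bytes to scan for the RHS table,
     *                           as derived from guideposts at compile time
     * @param maxHashJoinBytes   assumed upper bound on RHS size for which a
     *                           hash join is still considered acceptable
     * @param orderedByJoinKey   true if both tables being joined are already
     *                           ordered by the join key
     */
    static JoinStrategy choose(long rhsBytesToScan,
                               long maxHashJoinBytes,
                               boolean orderedByJoinKey) {
        // Second criterion: inputs already ordered by the join key,
        // so the sort merge join is always preferred.
        if (orderedByJoinKey) {
            return JoinStrategy.SORT_MERGE_JOIN;
        }
        // First criterion: decide on how much RHS data would be scanned.
        return rhsBytesToScan <= maxHashJoinBytes
                ? JoinStrategy.HASH_JOIN
                : JoinStrategy.SORT_MERGE_JOIN;
    }

    public static void main(String[] args) {
        long limit = 100L * 1024 * 1024; // assumed threshold, not a Phoenix default
        System.out.println(choose(10L * 1024 * 1024, limit, false));
        System.out.println(choose(500L * 1024 * 1024, limit, false));
        System.out.println(choose(10L * 1024 * 1024, limit, true));
    }
}
```

Checking the ordering first means a small RHS table that is already sorted on
the join key still gets the sort merge join, which matches the "always" in
the description.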