[
https://issues.apache.org/jira/browse/PHOENIX-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Taylor updated PHOENIX-2283:
----------------------------------
Labels: drill (was: )
> Perf test Drill-based broadcast join versus Phoenix hash join
> -------------------------------------------------------------
>
> Key: PHOENIX-2283
> URL: https://issues.apache.org/jira/browse/PHOENIX-2283
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Labels: drill
>
> Most of our parallelized units of work in Phoenix can be modeled as an HBase
> scan, since a scan is what ultimately gets run for the client/server RPC. The
> scan is annotated with attributes that the coprocessor uses to drive it.
> Our hash join is different, though, as it issues two scans that are
> coordinated by the client, both parallelized. The first scans the smaller
> side of the join, and its result ends up cached in the region server. The
> second scan then looks up each of its rows in that cache and returns the
> joined row.
> How would this broadcast hash join be most appropriately modeled in the
> Phoenix+Drill+Calcite world?
> There may not be a big win from using our broadcast join rather than Drill's,
> as Drill's may be faster given the representation it uses.
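
For readers unfamiliar with the two-scan pattern described in the issue, here is a minimal in-memory sketch in Python. It is not Phoenix or Drill code: the function names (`build_hash_cache`, `probe_scan`, `broadcast_hash_join`), the dict-based rows, and the inner-join semantics are all assumptions made for illustration, and plain lists stand in for the parallel region scans.

```python
def build_hash_cache(small_rows, key):
    """First scan: read the smaller side and cache its rows by join key."""
    cache = {}
    for row in small_rows:
        cache.setdefault(row[key], []).append(row)
    return cache


def probe_scan(region_rows, cache, key):
    """Second scan: for each row of one region, look up matches in the cache."""
    for row in region_rows:
        for match in cache.get(row[key], []):
            # Merge the cached row with the scanned row to form the joined row.
            yield {**match, **row}


def broadcast_hash_join(small_rows, large_regions, key):
    """Hypothetical client-side coordinator (inner join assumed)."""
    cache = build_hash_cache(small_rows, key)
    joined = []
    # "Broadcast": every region probes the same cached copy of the small side.
    # A real system would run these probes in parallel, one per region.
    for region_rows in large_regions:
        joined.extend(probe_scan(region_rows, cache, key))
    return joined


small = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
regions = [
    [{"id": 1, "qty": 10}, {"id": 3, "qty": 5}],
    [{"id": 2, "qty": 7}, {"id": 1, "qty": 4}],
]
result = broadcast_hash_join(small, regions, "id")
```

In this sketch the row with `id` 3 is dropped because it has no match in the cache. The point is only that the build step and the probe step are separate scans tied together by a coordinator, which is the shape the issue asks how to model.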
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)