[
https://issues.apache.org/jira/browse/HIVE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844858#comment-13844858
]
Remus Rusanu commented on HIVE-5595:
------------------------------------
This implementation is very similar to the vectorized MAP JOIN: it iterates
over the input batch and calls super.processOp row-by-row. This has the
advantage of behaving identically to the existing row-mode SMB join. The
implementation requires only the big table to be vectorized; the small table(s)
are not required to expose the vectorized interface. The way SMB join works is
that it drives the processing on the small tables itself, from the processOp of
the big table, and the way it drives it is entirely row-mode. Unfortunately,
even if the small tables do expose vectorized execution, it is not used: that
portion of the plan (FetchOperator->DummySinkOperator) is completely ignored
during vectorization. Going forward it would be desirable to provide a more
complete vectorized execution plan for SMB plans, given that the 'small'
table(s) may be (and often are) small only in name (i.e. merely not the
'BigTableAlias' in the SMBJoinDesc).
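To illustrate the row-by-row approach described above, here is a minimal
sketch. It is not the code from the patch: Batch, RowModeJoin and
VectorizedJoin are hypothetical stand-ins for the batch and operator classes,
and the batch is assumed to hold a single long column plus a selection vector.
The vectorized operator simply walks the live rows of the batch and hands each
one to the inherited row-mode processOp.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these classes are Hive's real ones.
class VectorizedSmbSketch {

    // Hypothetical stand-in for a columnar row batch: one long column
    // plus an optional selection vector naming the live rows.
    public static final class Batch {
        public final long[] keys;
        public final int[] selected;
        public final boolean selectedInUse;
        public final int size; // number of live rows in this batch

        public Batch(long[] keys, int[] selected, boolean selectedInUse, int size) {
            this.keys = keys;
            this.selected = selected;
            this.selectedInUse = selectedInUse;
            this.size = size;
        }
    }

    // Hypothetical stand-in for the existing row-mode join operator.
    // Here processOp only records the rows it receives.
    public static class RowModeJoin {
        public final List<Long> rowsSeen = new ArrayList<>();

        public void processOp(Object row) {
            rowsSeen.add((Long) row);
        }
    }

    // The vectorized wrapper: iterate the batch, delegate each row to
    // the row-mode logic, so behavior matches the row-mode operator.
    public static class VectorizedJoin extends RowModeJoin {
        public void processBatch(Batch batch) {
            for (int i = 0; i < batch.size; i++) {
                // Honor the selection vector when earlier operators filtered rows.
                int rowIndex = batch.selectedInUse ? batch.selected[i] : i;
                super.processOp(Long.valueOf(batch.keys[rowIndex]));
            }
        }
    }
}
```

The point of the sketch is the trade-off: delegating per row keeps the join
semantics identical to row mode, at the cost of giving up batch-at-a-time
processing inside the join itself.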
The implementations of VSMB and VMAPJOIN have a lot in common and much of the
code is duplicated. I would like to refactor the code to be more DRY, but I
would do that as a separate JIRA/patch to avoid impacting the existing
VMAPJOIN now.
> Implement vectorized SMB JOIN
> -----------------------------
>
> Key: HIVE-5595
> URL: https://issues.apache.org/jira/browse/HIVE-5595
> Project: Hive
> Issue Type: Sub-task
> Reporter: Remus Rusanu
> Assignee: Remus Rusanu
> Priority: Critical
> Attachments: HIVE-5595.1.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>