[
https://issues.apache.org/jira/browse/HIVE-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chao updated HIVE-8233:
-----------------------
Description:
Right now, for multi-table insertion, we start from the multiple
FileSinkOperators and break the plan at their lowest common ancestor (LCA),
adding a temporary FileSinkOperator and TableScanOperators. A special case is
when the LCA is a ForwardOperator, in which case we don't break the plan, since
it has already been optimized.
However, there's an issue. Consider the following plan:
{noformat}
      ...
      RS_0
       |
      FOR
       |
      / \
 GBY_1   GBY_2
   |       |
  ...     ...
   |       |
  RS_1    RS_2
   |       |
  ...     ...
   |       |
  FS_1    FS_2
{noformat}
which may result in:
{noformat}
   RW
  /  \
RW    RW
{noformat}
Hence, because of the issue in HIVE-7731 and HIVE-8118, both downstream
branches will receive duplicated (identical) input.
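To illustrate the lowest-common-ancestor step described above, here is a minimal sketch on a toy version of the first plan. The names {{Op}}, {{lowestCommonAncestor}} and {{lcaOfExamplePlan}} are hypothetical and are not Hive classes or methods. The sketch also assumes each operator has a single parent, which real plans need not satisfy.

```java
import java.util.HashSet;
import java.util.Set;

public class LcaSketch {
    /** Toy stand-in for a plan operator; NOT Hive's Operator class. */
    static final class Op {
        final String name;
        final Op parent; // null for the root; single parent is an assumption

        Op(String name, Op parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /** Finds the lowest common ancestor of a and b by walking parent links. */
    static Op lowestCommonAncestor(Op a, Op b) {
        // Collect a and all of its ancestors.
        Set<Op> ancestorsOfA = new HashSet<>();
        for (Op cur = a; cur != null; cur = cur.parent) {
            ancestorsOfA.add(cur);
        }
        // The first node on b's path to the root that is also above a is the LCA.
        for (Op cur = b; cur != null; cur = cur.parent) {
            if (ancestorsOfA.contains(cur)) {
                return cur;
            }
        }
        return null; // the two operators share no ancestor
    }

    /** Builds the plan from the description (elided "..." steps omitted). */
    static String lcaOfExamplePlan() {
        Op rs0 = new Op("RS_0", null);
        Op fwd = new Op("FOR", rs0);
        Op gby1 = new Op("GBY_1", fwd);
        Op gby2 = new Op("GBY_2", fwd);
        Op rs1 = new Op("RS_1", gby1);
        Op rs2 = new Op("RS_2", gby2);
        Op fs1 = new Op("FS_1", rs1);
        Op fs2 = new Op("FS_2", rs2);
        return lowestCommonAncestor(fs1, fs2).name;
    }

    public static void main(String[] args) {
        System.out.println("LCA = " + lcaOfExamplePlan());
    }
}
```

In this toy plan the LCA of {{FS_1}} and {{FS_2}} is {{FOR}}, which is the special case where the plan is not broken.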
was:
Right now, for multi-table insertion, we will start from multiple
FileSinkOperators, and break from their lowest common ancestor, adding
temporary FileSinkOperator and TableScanOperators. A special case is when the
LCA is a ForwardOperator, in which case we don't break it, since it's already
been optimized.
However, there's an issue. Consider the following plan:
{noformat}
     ...
      |
     FOR
      |
     RS_0
     /  \
 RS_1    RS_2
   |       |
  ...     ...
   |       |
 FS_1    FS_2
{noformat}
In this case, {{FOR}} is the LCA, and the plan will still be a single one.
However, {{RS_0}} leads to both {{RS_1}} and {{RS_2}}. Because of the issue in
HIVE-7731 and HIVE-8118, both downstream branches will get duplicated (and
same) results.
> multi-table insertion doesn't work with ForwardOperator [Spark Branch]
> ----------------------------------------------------------------------
>
> Key: HIVE-8233
> URL: https://issues.apache.org/jira/browse/HIVE-8233
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Chao
>
> Right now, for multi-table insertion, we start from the multiple
> FileSinkOperators and break the plan at their lowest common ancestor (LCA),
> adding a temporary FileSinkOperator and TableScanOperators. A special case is
> when the LCA is a ForwardOperator, in which case we don't break the plan, since
> it has already been optimized.
> However, there's an issue. Consider the following plan:
> {noformat}
>       ...
>       RS_0
>        |
>       FOR
>        |
>       / \
>  GBY_1   GBY_2
>    |       |
>   ...     ...
>    |       |
>   RS_1    RS_2
>    |       |
>   ...     ...
>    |       |
>   FS_1    FS_2
> {noformat}
> which may result in:
> {noformat}
>    RW
>   /  \
> RW    RW
> {noformat}
> Hence, because of the issue in HIVE-7731 and HIVE-8118, both downstream
> branches will receive duplicated (identical) input.