[jira] [Updated] (HIVE-8233) multi-table insertion doesn't work with ForwardOperator [Spark Branch]

Chao (JIRA) Tue, 23 Sep 2014 16:35:01 -0700

     [ https://issues.apache.org/jira/browse/HIVE-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Chao updated HIVE-8233:
-----------------------
    Attachment: HIVE-8233.1-spark.patch

After setting {{hive.optimize.multigroupby.common.distincts}} and 
{{hive.multigroupby.singlereducer}}, we can remove the ForwardOperator from 
the tree, and hence the issue is gone.

The following tests can be enabled as a result:

{noformat}
groupby7_noskew_multi_single_reducer.q
groupby8_map.q
groupby8_map_skew.q
groupby8_noskew.q
groupby8.q
groupby9.q
groupby_multi_insert_common_distinct.q 
union17.q
{noformat}

union10.q cannot be enabled, because the last query in this file requires
{{hive.optimize.multigroupby.common.distincts}} to be explicitly set to true; 
otherwise we get the following exception:

{noformat}
FAILED: SemanticException [Error 10022]: DISTINCT on different columns not 
supported with skew in data
{noformat}


> multi-table insertion doesn't work with ForwardOperator [Spark Branch]
> ----------------------------------------------------------------------
>
>                 Key: HIVE-8233
>                 URL: https://issues.apache.org/jira/browse/HIVE-8233
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chao
>         Attachments: HIVE-8233.1-spark.patch
>
>
> Right now, for multi-table insertion, we start from the multiple 
> FileSinkOperators and break the plan at their lowest common ancestor (LCA), 
> adding a temporary FileSinkOperator and TableScanOperators. A special case is 
> when the LCA is a ForwardOperator, in which case we don't break it, since 
> it's already been optimized.
> However, there's an issue. Consider the following plan:
> {noformat}
>       ...
>       RS_0
>        |
>       FOR
>        |
>      /   \
>    GBY_1  GBY_2
>     |     |
>    ...   ...
>     |     |
>    RS_1  RS_2
>     |     |
>    ...   ...
>     |     |
>    FS_1  FS_2
> {noformat}
> which may result in:
> {noformat}
>           RW
>          /  \
>        RW    RW
> {noformat}
> Hence, because of the issues in HIVE-7731 and HIVE-8118, both downstream 
> branches will get duplicated (and identical) input.
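
For readers unfamiliar with the LCA step described above, here is a minimal Python sketch of it. This is not the Spark branch code: the Op class, its fields, and the helper names are hypothetical, each operator is assumed to have a single parent, and the operators elided as "..." in the plan are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Op:
    """Hypothetical stand-in for a Hive operator; not Hive's real Operator class."""
    name: str
    kind: str  # e.g. "RS", "FOR", "GBY", "FS"
    parent: Optional["Op"] = None
    children: List["Op"] = field(default_factory=list)

    def add(self, child: "Op") -> "Op":
        """Attach child below this operator and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def ancestors(op: Optional[Op]) -> List[Op]:
    """Path from op up to the root, starting with op itself."""
    path = []
    while op is not None:
        path.append(op)
        op = op.parent
    return path


def lowest_common_ancestor(sinks: List[Op]) -> Optional[Op]:
    """Deepest operator that lies on every sink's path to the root."""
    common = ancestors(sinks[0])  # ordered from deepest to root
    for sink in sinks[1:]:
        on_path = {id(a) for a in ancestors(sink)}
        common = [a for a in common if id(a) in on_path]
    return common[0] if common else None


# Rebuild the plan from the description (elided operators omitted).
rs_0 = Op("RS_0", "RS")
fwd = rs_0.add(Op("FOR", "FOR"))
gby_1 = fwd.add(Op("GBY_1", "GBY"))
gby_2 = fwd.add(Op("GBY_2", "GBY"))
fs_1 = gby_1.add(Op("RS_1", "RS")).add(Op("FS_1", "FS"))
fs_2 = gby_2.add(Op("RS_2", "RS")).add(Op("FS_2", "FS"))

lca = lowest_common_ancestor([fs_1, fs_2])
# Special case from the description: a ForwardOperator LCA is not broken.
should_break = lca.kind != "FOR"
```

In this plan the LCA of FS_1 and FS_2 is the ForwardOperator, so the plan is left unbroken at that point, which is the situation the description says leads to both branches reading the same input.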



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
