[ https://issues.apache.org/jira/browse/HIVE-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li updated HIVE-15428:
--------------------------
    Attachment: HIVE-15428.1.patch

Patch to add the cycle detection. It's basically copied from Tez, and I manually
verified that it solves the issue found in HIVE-15357. [~csun] and [~xuefuz],
please have a look. Thanks.
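For context on what "the detection" does: Tez finds these cycles by computing the strongly connected components of the operator graph with Tarjan's algorithm and treats any component with more than one operator as a cycle. Below is a minimal sketch of that idea on a plain adjacency map. It is not the patch itself: `CycleDetection` and the operator names are hypothetical. In a real plan the link from a DPP sink to the table scan it prunes is not an ordinary child edge, so the detection has to add it explicitly; the sketch assumes that edge is already in the map.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: Tarjan's strongly connected components over an operator graph
// given as "operator name -> child operator names". Not Hive's or Tez's actual code.
final class CycleDetection {
  private final Map<String, List<String>> children;
  private final Map<String, Integer> index = new HashMap<>();
  private final Map<String, Integer> lowLink = new HashMap<>();
  private final Deque<String> stack = new ArrayDeque<>();
  private final Set<String> onStack = new HashSet<>();
  private final List<Set<String>> components = new ArrayList<>();
  private int counter = 0;

  CycleDetection(Map<String, List<String>> children) {
    this.children = children;
  }

  // Returns all strongly connected components; one with more than one operator is a cycle.
  List<Set<String>> components() {
    for (String op : children.keySet()) {
      if (!index.containsKey(op)) {
        connect(op);
      }
    }
    return components;
  }

  private void connect(String op) {
    index.put(op, counter);
    lowLink.put(op, counter);
    counter++;
    stack.push(op);
    onStack.add(op);
    for (String child : children.getOrDefault(op, Collections.<String>emptyList())) {
      if (!index.containsKey(child)) {
        connect(child);
        lowLink.put(op, Math.min(lowLink.get(op), lowLink.get(child)));
      } else if (onStack.contains(child)) {
        lowLink.put(op, Math.min(lowLink.get(op), index.get(child)));
      }
    }
    // op is the root of a component: pop it and everything pushed after it.
    if (lowLink.get(op).equals(index.get(op))) {
      Set<String> component = new HashSet<>();
      String member;
      do {
        member = stack.pop();
        onStack.remove(member);
        component.add(member);
      } while (!member.equals(op));
      components.add(component);
    }
  }

  public static void main(String[] args) {
    // Two map works, each with a DPP sink that prunes the other side's table scan.
    Map<String, List<String>> plan = new HashMap<>();
    plan.put("TS_a", Arrays.asList("DPP_a"));
    plan.put("DPP_a", Arrays.asList("TS_b"));
    plan.put("TS_b", Arrays.asList("DPP_b"));
    plan.put("DPP_b", Arrays.asList("TS_a"));
    for (Set<String> component : new CycleDetection(plan).components()) {
      if (component.size() > 1) {
        System.out.println("cycle found: " + component);
      }
    }
  }
}
```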

I also need some help from Chao with this question. When two DPP operators form
a cycle, we have to remove one of them. Tez removes the one with the smaller
data size:
{code}
    for (Operator<?> o : component) {
      if (o instanceof AppMasterEventOperator) {
        if (victim == null
            || o.getConf().getStatistics().getDataSize() < victim.getConf().getStatistics()
                .getDataSize()) {
          victim = (AppMasterEventOperator) o;
        }
      }
    }
{code}
But I think this is wrong, and we should remove the one with the bigger data
size, because a bigger data size means fewer keys are pruned. We even have
SparkRemoveDynamicPruningBySize, which removes DPP if the expected data size
exceeds some threshold.
[~csun], any ideas on this? I'm assuming the output of DPP is the keys that
have survived, not the ones that were filtered out.
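To make the argument concrete, here is a minimal sketch of both victim policies, under the assumption stated in the question that DPP emits the surviving keys. `DppVictim`, `DppSink`, the partition names and the sizes are hypothetical illustrations, not Hive or Tez classes or real statistics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DppVictim {
  // Hypothetical stand-in for a DPP sink: its name and estimated output size in bytes.
  static final class DppSink {
    final String name;
    final long dataSize;

    DppSink(String name, long dataSize) {
      this.name = name;
      this.dataSize = dataSize;
    }
  }

  // Assuming DPP emits the keys that survived, a target partition is scanned only if
  // its key is in that set, so a larger DPP output prunes fewer partitions.
  static List<String> partitionsToScan(List<String> partitions, Set<String> survivingKeys) {
    List<String> kept = new ArrayList<>();
    for (String p : partitions) {
      if (survivingKeys.contains(p)) {
        kept.add(p);
      }
    }
    return kept;
  }

  // Tez's rule, as in the snippet above: remove the sink with the smallest data size.
  static DppSink smallestVictim(List<DppSink> cycle) {
    return cycle.stream().min(Comparator.comparingLong((DppSink s) -> s.dataSize)).orElse(null);
  }

  // The proposed rule: remove the sink with the largest data size, i.e. the one
  // expected to prune the least.
  static DppSink largestVictim(List<DppSink> cycle) {
    return cycle.stream().max(Comparator.comparingLong((DppSink s) -> s.dataSize)).orElse(null);
  }

  public static void main(String[] args) {
    List<String> partitions = Arrays.asList("p1", "p2", "p3", "p4");
    // A selective DPP keeps one key; a weak one keeps three.
    System.out.println(partitionsToScan(partitions, new HashSet<>(Arrays.asList("p1")))); // [p1]
    System.out.println(partitionsToScan(partitions,
        new HashSet<>(Arrays.asList("p1", "p2", "p3")))); // [p1, p2, p3]

    List<DppSink> cycle = Arrays.asList(new DppSink("DPP_a", 10_000L), new DppSink("DPP_b", 100L));
    System.out.println("Tez removes: " + smallestVictim(cycle).name); // DPP_b
    System.out.println("Proposed removes: " + largestVictim(cycle).name); // DPP_a
  }
}
```

Under this assumption, keeping the small sink keeps the more selective pruning, which is also why SparkRemoveDynamicPruningBySize drops DPP whose expected output is large.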

> HoS DPP doesn't remove cyclic dependency
> ----------------------------------------
>
>                 Key: HIVE-15428
>                 URL: https://issues.apache.org/jira/browse/HIVE-15428
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-15428.1.patch
>
>
> More details in HIVE-15357



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
