[ https://issues.apache.org/jira/browse/HIVE-15428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-15428:
--------------------------
    Attachment: HIVE-15428.1.patch

Patch to add the detection. It is essentially copied from Tez, and I manually verified that it solves the issue found in HIVE-15357. [~csun] and [~xuefuz], please have a look. Thanks.

I also need some help from Chao with one question. When there are two cyclic DPP operators, we have to remove one of them. When Tez does this, it removes the one with the smaller data size:
{code}
for (Operator<?> o : component) {
  if (o instanceof AppMasterEventOperator) {
    if (victim == null
        || o.getConf().getStatistics().getDataSize()
            < victim.getConf().getStatistics().getDataSize()) {
      victim = (AppMasterEventOperator) o;
    }
  }
}
{code}
But I think this is wrong and we should remove the one with the bigger data size, because a bigger data size means fewer keys are pruned. We even have SparkRemoveDynamicPruningBySize to remove DPP if the expected data size exceeds some threshold. [~csun], any ideas on this? I'm assuming the output of DPP is the set of keys that survived, not the keys that were filtered out.

> HoS DPP doesn't remove cyclic dependency
> ----------------------------------------
>
>                 Key: HIVE-15428
>                 URL: https://issues.apache.org/jira/browse/HIVE-15428
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-15428.1.patch
>
>
> More details in HIVE-15357

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
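
For reference, the two victim-selection policies discussed in the comment above (remove the DPP sink with the smaller data size, as the quoted Tez loop does, versus the one with the bigger data size, as proposed) can be compared in a minimal standalone sketch. This is not the patch and does not use Hive classes: `PruningSink`, `CyclicDppVictim`, and `Policy` are hypothetical names introduced only for illustration, and `dataSize` stands in for the value returned by `getStatistics().getDataSize()`.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a DPP sink operator; NOT a Hive class.
final class PruningSink {
    final String name;
    final long dataSize; // assumed: estimated output size taken from operator statistics

    PruningSink(String name, long dataSize) {
        this.name = name;
        this.dataSize = dataSize;
    }
}

// Hypothetical helper that picks which sink in a cyclic component to remove.
final class CyclicDppVictim {
    enum Policy { SMALLEST, LARGEST }

    // Returns the sink to remove, or an empty Optional if the component has no sinks.
    static Optional<PruningSink> pickVictim(List<PruningSink> component, Policy policy) {
        Comparator<PruningSink> bySize = Comparator.comparingLong(s -> s.dataSize);
        return policy == Policy.SMALLEST
            ? component.stream().min(bySize)   // same choice as the quoted Tez loop
            : component.stream().max(bySize);  // the alternative proposed in the comment
    }
}
```

With two sinks of sizes 100 and 5000, `Policy.SMALLEST` selects the 100-byte sink for removal and `Policy.LARGEST` selects the 5000-byte one. Which policy is right depends on the assumption stated in the comment, namely that the DPP output holds the surviving keys.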