[ https://issues.apache.org/jira/browse/HIVE-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16260433#comment-16260433 ]
Rui Li commented on HIVE-18111:
-------------------------------

The solution in the description still seems incorrect. The problem is that each DPP work may have multiple DPP sinks. Therefore, we cannot rely on the DPP work ID to tell whether the dir has some outputs for a map work.

> Fix temp path for Spark DPP sink
> --------------------------------
>
>                 Key: HIVE-18111
>                 URL: https://issues.apache.org/jira/browse/HIVE-18111
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-18111.1.patch
>
>
> Before HIVE-17877, each DPP sink had only one target work. The output path of
> a DPP work is {{TMP_PATH/targetWorkId/dppWorkId}}. When we do the pruning,
> each map work reads the DPP outputs under {{TMP_PATH/targetWorkId}}.
> After HIVE-17877, each DPP sink can have multiple target works, so it's
> possible that a map work needs to read DPP outputs from multiple
> {{TMP_PATH/targetWorkId}} directories. To solve this, I think we can use a
> DPP output path specific to each query, e.g. {{QUERY_TMP_PATH/dpp_output}}.
> Each DPP work outputs to {{QUERY_TMP_PATH/dpp_output/dppWorkId}}, and each
> map work reads from {{QUERY_TMP_PATH/dpp_output}}.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
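The path-layout change discussed above can be sketched in Java. This is only an illustrative sketch, not Hive's actual code: the helper names `oldOutputPath` and `newOutputPath` are hypothetical, and the `/tmp/hive/query123` base dir is a made-up example. It shows why the old per-target layout forces a map work with several source DPP sinks to scan several directories, while the proposed query-scoped layout gives every map work a single directory to read.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class DppPathSketch {
    // Hypothetical helper for the pre-fix layout:
    // TMP_PATH/targetWorkId/dppWorkId. A map work must read one
    // TMP_PATH/targetWorkId dir per source DPP sink.
    static Path oldOutputPath(Path tmpPath, int targetWorkId, int dppWorkId) {
        return tmpPath.resolve(String.valueOf(targetWorkId))
                      .resolve(String.valueOf(dppWorkId));
    }

    // Hypothetical helper for the proposed query-scoped layout:
    // QUERY_TMP_PATH/dpp_output/dppWorkId. Every map work reads the
    // single QUERY_TMP_PATH/dpp_output dir.
    static Path newOutputPath(Path queryTmpPath, int dppWorkId) {
        return queryTmpPath.resolve("dpp_output")
                           .resolve(String.valueOf(dppWorkId));
    }

    public static void main(String[] args) {
        Path queryTmp = Paths.get("/tmp/hive/query123"); // example base dir

        // Old layout: DPP work 5 writing for target works 1 and 2
        // produces outputs in two separate per-target directories.
        System.out.println(oldOutputPath(queryTmp, 1, 5));
        System.out.println(oldOutputPath(queryTmp, 2, 5));

        // New layout: the same DPP work writes once under the
        // query-level dpp_output dir, which all map works read.
        System.out.println(newOutputPath(queryTmp, 5));
    }
}
```

Under this layout a map work no longer needs to know which target-work directories hold outputs for it; it simply lists the one query-level directory.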