[ 
https://issues.apache.org/jira/browse/HIVE-18148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297875#comment-16297875
 ] 

liyunzhang commented on HIVE-18148:
-----------------------------------

Sorry for the late reply. I still have one question about the code:

{code}

/** For DPP sinks w/ common join, we'll split the tree and what's above the branching
 * operator is computed multiple times. Therefore it may not be good for performance to support
 * nested DPP sinks, i.e. one DPP sink depends on other DPP sinks.
 * The following is an example:
 *
 *             TS          TS
 *             |           |
 *            ...         FIL
 *            |           |  \
 *            RS         RS  SEL
 *              \        /    |
 *     TS          JOIN      GBY
 *     |         /     \      |
 *    RS        RS    SEL   DPP2
 *     \       /       |
 *       JOIN         GBY
 *                    |
 *                  DPP1
 *
 * where DPP1 depends on DPP2.
 *
 * To avoid such case, we'll visit all the branching operators. If a branching operator has any
 * further away DPP branches w/ common join in its sub-tree, such branches will be removed.
 * In the above example, the branch of DPP1 will be removed.
 */
{code}

This function first collects the branching operators (FIL and JOIN in the 
above example) and then removes the nested DPP in the branches. If it 
traverses FIL first, it removes DPP1; if it traverses JOIN first, it removes 
DPP2. So it seems that which of the nested DPPs gets removed depends on the 
traversal order. I am confused about how we decide which DPP needs to be 
removed. If my understanding is not right, please correct me.

> NPE in SparkDynamicPartitionPruningResolver
> -------------------------------------------
>
>                 Key: HIVE-18148
>                 URL: https://issues.apache.org/jira/browse/HIVE-18148
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-18148.1.patch, HIVE-18148.2.patch
>
>
> The stack trace is:
> {noformat}
> 2017-11-27T10:32:38,752 ERROR [e6c8aab5-ddd2-461d-b185-a7597c3e7519 main] ql.Driver: FAILED: NullPointerException null
> java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver$SparkDynamicPartitionPruningDispatcher.dispatch(SparkDynamicPartitionPruningResolver.java:100)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
>         at org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver.resolve(SparkDynamicPartitionPruningResolver.java:74)
>         at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeTaskPlan(SparkCompiler.java:568)
> {noformat}
> At this stage, there shouldn't be a DPP sink whose target map work is null. 
> The root cause seems to be a malformed operator tree generated by 
> SplitOpTreeForDPP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
