[jira] [Updated] (HIVE-6831) The job schedule in condition task could not be correct with skewed join optimization

william zhu (JIRA) Thu, 03 Apr 2014 05:56:29 -0700

     [ https://issues.apache.org/jira/browse/HIVE-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


william zhu updated HIVE-6831:
------------------------------

    Description: 
Code snippet in ConditionalTask.java, as below:

    // resolved task
    if (driverContext.addToRunnable(tsk)) {
      console.printInfo(tsk.getId() + " is selected by condition resolver.");
    }

The selected task is added to the runnable queue immediately, without any 
dependency checking. If the selected task is the original task and one of its 
parent tasks has not finished executing, the result will be incorrect.

For example:
1. Before skew join optimization:
Step1, Step2 <-- Step3   (Step1 and Step2 are Step3's parents)
2. After skew join optimization:
Step1 <- Step4 (ConditionalTask) <- consists of [Step3, Step10]
Step2 <- Step5 (ConditionalTask) <- consists of [Step3, Step11]
3. Running:
Step3 is selected in both Step4 and Step5.
Step3 is executed immediately after Step4, which is not correct.
Step3 is executed again after Step5, which is not correct either.
4. The correct schedule is for Step3 to be executed only after both Step4 and Step5.
5. So I added a check to the snippet, as below:

        if (!driverContext.getRunnable().contains(tsk)) {
          console.printInfo(tsk.getId() + " is selected by condition resolver.");
          if (DriverContext.isLaunchable(tsk)) {
            // run the original task now
            driverContext.addToRunnable(tsk);
          }
        }

This works correctly for me in my environment. I am not sure whether it will 
cause problems under other conditions.
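
The scheduling difference described above can be sketched with a small 
standalone Java model. This is not Hive code: the Task class, parentsDone(), 
and simulate() are hypothetical names made up for illustration, and the model 
only assumes that Step3 may run once both Step4 and Step5 have finished. It 
resolves the two conditional tasks one after the other and records the order 
in which tasks execute, with and without the dependency check.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy model of the schedule in this report; all names here are hypothetical.
final class ConditionalScheduleSketch {

    static final class Task {
        final String id;
        final List<Task> parents;
        boolean done;

        Task(String id, Task... parents) {
            this.id = id;
            this.parents = Arrays.asList(parents);
        }

        // True only when every parent task has finished.
        boolean parentsDone() {
            for (Task p : parents) {
                if (!p.done) {
                    return false;
                }
            }
            return true;
        }
    }

    // Returns the execution order. With checkDependencies == false the
    // selected task is queued unconditionally, as in the original snippet.
    static List<String> simulate(boolean checkDependencies) {
        Task step1 = new Task("Step1");
        Task step2 = new Task("Step2");
        Task step4 = new Task("Step4", step1);        // conditional task
        Task step5 = new Task("Step5", step2);        // conditional task
        Task step3 = new Task("Step3", step4, step5); // selected by both

        List<String> executed = new ArrayList<>();
        Deque<Task> runnable = new ArrayDeque<>();

        Task[][] branches = {{step1, step4}, {step2, step5}};
        for (Task[] branch : branches) {
            // Run the parent step, then its conditional task.
            for (Task t : branch) {
                t.done = true;
                executed.add(t.id);
            }
            // The conditional task has resolved and selected step3.
            if (!runnable.contains(step3)
                    && (!checkDependencies || step3.parentsDone())) {
                runnable.add(step3);
            }
            // Execute whatever is currently in the runnable queue.
            while (!runnable.isEmpty()) {
                Task t = runnable.poll();
                t.done = true;
                executed.add(t.id);
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        // prints [Step1, Step4, Step3, Step2, Step5, Step3]
        System.out.println(simulate(false));
        // prints [Step1, Step4, Step2, Step5, Step3]
        System.out.println(simulate(true));
    }
}
```

In this model the unchecked version runs Step3 twice, once before Step5 has 
finished, while the checked version runs it once after both conditional tasks.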





> The job schedule in condition task could not be correct with skewed join 
> optimization
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-6831
>                 URL: https://issues.apache.org/jira/browse/HIVE-6831
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.11.0
>         Environment: Hive 0.11.0
>            Reporter: william zhu
>         Attachments: 6831.patch



