william zhu created HIVE-6831:
---------------------------------
Summary: The job schedule in condition task could not be correct with skewed join optimization
Key: HIVE-6831
URL: https://issues.apache.org/jira/browse/HIVE-6831
Project: Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.11.0
Environment: Hive 0.11.0
Reporter: william zhu
In ConditionalTask.java we can see this snippet:
    // resolved task
    if (driverContext.addToRunnable(tsk)) {
      console.printInfo(tsk.getId() + " is selected by condition resolver.");
    }
The selected task is added to the runnable queue immediately, without any
dependency checking. If the selected task is the original task and one of its
parent tasks has not been executed yet, the result will be incorrect.
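To make the problem concrete, here is a minimal plain-Java model of the unpatched behavior. SimTask, resolveNaive, and drain are hypothetical stand-ins, not Hive's Task or DriverContext classes. The sketch assumes a child task with two conditional parents, each of which selects the child and queues it with no dependency check, as in the snippet above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class NaiveSchedule {
    // Hypothetical stand-in for a Hive task: an id, its parents, a done flag.
    static final class SimTask {
        final String id;
        final List<SimTask> parents = new ArrayList<>();
        boolean done;

        SimTask(String id) {
            this.id = id;
        }

        boolean parentsDone() {
            for (SimTask p : parents) {
                if (!p.done) {
                    return false;
                }
            }
            return true;
        }
    }

    // Mirrors the unpatched behavior: the selected task is queued
    // with no dependency check at all.
    static void resolveNaive(SimTask selected, Deque<SimTask> runnable) {
        runnable.add(selected);
    }

    // Runs everything queued and records whether all of each task's
    // parents had finished at the moment it ran.
    static void drain(Deque<SimTask> runnable, List<String> log) {
        while (!runnable.isEmpty()) {
            SimTask t = runnable.poll();
            log.add(t.id + (t.parentsDone() ? " (parents done)" : " (parent pending)"));
            t.done = true;
        }
    }

    static List<String> simulate() {
        SimTask child = new SimTask("Child");
        SimTask condA = new SimTask("CondA"); // first conditional parent (hypothetical)
        SimTask condB = new SimTask("CondB"); // second conditional parent (hypothetical)
        child.parents.add(condA);
        child.parents.add(condB);

        Deque<SimTask> runnable = new ArrayDeque<>();
        List<String> log = new ArrayList<>();

        condA.done = true;             // CondA finishes and selects Child
        resolveNaive(child, runnable);
        drain(runnable, log);          // Child runs while CondB is still pending

        condB.done = true;             // CondB finishes and selects Child again
        resolveNaive(child, runnable);
        drain(runnable, log);          // Child runs a second time
        return log;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

Running it prints [Child (parent pending), Child (parents done)]: the child runs once too early and then a second time.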
For example:
1. Before skew join optimization:
Step1, Step2 <- Step3 (Step1 and Step2 are Step3's parents)
2. After skew join optimization:
Step1 <- Step4 (ConditionalTask, consisting of [Step3, Step10])
Step2 <- Step5 (ConditionalTask, consisting of [Step3, Step11])
3. Running:
Step3 is selected by both Step4 and Step5.
Step3 is executed immediately after Step4, which is not correct, because Step5
(and therefore Step2) may not have finished yet.
Step3 is executed again after Step5, which is not correct either.
4. The correct schedule is to execute Step3 once, after both Step4 and Step5
have finished.
5. So I added a check to the snippet, as below:
    if (!driverContext.getRunnable().contains(tsk)) {
      console.printInfo(tsk.getId() + " is selected by condition resolver.");
      if (DriverContext.isLaunchable(tsk)) {
        // run the original task now
        driverContext.addToRunnable(tsk);
      }
    }
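The effect of the patch on the Step3/Step4/Step5 example can be sketched in plain Java. SimTask, isLaunchable, and resolvePatched are hypothetical stand-ins, not Hive classes; the launchability rule used here (not already queued or finished, and every parent finished) is an assumption about what DriverContext.isLaunchable checks. Following item 4, the sketch treats Step4 and Step5 as the tasks Step3 must wait for.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class PatchedSchedule {
    // Hypothetical stand-in for a Hive task (not Hive's Task class).
    static final class SimTask {
        final String id;
        final List<SimTask> parents = new ArrayList<>();
        boolean queued;
        boolean done;

        SimTask(String id) {
            this.id = id;
        }
    }

    // Assumed launchability rule: not queued or finished yet, all parents finished.
    static boolean isLaunchable(SimTask t) {
        if (t.queued || t.done) {
            return false;
        }
        for (SimTask p : t.parents) {
            if (!p.done) {
                return false;
            }
        }
        return true;
    }

    // Mirrors the patched snippet: skip a task already in the runnable queue,
    // and queue the selected task only once it is launchable.
    static void resolvePatched(SimTask selected, Deque<SimTask> runnable) {
        if (!runnable.contains(selected)) {
            if (isLaunchable(selected)) {
                selected.queued = true;
                runnable.add(selected);
            }
        }
    }

    // Runs everything currently queued, in order, and logs each task id.
    static void drain(Deque<SimTask> runnable, List<String> log) {
        while (!runnable.isEmpty()) {
            SimTask t = runnable.poll();
            log.add(t.id);
            t.done = true;
        }
    }

    static List<String> simulate() {
        SimTask step3 = new SimTask("Step3");
        SimTask step4 = new SimTask("Step4"); // ConditionalTask following Step1
        SimTask step5 = new SimTask("Step5"); // ConditionalTask following Step2
        step3.parents.add(step4);
        step3.parents.add(step5);

        Deque<SimTask> runnable = new ArrayDeque<>();
        List<String> log = new ArrayList<>();

        step4.done = true;               // Step4 selects Step3, but Step5 is pending
        resolvePatched(step3, runnable);
        log.add("after Step4: queued=" + step3.queued);
        drain(runnable, log);

        step5.done = true;               // Step5 selects Step3; now it is launchable
        resolvePatched(step3, runnable);
        drain(runnable, log);

        resolvePatched(step3, runnable); // a repeated selection is ignored
        drain(runnable, log);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

Running it prints [after Step4: queued=false, Step3]: Step3 is not queued after Step4 alone and runs exactly once, after Step5.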
This works correctly in my environment. I am not sure whether it will cause
problems under other conditions.
--
This message was sent by Atlassian JIRA
(v6.2#6252)