zhangheihei opened a new pull request #2189:
URL: https://github.com/apache/hive/pull/2189
**Hive job generates duplicate data because the same task is resubmitted**

In the console log below, Hive launches "Job 5 out of 4":

```
2021-04-05 06:05:52 CONSOLE# Number of reduce tasks is set to 0 since there's no reduce operator
2021-04-05 06:05:52 CONSOLE# Launching Job 5 out of 4
2021-04-05 06:05:52 CONSOLE# Number of reduce tasks is set to 0 since there's no reduce operator
```

<img src="https://user-images.githubusercontent.com/13237066/115213523-2d945800-a134-11eb-94c3-52095c748283.png" width="300" height="300">

For example, suppose the `EXPLAIN` output of a Hive SQL query contains 4 tasks. When `hive.exec.parallel=true` and task2 and task3 can both run in parallel (`canExecuteInParallel`), task4 is executed twice:

1. task1 is FINISHED, and task2/task3 enter the runnable queue.

   <img src="https://user-images.githubusercontent.com/13237066/115233371-65a69580-a14a-11eb-81fb-5a0c3582e3dc.png" width="400" height="150">

2. task2/task3 execute in parallel and finish at the same time, so both are now FINISHED.

   <img src="https://user-images.githubusercontent.com/13237066/115233876-06955080-a14b-11eb-9570-7334eff8dcad.png" width="400" height="150">

3. task2 is removed from the running queue, and task4 enters the runnable queue.
4.
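To make the race concrete, here is a minimal, hypothetical Java model of the scheduling described in the steps above. The class, field, and method names (`DuplicateLaunchDemo`, `onTaskFinished`, `queued`) are illustrative assumptions, not Hive's actual driver or task-runner code. The model reproduces the interleaving where task2 and task3 are both FINISHED before either one checks its children: each parent then sees that all of task4's parents are done and enqueues task4. The compare-and-set flag shown as the guard is one possible way to enqueue a child at most once; it is not necessarily the fix this PR implements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of a parallel task scheduler; these are NOT Hive's classes.
public class DuplicateLaunchDemo {

    static final class Task {
        final String name;
        final List<Task> parents = new ArrayList<>();
        final List<Task> children = new ArrayList<>();
        volatile boolean finished = false;
        // Flipped to true the first time this task is placed on the runnable queue.
        final AtomicBoolean queued = new AtomicBoolean(false);

        Task(String name) { this.name = name; }

        void addChild(Task child) {
            children.add(child);
            child.parents.add(this);
        }

        boolean allParentsFinished() {
            for (Task p : parents) {
                if (!p.finished) return false;
            }
            return true;
        }
    }

    final Queue<Task> runnable = new ConcurrentLinkedQueue<>();
    final boolean guardAgainstDuplicates;

    DuplicateLaunchDemo(boolean guardAgainstDuplicates) {
        this.guardAgainstDuplicates = guardAgainstDuplicates;
    }

    // Called after `done` is marked finished: enqueue each child whose parents are all finished.
    void onTaskFinished(Task done) {
        for (Task child : done.children) {
            if (!child.allParentsFinished()) continue;
            if (guardAgainstDuplicates) {
                // Only the first caller flips the flag, so the child is enqueued at most once.
                if (child.queued.compareAndSet(false, true)) runnable.add(child);
            } else {
                // Unguarded: every finished parent that sees all parents done enqueues the child.
                runnable.add(child);
            }
        }
    }

    // Interleaving from the PR: task2 and task3 are both FINISHED before
    // either of them processes its children. Returns how often task4 was queued.
    static int timesTask4Queued(boolean guard) {
        DuplicateLaunchDemo scheduler = new DuplicateLaunchDemo(guard);
        Task t2 = new Task("task2");
        Task t3 = new Task("task3");
        Task t4 = new Task("task4");
        t2.addChild(t4);
        t3.addChild(t4);
        t2.finished = true;
        t3.finished = true;
        scheduler.onTaskFinished(t2);
        scheduler.onTaskFinished(t3);
        int count = 0;
        for (Task t : scheduler.runnable) {
            if (t == t4) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("unguarded: task4 queued " + timesTask4Queued(false) + " times");
        System.out.println("guarded:   task4 queued " + timesTask4Queued(true) + " times");
    }
}
```

Running `main` reports task4 queued 2 times without the guard (matching the "Launching Job 5 out of 4" symptom) and 1 time with it. An atomic flag is used instead of a `runnable.contains(...)` check because a check-then-add on the queue is itself racy when two parents finish concurrently.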
