zhangheihei opened a new pull request #2189:
URL: https://github.com/apache/hive/pull/2189


   **A Hive job can produce duplicate data because the same task is submitted twice**
   ```
   2021-04-05 06:05:52 CONSOLE# Number of reduce tasks is set to 0 since there's no reduce operator
   2021-04-05 06:05:52 CONSOLE# Launching Job 5 out of 4
   2021-04-05 06:05:52 CONSOLE# Number of reduce tasks is set to 0 since there's no reduce operator
   ```
   <img src="https://user-images.githubusercontent.com/13237066/115213523-2d945800-a134-11eb-94c3-52095c748283.png" width="300" height="300">
   For example, suppose `EXPLAIN` shows that a Hive SQL query compiles into 4 tasks. When `hive.exec.parallel=true` and task2 and task3 can run in parallel (`canExecuteInParallel`), task4 is executed twice:
   
   1. task1 is FINISHED; task2 and task3 enter the runnable queue.
   <img src="https://user-images.githubusercontent.com/13237066/115233371-65a69580-a14a-11eb-81fb-5a0c3582e3dc.png" width="400" height="150">
   2. task2 and task3 execute in parallel and end at the same time. Both are now FINISHED.
   <img src="https://user-images.githubusercontent.com/13237066/115233876-06955080-a14b-11eb-9570-7334eff8dcad.png" width="400" height="150">
   3. task2 is removed from the running queue, and task4 enters the runnable queue.
   4. 
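
The sequence above can be sketched as a small standalone simulation. This is only an illustration of the suspected race, not Hive's actual scheduler code: the class `DuplicateLaunchSketch`, the `Node` type, and the `alreadyQueued` guard are all hypothetical names. It also assumes that the second enqueue of task4 happens when task3 is removed from the running queue in turn, which the description does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation; none of these names come from the Hive code base.
class DuplicateLaunchSketch {

    // Minimal stand-in for a task in the plan DAG.
    static final class Node {
        final String name;
        final List<Node> parents = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
        boolean finished;

        Node(String name) { this.name = name; }

        void addChild(Node child) {
            children.add(child);
            child.parents.add(this);
        }

        // A node may be queued once every parent has finished.
        boolean parentsFinished() {
            for (Node p : parents) {
                if (!p.finished) return false;
            }
            return true;
        }
    }

    // Returns the names added to the runnable queue while the two
    // finished parents are removed from the running queue one by one.
    static List<String> schedule(boolean dedupe) {
        Node t1 = new Node("task1"), t2 = new Node("task2");
        Node t3 = new Node("task3"), t4 = new Node("task4");
        t1.addChild(t2); t1.addChild(t3);
        t2.addChild(t4); t3.addChild(t4);

        t1.finished = true;
        // Step 2: both parents are FINISHED before either is processed.
        t2.finished = true;
        t3.finished = true;

        List<String> runnable = new ArrayList<>();
        Set<Node> alreadyQueued = new HashSet<>(); // assumed guard, not Hive's fix

        for (Node done : Arrays.asList(t2, t3)) {
            for (Node child : done.children) {
                if (!child.parentsFinished()) continue;
                // With the guard enabled, a child is queued at most once.
                if (dedupe && !alreadyQueued.add(child)) continue;
                runnable.add(child.name);
            }
        }
        return runnable;
    }

    public static void main(String[] args) {
        System.out.println("naive:   " + schedule(false)); // [task4, task4]
        System.out.println("guarded: " + schedule(true));  // [task4]
    }
}
```

In the unguarded run, task4 already sees both parents as FINISHED when task2 is processed, so it is queued then and again when task3 is processed. Tracking which tasks were already queued is one possible way to avoid the second submission; the change in this pull request may take a different approach.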
    




