[ 
https://issues.apache.org/jira/browse/TEZ-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100617#comment-16100617
 ] 

Jason Lowe commented on TEZ-3803:
---------------------------------

Thanks for the patch!

It would be nice if the progress interval were derived from the progress 
timeout; otherwise it is easy to configure an invalid setup (e.g., 
progress timeout < shuffle progress interval).  One is an AM config and the 
other is a runtime config, though, so I'm not sure how easy that would be in 
practice.
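
As a rough sketch of what "derived from the timeout" could look like: the 
class, method, and parameter names below are hypothetical and not part of 
Tez, and the "half the timeout" ceiling and the treatment of a non-positive 
timeout as "no timeout enforced" are assumptions made only for illustration.

```java
class ProgressIntervalSketch {
    /**
     * Hypothetical helper: caps the configured progress-notification interval
     * so that it can never exceed half of the progress timeout.
     */
    static long effectiveIntervalMs(long progressTimeoutMs, long configuredIntervalMs) {
        if (progressTimeoutMs <= 0) {
            // Assumption: a non-positive timeout means no timeout is enforced,
            // so the configured interval is used unchanged.
            return configuredIntervalMs;
        }
        // Leave room for at least two notifications per timeout window.
        long ceilingMs = Math.max(1L, progressTimeoutMs / 2);
        return Math.min(configuredIntervalMs, ceilingMs);
    }

    public static void main(String[] args) {
        // An interval longer than the timeout gets capped.
        System.out.println(effectiveIntervalMs(10_000L, 30_000L));
        // An interval already below the ceiling is left alone.
        System.out.println(effectiveIntervalMs(10_000L, 1_000L));
    }
}
```

This only helps if the runtime side can actually see the AM-side timeout 
value, which is the open question above.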

This change doesn't quite do what the original code did because of operator 
precedence: && binds more tightly than ||, so the new loop condition parses 
as a || (b && c) rather than (a || b) && c.  For example, if 
numCompletedInputs >= numInputs then the old code would never wait, but the 
new code can still wait if runningFetchers.size() >= numFetchers.  The same 
issue exists in ShuffleScheduler.
{noformat}
-          if (runningFetchers.size() >= numFetchers || pendingHosts.isEmpty()) {
-            if (numCompletedInputs.get() < numInputs) {
-              wakeLoop.await();
-            }
+          while (runningFetchers.size() >= numFetchers || pendingHosts.isEmpty()
+              && numCompletedInputs.get() < numInputs) {
+            inputContext.notifyProgress();
+            wakeLoop.await(waitTime, TimeUnit.MILLISECONDS);
{noformat}
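
To make the precedence difference concrete, here is a small standalone 
sketch.  The class and the three boolean parameters are stand-ins for the 
expressions in the diff, not code from the patch.

```java
class WaitConditionDemo {
    // Condition as written in the patch. Because && binds tighter than ||,
    // this is evaluated as: fetchersBusy || (noPendingHosts && inputsRemaining).
    static boolean patched(boolean fetchersBusy, boolean noPendingHosts,
                           boolean inputsRemaining) {
        return fetchersBusy || noPendingHosts && inputsRemaining;
    }

    // Condition equivalent to the original nested ifs:
    // (fetchersBusy || noPendingHosts) && inputsRemaining.
    static boolean original(boolean fetchersBusy, boolean noPendingHosts,
                            boolean inputsRemaining) {
        return (fetchersBusy || noPendingHosts) && inputsRemaining;
    }

    public static void main(String[] args) {
        // All inputs are complete, but every fetcher slot is still occupied.
        System.out.println("patched waits:  " + patched(true, false, false));  // true
        System.out.println("original waits: " + original(true, false, false)); // false
    }
}
```

Adding explicit parentheses around the || in the while condition would 
restore the original behavior.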

I'm not clear on why we need a background thread to call notifyProgress.  It 
seems sufficient for the caller that is waiting for the shuffle to complete 
to ping progress periodically, which is what MapReduce does.
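
Roughly what I have in mind, as a sketch only: ProgressNotifier is a 
hypothetical stand-in for inputContext.notifyProgress(), and the 
CountDownLatch stands in for whatever actually signals shuffle completion.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ProgressPingingWait {
    /** Hypothetical stand-in for inputContext.notifyProgress(). */
    interface ProgressNotifier {
        void notifyProgress();
    }

    /**
     * Blocks the calling thread until the latch reaches zero, pinging progress
     * before each bounded wait. No background thread is involved.
     * Returns the number of pings sent.
     */
    static int waitForCompletion(CountDownLatch shuffleDone,
                                 ProgressNotifier notifier,
                                 long intervalMs) throws InterruptedException {
        int pings = 0;
        do {
            notifier.notifyProgress();
            pings++;
            // await returns true once the count has reached zero,
            // false if the interval elapsed first.
        } while (!shuffleDone.await(intervalMs, TimeUnit.MILLISECONDS));
        return pings;
    }
}
```

The waiting thread pings once per interval for as long as it is blocked, so 
the pings stop when the wait does.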

waitAndNotifyProgress doesn't need to be public.

The unit tests are pretty expensive at 10+ seconds to run.

> Tasks can get killed due to insufficient progress while waiting for shuffle 
> inputs to complete
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-3803
>                 URL: https://issues.apache.org/jira/browse/TEZ-3803
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Critical
>         Attachments: TEZ-3803.001.patch, TEZ-3803.002.patch
>
>
> In a scenario where a downstream task has no slow start and gets started 
> before all its shuffle inputs are done, the task can time out because the 
> wait does not notify progress (set the "progress is being made" bit) like 
> it does in MapReduce.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
