[ 
https://issues.apache.org/jira/browse/TWILL-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Serrano updated TWILL-213:
---------------------------------
    Description: 
As seen in the test development for TWILL-181, if the number of instances for a 
container is increased before the {{ApplicationMasterService}} has observed the 
original request as being satisfied, the instance increase and any subsequent 
retries will be blocked.  This is because in {{launchRunnable}}:

{code}
      if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName) ||
          provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
        provisioning.poll();
      }
{code}

we are comparing the expected container count to the running count to decide 
whether {{provisioning.poll()}} should be called. If a new instance request has 
been made in the meantime, the expected count has already been raised, so the 
running count never matches it and {{provisioning.poll()}} is never called. The 
{{MaxRetriesTestRun.maxRetriesWithIncreasedInstances}} method can be used to 
reproduce this case intermittently by replacing the {{allRunning.await}} check 
with a countdown latch that is released in {{onRunning}}, as 
{{EchoServerTestRun}} does.
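
To make the stall easier to see, here is a minimal standalone sketch of the 
check above. It is not the Twill code: the class, the maps, the queue of 
runnable names and the {{onContainerLaunched}} helper are hypothetical 
stand-ins for {{expectedContainers}}, {{runningContainers}} and 
{{provisioning}}, and the {{ALLOCATE_ONE_INSTANCE_AT_A_TIME}} branch is left 
out. It assumes the original request is for 2 instances and that the increase 
to 3 arrives after only one container is running.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

class ProvisioningStallSketch {
    // Hypothetical stand-ins for expectedContainers and runningContainers.
    static final Map<String, Integer> expected = new HashMap<>();
    static final Map<String, Integer> running = new HashMap<>();
    // Each entry is the runnable name of a pending provisioning request.
    static final Queue<String> provisioning = new ArrayDeque<>();

    // Simplified version of the check quoted above: the head request is only
    // removed when the expected count equals the running count.
    static void onContainerLaunched(String runnableName) {
        running.merge(runnableName, 1, Integer::sum);
        if (expected.get(runnableName).equals(running.get(runnableName))) {
            provisioning.poll();
        }
    }

    // Returns the number of requests still queued at the end of the scenario.
    static int simulate() {
        expected.clear();
        running.clear();
        provisioning.clear();

        // Original request: 2 instances of the runnable "echo".
        expected.put("echo", 2);
        provisioning.add("echo");
        onContainerLaunched("echo");   // 1 of 2 running, nothing polled yet

        // Instance increase arrives before the original request is satisfied.
        expected.put("echo", 3);
        provisioning.add("echo");

        // The second (and last) container of the original request starts.
        // running is 2 but expected is already 3, so the head is not polled.
        onContainerLaunched("echo");
        return provisioning.size();
    }

    public static void main(String[] args) {
        System.out.println("pending requests: " + simulate());
    }
}
```

In this model both requests are still queued at the end: the original request 
has launched everything it asked for, so the running count stays at 2, and the 
increase behind it is never reached.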

  was:
As seen in the test development for TWILL-181, if the number of instances for a 
container is increased before the `ApplicationMasterService` has observed the 
original request as being satisfied, the instance increase and any subsequent 
retries will be blocked.  This is because in `launchRunnable`:

{code}
      if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName) ||
          provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
        provisioning.poll();
      }
{code}

we are comparing the expected container count to the running count to decide 
whether `provisioning.poll()` should be called. If a new instance request has 
been made in the meantime, the expected count has already been raised, so the 
running count never matches it and `provisioning.poll()` is never called. The 
`MaxRetriesTestRun.maxRetriesWithIncreasedInstances` method can be used to 
reproduce this case intermittently by replacing the `allRunning.await` check 
with a countdown latch that is released in `onRunning`, as `EchoServerTestRun` 
does.


> Increase of instances while starting up may lead to ignored retries and 
> instance increases
> ------------------------------------------------------------------------------------------
>
>                 Key: TWILL-213
>                 URL: https://issues.apache.org/jira/browse/TWILL-213
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 0.9.0
>            Reporter: Martin Serrano
>
> As seen in the test development for TWILL-181, if the number of instances for 
> a container is increased before the {{ApplicationMasterService}} has observed 
> the original request as being satisfied, the instance increase and any 
> subsequent retries will be blocked.  This is because in {{launchRunnable}}:
> {code}
>       if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName) ||
>           provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
>         provisioning.poll();
>       }
> {code}
> we are comparing the expected container count to the running count to decide 
> whether {{provisioning.poll()}} should be called. If a new instance request 
> has been made in the meantime, the expected count has already been raised, so 
> the running count never matches it and {{provisioning.poll()}} is never 
> called. The {{MaxRetriesTestRun.maxRetriesWithIncreasedInstances}} method can 
> be used to reproduce this case intermittently by replacing the 
> {{allRunning.await}} check with a countdown latch that is released in 
> {{onRunning}}, as {{EchoServerTestRun}} does.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
