[ https://issues.apache.org/jira/browse/TWILL-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Serrano updated TWILL-213:
---------------------------------
Description:

As seen in the test development for TWILL-181, if the number of instances of a runnable is increased before the {{ApplicationMasterService}} has observed the original request as satisfied, the instance increase and any subsequent retries will be blocked. The cause is this check in {{launchRunnable}}:

{code}
if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName) ||
    provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
  provisioning.poll();
}
{code}

The expected container count is compared to the running count to decide whether {{provisioning.poll()}} should be called. If a new instance request has been made, the expected count has already been raised, but the running count never catches up to it. The pending provision request is therefore never polled, which blocks the instance increase and any retries behind it.

The {{MaxRetriesTestRun.maxRetriesWithIncreasedInstances}} method can reproduce this case intermittently if the {{allRunning.await}} check is replaced with one that counts down a latch from an {{onRunning}} callback, as {{EchoServerTestRun}} does.

> Increase of instances while starting up may lead to ignored retries and instance increases
> ------------------------------------------------------------------------------------------
>
> Key: TWILL-213
> URL: https://issues.apache.org/jira/browse/TWILL-213
> Project: Apache Twill
> Issue Type: Bug
> Components: yarn
> Affects Versions: 0.9.0
> Reporter: Martin Serrano

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
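To make the failure mode concrete, here is a minimal, self-contained Java sketch of only the poll condition quoted from {{launchRunnable}}. It is not Twill code. {{ProvisioningSketch}}, {{AllocType}}, {{ProvisionRequest}} and {{headStuckAfterIncrease}} are hypothetical names, the enum values are assumed stand-ins for {{AllocationSpecification.Type}}, and the sketch assumes that no container beyond the originally requested ones is launched while the head request is still pending. Under those assumptions, raising the expected count before the launches finish leaves the original request at the head of the queue.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical, simplified model of the poll condition quoted from launchRunnable.
// ProvisioningSketch, AllocType and ProvisionRequest are illustrative stand-ins,
// not Twill's real classes.
public class ProvisioningSketch {

  // Stand-in for AllocationSpecification.Type (values assumed for illustration).
  public enum AllocType { DEFAULT, ALLOCATE_ONE_INSTANCE_AT_A_TIME }

  // Stand-in for a pending provision request sitting in the provisioning queue.
  static final class ProvisionRequest {
    final String runnableName;
    final AllocType type;

    ProvisionRequest(String runnableName, AllocType type) {
      this.runnableName = runnableName;
      this.type = type;
    }
  }

  private final Map<String, Integer> expected = new HashMap<>();
  private final Map<String, Integer> running = new HashMap<>();
  private final Queue<ProvisionRequest> provisioning = new ArrayDeque<>();

  // Records one more running container, then applies the same test as the report:
  // poll the head only if expected == running or the head is one-at-a-time.
  void launchRunnable(String runnableName) {
    running.merge(runnableName, 1, Integer::sum);
    ProvisionRequest head = provisioning.peek();
    if (head == null) {
      return;
    }
    int expectedCount = expected.getOrDefault(runnableName, 0);
    int runningCount = running.getOrDefault(runnableName, 0);
    if (expectedCount == runningCount || head.type == AllocType.ALLOCATE_ONE_INSTANCE_AT_A_TIME) {
      provisioning.poll();
    }
  }

  // Requests `initial` instances, raises the expected count to `increased` before any
  // container is running, then launches only the `initial` containers that were
  // actually requested. Returns true if the original request is still at the head.
  public static boolean headStuckAfterIncrease(int initial, int increased, AllocType type) {
    ProvisioningSketch am = new ProvisioningSketch();
    am.expected.put("echo", initial);
    am.provisioning.add(new ProvisionRequest("echo", type));
    am.expected.put("echo", increased); // the instance increase arrives early
    for (int i = 0; i < initial; i++) {
      am.launchRunnable("echo");
    }
    return !am.provisioning.isEmpty();
  }

  public static void main(String[] args) {
    System.out.println(headStuckAfterIncrease(2, 2, AllocType.DEFAULT)); // false
    System.out.println(headStuckAfterIncrease(2, 3, AllocType.DEFAULT)); // true
    System.out.println(headStuckAfterIncrease(2, 3, AllocType.ALLOCATE_ONE_INSTANCE_AT_A_TIME)); // false
  }
}
```

With no early increase the request is polled after the second launch. With the expected count raised to 3 first, the counts never match and the request is never polled, while the one-instance-at-a-time branch polls regardless of the counts.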