Martin Serrano created TWILL-213:
------------------------------------
Summary: Increase of instances while starting up may lead to ignored retries and instance increases
Key: TWILL-213
URL: https://issues.apache.org/jira/browse/TWILL-213
Project: Apache Twill
Issue Type: Bug
Components: yarn
Affects Versions: 0.9.0
Reporter: Martin Serrano
As seen while developing the tests for TWILL-181, if the number of instances of a
runnable is increased before the `ApplicationMasterService` has observed the
original request as satisfied, the instance increase and any subsequent
retries are blocked. This is because in `launchRunnable`:
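{code}
// Remove the request at the head of the provisioning queue only once every
// expected container of this runnable is running, or if the head request
// allocates one instance at a time.
if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName)
    || provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
  provisioning.poll();
}
{code}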
we compare the expected container count with the running count to decide
whether `provisioning.poll()` should be called. If a new instance request has
been made in the meantime, the expected count has already been raised, so the
running count for the original request never reaches it. The original request
is therefore never removed from `provisioning`, and the requests that follow it
(the instance increase and any retries) are never processed. The
`MaxRetriesTestRun.maxRetriesWithIncreasedInstances` method can reproduce this
case intermittently if the `allRunning.await` check is replaced with a
countdown latch released in an `onRunning` callback, as `EchoServerTestRun`
does.
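A toy model may make the stall easier to see. The sketch below is a hypothetical, standalone simulation, not Twill code: `ProvisioningStallDemo`, `Request`, `request`, `step` and `scenario` are invented names, plain `int` fields stand in for `expectedContainers` and `runningContainers`, the `ALLOCATE_ONE_INSTANCE_AT_A_TIME` branch is left out, and it assumes that later requests are only served once the head of `provisioning` has been polled. It keeps only the shape of the quoted check, where the head is polled when the running count equals the current expected count.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ProvisioningStallDemo {

    // Hypothetical, simplified stand-in for an entry in the provisioning queue.
    static final class Request {
        final int instances;
        int launched;
        Request(int instances) { this.instances = instances; }
    }

    int expected; // stands in for expectedContainers.getExpected(runnableName)
    int running;  // stands in for runningContainers.count(runnableName)
    final Deque<Request> provisioning = new ArrayDeque<>();

    // Asking for more instances raises the expected count immediately and
    // queues a new request behind any request that is still pending.
    void request(int additional) {
        expected += additional;
        provisioning.add(new Request(additional));
    }

    // Launches one container for the head request (assumption: only the head
    // is served). The head is polled only when running == expected, mirroring
    // the quoted check. Returns false when no further progress is possible.
    boolean step() {
        Request head = provisioning.peek();
        if (head == null || head.launched == head.instances) {
            return false;
        }
        head.launched++;
        running++;
        if (expected == running) {
            provisioning.poll();
        }
        return true;
    }

    void drain() {
        while (step()) {
            // keep launching until the queue is empty or stuck
        }
    }

    // Original request for 2 instances, then an increase to 3 either before or
    // after the original request has been observed as satisfied.
    static ProvisioningStallDemo scenario(boolean increaseBeforeSatisfied) {
        ProvisioningStallDemo am = new ProvisioningStallDemo();
        am.request(2);
        am.step(); // first container of the original request starts
        if (increaseBeforeSatisfied) {
            am.request(1); // increase before the second container starts
            am.drain();
        } else {
            am.drain();    // original request completes and is polled
            am.request(1); // increase afterwards
            am.drain();
        }
        return am;
    }

    public static void main(String[] args) {
        for (boolean early : new boolean[] {true, false}) {
            ProvisioningStallDemo am = scenario(early);
            System.out.printf("early increase=%b: expected=%d running=%d pending=%d%n",
                    early, am.expected, am.running, am.provisioning.size());
        }
    }
}
```

In this model the early increase ends with `expected=3 running=2 pending=2`: the original request is never polled, so the increase queued behind it never runs. The late increase ends with `expected=3 running=3 pending=0`.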
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)