[
https://issues.apache.org/jira/browse/TWILL-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Martin Serrano updated TWILL-213:
---------------------------------
Description:
As seen in the test development for TWILL-181, if the number of instances for a
container is increased before the {{ApplicationMasterService}} has observed the
original request as being satisfied, the instance increase and any subsequent
retries will be blocked. This is because in {{launchRunnable}}:
{code}
if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName)
    || provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
  provisioning.poll();
}
{code}
we compare the expected container count to the running count to decide whether
{{provisioning.poll()}} should be called. If a new instance request has been
made in the meantime, the expected count will already have been raised, so the
two never match and the head of the provisioning queue is never polled. The
{{MaxRetriesTestRun.maxRetriesWithIncreasedInstances}} method can reproduce
this case intermittently if the {{allRunning.await}} check is replaced with a
countdown latch triggered {{onRunning}}, as {{EchoServerTestRun}} does.
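To make the stall easier to see, here is a minimal, self-contained sketch of the comparison described above. It is not Twill code: {{ProvisioningStallSketch}}, {{Request}} and {{headIsPolled}} are hypothetical names, the counters are plain ints standing in for {{expectedContainers}} and {{runningContainers}}, and the {{ALLOCATE_ONE_INSTANCE_AT_A_TIME}} branch is left out on purpose.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified model of the check in launchRunnable; all names are hypothetical.
class ProvisioningStallSketch {

  // Stand-in for one queued provisioning request.
  static final class Request {
    final int instances;

    Request(int instances) {
      this.instances = instances;
    }
  }

  // Returns true if the original request is polled off the queue after all of
  // its containers have started.
  static boolean headIsPolled(int originalInstances, int increasedInstances,
                              boolean increaseBeforeSatisfied) {
    Queue<Request> provisioning = new ArrayDeque<>();
    int expected = originalInstances;
    int running = 0;
    provisioning.add(new Request(originalInstances));

    if (increaseBeforeSatisfied) {
      // The instance increase arrives before the first request is seen as
      // satisfied: the expected count is raised and a new request is queued.
      expected = increasedInstances;
      provisioning.add(new Request(increasedInstances - originalInstances));
    }

    // Containers for the head request come up one at a time.
    Request head = provisioning.peek();
    for (int i = 0; i < head.instances; i++) {
      running++;
      if (expected == running) {
        provisioning.poll();
      }
    }
    return provisioning.peek() != head;
  }

  public static void main(String[] args) {
    System.out.println(headIsPolled(2, 3, false)); // prints "true"
    System.out.println(headIsPolled(2, 3, true));  // prints "false"
  }
}
```

In the second call the running count stops at 2 while the expected count is already 3, so the original request stays at the head of the queue and the queued increase behind it is never processed.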
was:
As seen in the test development for TWILL-181, if the number of instances for a
container is increased before the `ApplicationMasterService` has observed the
original request as being satisfied, the instance increase and any subsequent
retries will be blocked. This is because in `launchRunnable`:
{code}
if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName)
    || provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
  provisioning.poll();
}
{code}
we are comparing the expected containers to the running count to decide if
`provisioning.poll()` should be called. If a new instance request has been
made, the expected containers will have been updated and the running count
never will. The `MaxRetriesTestRun.maxRetriesWithIncreasedInstances` method
can be used to reproduce this case intermittently by changing the
`allRunning.await` check to something that does a countdown latch `onRunning`
as `EchoServerTestRun` does.
> Increase of instances while starting up may lead to ignored retries and
> instance increases
> ------------------------------------------------------------------------------------------
>
> Key: TWILL-213
> URL: https://issues.apache.org/jira/browse/TWILL-213
> Project: Apache Twill
> Issue Type: Bug
> Components: yarn
> Affects Versions: 0.9.0
> Reporter: Martin Serrano