[ https://issues.apache.org/jira/browse/TWILL-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Terence Yim resolved TWILL-211.
-------------------------------
Resolution: Fixed
> Retries of failed runnable instances may result in unsatisfiable provisioning
> requests
> --------------------------------------------------------------------------------------
>
> Key: TWILL-211
> URL: https://issues.apache.org/jira/browse/TWILL-211
> Project: Apache Twill
> Issue Type: Bug
> Components: core
> Affects Versions: 0.9.0
> Reporter: Martin Serrano
> Assignee: Martin Serrano
> Priority: Critical
> Fix For: 0.10.0
>
>
> While investigating the intermittent test failures for TWILL-181, I
> discovered this bug in the following code (starting on line 703 of
> ApplicationMasterService):
> {code}
> if (expectedContainers.getExpected(runnableName) == runningContainers.count(runnableName) ||
>     provisioning.peek().getType().equals(AllocationSpecification.Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME)) {
>   provisioning.poll();
> }
> {code}
> When instances fail at different times rather than simultaneously, their
> retries can be spread over two invocations of
> `ApplicationMasterService.handleCompleted`. The retries then belong to
> separate `RunnableContainerRequests` and are provisioned separately.
> Because the code above does not anticipate this case, the first provision
> request never appears to be satisfied, is never polled, and the expected
> total can never be met.
> The first provision request does not appear to be satisfied because the
> number of expected containers will never equal the number of running
> containers. The code as-is expects every request to be either an
> `ALLOCATE_ONE_INSTANCE_AT_A_TIME` request or a request for all instances.
> In the case of retries, requests may come in all at once or in other
> patterns that result in multiple provision requests.
> When retrying instances, the code should set the request type to
> `ALLOCATE_ONE_INSTANCE_AT_A_TIME` to reflect the situation.
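
The failure mode described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not Twill's actual code: the class `ProvisioningSketch`, the `Request` type, the `OTHER` enum value, and the `simulate` method are all hypothetical, and the provisioning loop is reduced to "serve the head of the queue once, then apply the check quoted in the report".

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class ProvisioningSketch {

  // ALLOCATE_ONE_INSTANCE_AT_A_TIME is the name used in the report; OTHER is a
  // hypothetical stand-in for any other request type.
  public enum Type { ALLOCATE_ONE_INSTANCE_AT_A_TIME, OTHER }

  // Hypothetical, simplified provision request: a type plus a container count.
  static final class Request {
    final Type type;
    final int count;

    Request(Type type, int count) {
      this.type = type;
      this.count = count;
    }
  }

  /**
   * Models a runnable with {@code expected} instances of which {@code failures}
   * fail one after another, so each retry is queued as its own
   * single-container request. Returns the number of requests still queued
   * once no further progress is possible.
   */
  public static int simulate(Type retryType, int expected, int failures) {
    Queue<Request> provisioning = new ArrayDeque<>();
    int running = expected - failures;
    for (int i = 0; i < failures; i++) {
      provisioning.add(new Request(retryType, 1));
    }

    Request served = null;
    while (!provisioning.isEmpty()) {
      Request head = provisioning.peek();
      if (head == served) {
        // The head already received its containers but was never polled,
        // so the requests queued behind it can never be served.
        break;
      }
      served = head;
      running += head.count; // containers for the head request are launched

      // Same shape as the check quoted in the report.
      if (expected == running
          || head.type == Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME) {
        provisioning.poll();
      }
    }
    return provisioning.size();
  }

  public static void main(String[] args) {
    System.out.println("OTHER: "
        + simulate(Type.OTHER, 2, 2) + " request(s) left");
    System.out.println("ALLOCATE_ONE_INSTANCE_AT_A_TIME: "
        + simulate(Type.ALLOCATE_ONE_INSTANCE_AT_A_TIME, 2, 2) + " request(s) left");
  }
}
```

In this model, two staggered failures of a two-instance runnable leave both retry requests queued when they use the `OTHER` type, because the first request brings the running count to one, which never equals the expected two. Tagging the retries as `ALLOCATE_ONE_INSTANCE_AT_A_TIME` lets each request be polled as soon as it is served, and the queue drains.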
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)