Jim Challenger created UIMA-2593:
------------------------------------
Summary: RM: Resource Manager mishandling dead node with Work Items in Limbo
Key: UIMA-2593
URL: https://issues.apache.org/jira/browse/UIMA-2593
Project: UIMA
Issue Type: Bug
Components: ducc
Reporter: Jim Challenger
Assignee: Jim Challenger
If a node dies while one of its work items is starting but not yet confirmed,
so that the work item goes into Limbo, RM continuously allocates new nodes
until the pool is exhausted. The correct behavior is for RM to allocate only
enough nodes to replace the dead one, based on the remaining work.
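As a minimal sketch of the expected behavior (not DUCC's actual RM code), the
number of replacement nodes can be derived from the remaining work rather than
from each node-failure event. `JobState`, its fields, and `nodes_to_allocate`
are hypothetical names. The sketch assumes that work items in Limbo still count
as remaining work, and that nodes already requested as replacements are counted
in `allocated_nodes`. That second assumption is what keeps repeated calls from
growing the allocation.

```python
import math
from dataclasses import dataclass


@dataclass
class JobState:
    # Hypothetical view of a job as RM might see it; names are illustrative only.
    remaining_work_items: int  # unfinished work items, including those in Limbo
    items_per_node: int        # work items one node can run concurrently
    allocated_nodes: int       # live nodes plus replacements already requested


def nodes_to_allocate(job: JobState) -> int:
    """Return how many additional nodes to request for a job.

    The target is sized from remaining work, so once a replacement for a
    dead node is counted in allocated_nodes, later calls return 0 instead
    of allocating again.
    """
    if job.items_per_node <= 0:
        raise ValueError("items_per_node must be positive")
    if job.remaining_work_items <= 0:
        return 0
    needed = math.ceil(job.remaining_work_items / job.items_per_node)
    return max(0, needed - job.allocated_nodes)
```

For example, suppose one of 10 nodes, each running 4 work items, dies with 40
items left. The function asks for 1 node (`JobState(40, 4, 9)`). Once that
replacement is counted (`JobState(40, 4, 10)`), further calls return 0.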
To reproduce, start a small cluster and submit a job with a couple hundred
short (5-10 second) work items. Once all nodes are full, send SIGSTOP to one
agent and its JP. This should cause at least one work item to go into Limbo.
When the heartbeat counter reports the node as dead, the errant behavior is
expected to start.
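For the SIGSTOP step, a small helper like the following can be used. It
assumes a POSIX host and that the agent and JP process IDs have already been
found, for example with `ps`. SIGSTOP suspends the processes without killing
them, so their heartbeats stop while their state stays frozen. `freeze` and
`thaw` are hypothetical helper names, not part of DUCC.

```python
import os
import signal


def freeze(pids):
    """Send SIGSTOP to each process so it stops heartbeating without exiting."""
    for pid in pids:
        os.kill(pid, signal.SIGSTOP)


def thaw(pids):
    """Send SIGCONT to resume processes previously stopped by freeze()."""
    for pid in pids:
        os.kill(pid, signal.SIGCONT)
```

A typical call is `freeze([agent_pid, jp_pid])`, where both PIDs are
placeholders for the real agent and JP processes. After RM's reaction has been
observed, `thaw` with the same list resumes them.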