Jim Challenger created UIMA-2593:
------------------------------------

             Summary: RM: Resource Manager mishandling dead node with Work Items in Limbo
                 Key: UIMA-2593
                 URL: https://issues.apache.org/jira/browse/UIMA-2593
             Project: UIMA
          Issue Type: Bug
          Components: ducc
            Reporter: Jim Challenger
            Assignee: Jim Challenger


If a node dies while one of its work items has started but has not yet been 
confirmed, that work item goes into Limbo, and RM then keeps allocating new 
nodes until the pool is exhausted.

The correct behavior is for RM to allocate only enough nodes to make up for 
the dead one, based on the remaining work.
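
The expected rule can be sketched roughly as follows. This is only an 
illustration of the arithmetic, not RM's actual code: the function name, its 
parameters, and the assumption that each node can hold a fixed number of work 
items are all hypothetical.

```python
import math


def replacement_nodes(remaining_items, items_per_node, live_nodes,
                      dead_nodes, free_nodes):
    """Return how many new nodes to allocate after nodes die.

    All names here are hypothetical. remaining_items is assumed to
    include any work items sitting in Limbo, since they still have to run.
    """
    if items_per_node <= 0:
        raise ValueError("items_per_node must be positive")

    # Nodes the job could usefully occupy given the work that is left.
    useful_nodes = math.ceil(remaining_items / items_per_node)

    # Shortfall relative to the nodes that are still alive.
    shortfall = max(0, useful_nodes - live_nodes)

    # Never replace more than what died, and never exceed the free pool.
    return min(shortfall, dead_nodes, free_nodes)
```

Under these assumptions a job with plenty of work left gets one node back for 
one dead node, while a job that is nearly finished gets none. In neither case 
does the allocation grow without bound.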

To reproduce, start a small cluster and submit a job with a couple hundred 
short (5-10 second) work items.  Once all nodes are full, send SIGSTOP to one 
agent and JP.  This should cause at least one WI to go into Limbo.  Once the 
heartbeat counter declares the node dead, we expect the errant behavior to 
start.
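
The SIGSTOP step could be scripted along these lines. This is a minimal 
sketch for a Unix host: the helper names are hypothetical, the PIDs of the 
agent and JP have to be looked up by hand, and the resume helper is an 
addition for cleaning up after the test, not part of the reproduction above.

```python
import os
import signal


def suspend(pids, send=os.kill):
    """Send SIGSTOP to each PID and return the PIDs that were signalled.

    `send` defaults to os.kill; it is a parameter only so the helper
    can be exercised without touching real processes.
    """
    stopped = []
    for pid in pids:
        send(pid, signal.SIGSTOP)  # freezes the process without killing it
        stopped.append(pid)
    return stopped


def resume(pids, send=os.kill):
    """Send SIGCONT to each PID so the frozen processes continue."""
    for pid in pids:
        send(pid, signal.SIGCONT)
```

For example, `suspend([agent_pid, jp_pid])` would freeze both processes, where 
`agent_pid` and `jp_pid` are placeholders for the real process IDs on the 
chosen node.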
