Andrzej Bialecki created SOLR-12479:
----------------------------------------
Summary: TriggerAction failures may cause inconsistent trigger behavior
Key: SOLR-12479
URL: https://issues.apache.org/jira/browse/SOLR-12479
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: AutoScaling
Affects Versions: 7.4, master (8.0)
Reporter: Andrzej Bialecki

The following issue occasionally appears when running {{TestLargeCluster.testNodeLost}}. The test kills a large number of nodes, waiting for a certain time between the kills. Depending on the sequence of kills and the length of {{waitFor}}, the target node of a MOVEREPLICA operation may have just been killed by the time {{ExecutePlanAction}} processes it. This results in an exception and a FAILED status of the action.

However, this failure is not reported back to the trigger as an unprocessed event, because it happens asynchronously in the action executor (in {{ScheduledTriggers}}), so the trigger happily resets its internal state and no longer tracks the lost node. As a result, replicas remain lost, and even if there’s a Policy violation the event will not be generated again, so the number of replicas won’t go back to the original number.

Also, lines 311 and 323 of {{ScheduledTriggers}} only log the exception but don’t fire listeners with FAILED status, which is a bug.
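
The failure mode described in this report can be illustrated with a minimal, self-contained Java sketch. It does not use any Solr code: {{TriggerFailureSketch}}, {{LostNodeTracker}}, {{failingAction}}, {{fireAndForget}} and {{resetOnSuccess}} are all hypothetical names invented for illustration, and the second method is only one possible way to keep the event pending, not the fix adopted in Solr.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-ins for illustration only; none of these are Solr APIs.
class TriggerFailureSketch {

    /** Minimal "trigger" state: the set of lost nodes that still need handling. */
    static class LostNodeTracker {
        private final Set<String> lostNodes = ConcurrentHashMap.newKeySet();
        void nodeLost(String node) { lostNodes.add(node); }
        void eventProcessed(String node) { lostNodes.remove(node); }
        boolean isTracking(String node) { return lostNodes.contains(node); }
    }

    /** Stand-in for an action whose target node has just been killed. */
    static void failingAction(String node) {
        throw new IllegalStateException("target node is down, cannot move replicas of " + node);
    }

    /** Pattern described in the report: state is reset before the async outcome is known. */
    static boolean fireAndForget(LostNodeTracker tracker, String node, ExecutorService executor) {
        tracker.nodeLost(node);
        CompletableFuture<Void> action =
            CompletableFuture.runAsync(() -> failingAction(node), executor);
        tracker.eventProcessed(node);              // reset regardless of the outcome
        action.exceptionally(e -> null).join();    // failure is swallowed (think: only logged)
        return tracker.isTracking(node);
    }

    /** One possible alternative: reset the state only if the action succeeded. */
    static boolean resetOnSuccess(LostNodeTracker tracker, String node, ExecutorService executor) {
        tracker.nodeLost(node);
        CompletableFuture<Void> action =
            CompletableFuture.runAsync(() -> failingAction(node), executor)
                .thenRun(() -> tracker.eventProcessed(node)); // skipped when the action fails
        action.exceptionally(e -> null).join();
        return tracker.isTracking(node);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            System.out.println("fire-and-forget still tracking: "
                + fireAndForget(new LostNodeTracker(), "node1", executor));
            System.out.println("reset-on-success still tracking: "
                + resetOnSuccess(new LostNodeTracker(), "node1", executor));
        } finally {
            executor.shutdown();
        }
    }
}
```

In this sketch the first variant stops tracking the lost node even though the action failed, which mirrors the lost-replica symptom above, while the second keeps the node tracked so a later run could retry it.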