[ https://issues.apache.org/jira/browse/SOLR-13376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810887#comment-16810887 ]
Andrzej Bialecki commented on SOLR-13376:
------------------------------------------

Hmm, indeed there's a race condition here.

The reason for having more than one node attempt to create a nodeLost marker is that more than one node may go away (3 was a magic number ;) that we felt wasn't excessive and still reduced the chance of losing the event due to multiple node failures).

This cleanup of leftover markers in {{OverseerTriggerThread}} was added early on, when we first added this functionality, and it may not be necessary anymore: there's {{InactiveMarkersPlanAction}}, which runs periodically to remove stale markers.

> Multi-node race condition to create/remove nodeLost markers
> -----------------------------------------------------------
>
>                 Key: SOLR-13376
>                 URL: https://issues.apache.org/jira/browse/SOLR-13376
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Hoss Man
>            Assignee: Andrzej Bialecki
>            Priority: Major
>
> NodeMarkersRegistrationTest.testNodeMarkersRegistration is frequently failing
> on Jenkins builds in the same spot, with similar-looking logs.
> Although I haven't been able to reproduce these failures locally, I am fairly
> confident that the problem is a race condition between when/how a new
> Overseer processes and cleans up "nodeLost" markers in ZK and how other
> nodes may (mistakenly) re-create those markers in their liveNodes listener.
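For illustration only, the interleaving described in the issue can be replayed in a small, self-contained model. This is a sketch under stated assumptions, not Solr code: the class {{NodeLostMarkerRace}}, its methods, and the in-memory set standing in for the ZK marker path are all hypothetical, and the three steps are run sequentially in the problematic order rather than on real threads.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the nodeLost marker race; not Solr's implementation.
public class NodeLostMarkerRace {
    // In-memory stand-in for the ZK location that holds nodeLost markers (assumption).
    private final Set<String> markers = ConcurrentHashMap.newKeySet();

    // Models a liveNodes listener on a surviving node: create the marker if it is absent.
    boolean createMarkerIfAbsent(String lostNode) {
        return markers.add(lostNode);
    }

    // Models the new Overseer cleaning up leftover markers; returns how many were removed.
    int removeLeftoverMarkers() {
        int removed = markers.size();
        markers.clear();
        return removed;
    }

    boolean hasMarker(String lostNode) {
        return markers.contains(lostNode);
    }

    // Replays the problematic order: create, cleanup, late re-create.
    public static boolean replayInterleaving() {
        NodeLostMarkerRace zk = new NodeLostMarkerRace();
        zk.createMarkerIfAbsent("node1"); // listener on a fast node creates the marker
        zk.removeLeftoverMarkers();       // new Overseer removes the leftover marker
        zk.createMarkerIfAbsent("node1"); // listener on a slower node re-creates it
        return zk.hasMarker("node1");     // marker is present again after the cleanup
    }

    public static void main(String[] args) {
        System.out.println("marker present after cleanup: " + replayInterleaving());
    }
}
```

In this model the marker survives the cleanup because the late listener cannot tell that the marker was already created and removed. That is the situation a periodic removal of stale markers would tolerate, whereas a one-shot cleanup at Overseer startup would not.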