[jira] [Commented] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824411#comment-15824411 ]

Erick Erickson commented on SOLR-9906:
--------------------------------------

Beasting after this latest push succeeded 100 times out of 100. Prior to that it failed for me 21/100 times.

> Use better check to validate if node recovered via PeerSync or Replication
> --------------------------------------------------------------------------
>
>                 Key: SOLR-9906
>                 URL: https://issues.apache.org/jira/browse/SOLR-9906
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Pushkar Raste
>            Assignee: Noble Paul
>            Priority: Minor
>             Fix For: 6.4
>
>         Attachments: SOLR-9906.patch, SOLR-9906.patch,
> SOLR-PeerSyncVsReplicationTest.diff
>
>
> Tests {{LeaderFailureAfterFreshStartTest}} and {{PeerSyncReplicationTest}}
> currently rely on the number of requests made to the leader's replication
> handler to check whether a node recovered via PeerSync or replication. This
> check is not very reliable and we have seen failures in the past.
> While tinkering with different ways to write a better test, I found
> [SOLR-9859|SOLR-9859]. Now that SOLR-9859 is fixed, here is an idea for a
> better way to distinguish recovery via PeerSync vs. Replication.
> * For {{PeerSyncReplicationTest}}, if the node successfully recovers via
> PeerSync, then the file {{replication.properties}} should not exist.
> * For {{LeaderFailureAfterFreshStartTest}}, if the freshly replicated node
> does not go into replication recovery after the leader failure, the contents
> of {{replication.properties}} should not change.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
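The two checks proposed in the issue description could be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: the class `ReplicationPropsCheck` and its method names are hypothetical, and it assumes that {{replication.properties}} lives directly in the directory passed in (for example a core's data directory).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper for illustration; not taken from the SOLR-9906 patch. */
public class ReplicationPropsCheck {
    public static final String REPLICATION_PROPERTIES = "replication.properties";

    /** True if the file exists, which the issue treats as a sign that replication ran. */
    public static boolean hasReplicated(Path dataDir) {
        return Files.exists(dataDir.resolve(REPLICATION_PROPERTIES));
    }

    /** Returns the file's bytes, or null if the file does not exist. */
    public static byte[] snapshot(Path dataDir) throws IOException {
        Path props = dataDir.resolve(REPLICATION_PROPERTIES);
        return Files.exists(props) ? Files.readAllBytes(props) : null;
    }

    /** True if the file is byte-for-byte identical (or absent both times). */
    public static boolean unchanged(byte[] before, byte[] after) {
        return Arrays.equals(before, after);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp directory stands in for the replica's data directory.
        Path dataDir = Files.createTempDirectory("core-data");

        // PeerSyncReplicationTest-style check: no file means no replication happened.
        System.out.println("replicated: " + hasReplicated(dataDir));

        // LeaderFailureAfterFreshStartTest-style check: compare before and after.
        Files.write(dataDir.resolve(REPLICATION_PROPERTIES), "indexReplicatedAt=1".getBytes());
        byte[] before = snapshot(dataDir);
        byte[] after = snapshot(dataDir); // in a real test, taken after the leader failure
        System.out.println("unchanged: " + unchanged(before, after));
    }
}
```

Comparing the file's bytes rather than its modification time avoids depending on filesystem timestamp resolution, at the cost of reading a small file twice.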
[jira] [Commented] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824147#comment-15824147 ]

ASF subversion and git services commented on SOLR-9906:
-------------------------------------------------------

Commit efc7ee0f0c9154fe58671601fdc053540c97ff62 in lucene-solr's branch refs/heads/master from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=efc7ee0 ]

SOLR-9906: Fix dodgy test check
[jira] [Commented] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824146#comment-15824146 ]

ASF subversion and git services commented on SOLR-9906:
-------------------------------------------------------

Commit e13a6fa078890c3f3e0d9cebb1bf3329d94e46a6 in lucene-solr's branch refs/heads/branch_6x from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e13a6fa ]

SOLR-9906: Fix dodgy test check
[jira] [Commented] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824145#comment-15824145 ]

ASF subversion and git services commented on SOLR-9906:
-------------------------------------------------------

Commit 3795c997257868b66306a2c105f095f8a82326c7 in lucene-solr's branch refs/heads/branch_6_4 from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3795c99 ]

SOLR-9906: Fix dodgy test check
[jira] [Commented] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824113#comment-15824113 ]

Alan Woodward commented on SOLR-9906:
-------------------------------------

Yes to both - don't worry about a patch, I'll make the change and push it.
[jira] [Commented] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824089#comment-15824089 ]

Pushkar Raste commented on SOLR-9906:
-------------------------------------

[~romseygeek] - Thank you for catching the bug. I think the check can be fixed by changing
{{slice.getState() == State.ACTIVE}} to {{slice.getLeader().getState() == Replica.State.ACTIVE}}.

Let me know if that is correct and I will attach a patch to fix it. (Not sure if I have to attach a patch for this issue in its entirety or just the patch to fix the slice vs. replica state.)

What do you mean by the log message being badly set up?
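The difference between the two conditions discussed above can be illustrated with a small self-contained sketch. The nested `Slice`, `Replica`, and state enums below are simplified stand-ins written for this example, not Solr's real cluster-state classes, and the comparison against the old leader's name is an added assumption about what a "new leader" check would also want to verify.

```java
/** Illustrative stand-ins only; not Solr's Slice/Replica classes. */
public class LeaderStateCheck {
    public enum SliceState { ACTIVE, INACTIVE }
    public enum ReplicaState { ACTIVE, RECOVERING, DOWN }

    public static final class Replica {
        public final String name;
        public final ReplicaState state;
        public Replica(String name, ReplicaState state) {
            this.name = name;
            this.state = state;
        }
    }

    public static final class Slice {
        public final SliceState state;
        public final Replica leader; // may be null while an election is in progress
        public Slice(SliceState state, Replica leader) {
            this.state = state;
            this.leader = leader;
        }
    }

    /** The condition described as buggy: it only looks at the slice's own state. */
    public static boolean sliceIsActive(Slice slice) {
        return slice.state == SliceState.ACTIVE;
    }

    /** The proposed condition: a leader exists, is not the old one, and is itself ACTIVE. */
    public static boolean newLeaderIsActive(Slice slice, String oldLeaderName) {
        Replica leader = slice.leader;
        return leader != null
            && !leader.name.equals(oldLeaderName)
            && leader.state == ReplicaState.ACTIVE;
    }

    public static void main(String[] args) {
        // The slice is ACTIVE, but the prospective leader is still recovering.
        Slice slice = new Slice(SliceState.ACTIVE,
                                new Replica("core_node2", ReplicaState.RECOVERING));
        System.out.println(sliceIsActive(slice));                   // true
        System.out.println(newLeaderIsActive(slice, "core_node1")); // false
    }
}
```

A slice can stay ACTIVE for its whole lifetime, so the first condition says nothing about whether the newly elected leader is ready to serve requests; the second condition does.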
[jira] [Commented] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823692#comment-15823692 ]

Alan Woodward commented on SOLR-9906:
-------------------------------------

This is causing lots of failures in PeerSyncReplicationTest. I think AbstractDistribZkTestBase.waitForNewLeader() is buggy - the check that a new leader is active is looking at the slice state, not the prospective leader's replica state, plus the log message is badly set up.
[jira] [Commented] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795845#comment-15795845 ]

ASF subversion and git services commented on SOLR-9906:
-------------------------------------------------------

Commit 812070a77f483149e1d83b3d1bbc7ba80f0fd868 in lucene-solr's branch refs/heads/branch_6x from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=812070a ]

SOLR-9906-Use better check to validate if node recovered via PeerSync or Replication
[jira] [Commented] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794681#comment-15794681 ]

ASF subversion and git services commented on SOLR-9906:
-------------------------------------------------------

Commit d5652385675d12b80a58e44a8c8b392c9f70a334 in lucene-solr's branch refs/heads/master from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d565238 ]

SOLR-9906: unused import
[jira] [Commented] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15794381#comment-15794381 ]

ASF subversion and git services commented on SOLR-9906:
-------------------------------------------------------

Commit 3988532d26a50b1f3cf51e1d0009a0754cfd6b57 in lucene-solr's branch refs/heads/master from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3988532 ]

SOLR-9906-Use better check to validate if node recovered via PeerSync or Replication
[jira] [Commented] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787382#comment-15787382 ]

Noble Paul commented on SOLR-9906:
----------------------------------

{{Thread.sleep(3000)}} in {{PeerSyncReplicationTest.forceNodeFailures}} needs to go. Unconditional waits are pretty bad.
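One common alternative to an unconditional sleep is to poll a condition until it holds or a deadline passes. The sketch below shows that general pattern only; the class `WaitFor` and its `waitFor` method are hypothetical names for this example and are not part of Solr's test framework or of the committed fix.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper for illustration. */
public class WaitFor {
    /**
     * Polls the condition every pollMs milliseconds until it returns true
     * or timeoutMs milliseconds have elapsed.
     *
     * @return true if the condition became true before the deadline
     */
    public static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // return as soon as the state is reached
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up; the caller decides how to fail
            }
            Thread.sleep(pollMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Assumption: the condition here is a stand-in for "node is reported down".
        boolean ok = waitFor(() -> System.currentTimeMillis() - start >= 200, 3000, 20);
        System.out.println("condition met: " + ok);
    }
}
```

Compared with a fixed `Thread.sleep(3000)`, this returns as soon as the condition is met and reports a timeout explicitly instead of silently continuing in an unknown state.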