[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781086#comment-16781086 ]

ASF subversion and git services commented on SOLR-13189:

Commit 776013c52e58401c517b4bdd388a488520b84eb2 in lucene-solr's branch refs/heads/branch_7x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=776013c ]

disable TestInjection in RestartWhileUpdatingTest

work around for SOLR-13189 and SOLR-13212

(cherry picked from commit 956772b7ef6849ba701ecde8610cc0cc523676ff)

> Need reliable example (Test) of how to use TestInjection.failReplicaRequests
>
> Key: SOLR-13189
> URL: https://issues.apache.org/jira/browse/SOLR-13189
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Priority: Major
> Attachments: SOLR-13189.logs.tgz, SOLR-13189.patch, SOLR-13189.patch, SOLR-13189.patch
>
> We need a test that reliably demonstrates the usage of
> {{TestInjection.failReplicaRequests}} and shows what steps a test needs to
> take after issuing updates to reliably "pass" (finding all index updates that
> succeeded from the clients perspective) even in the event of an (injected)
> replica failure.
> As things stand now, it does not seem that any test using
> {{TestInjection.failReplicaRequests}} passes reliably -- *and it's not clear
> if this is due to poorly designed tests, or an indication of a bug in
> distributed updates / LIR*

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763221#comment-16763221 ]

ASF subversion and git services commented on SOLR-13189:

Commit 956772b7ef6849ba701ecde8610cc0cc523676ff in lucene-solr's branch refs/heads/master from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=956772b ]

disable TestInjection in RestartWhileUpdatingTest

work around for SOLR-13189 and SOLR-13212
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763219#comment-16763219 ]

ASF subversion and git services commented on SOLR-13189:

Commit 2d48bde21bfb69b897632ca2885a61583c659594 in lucene-solr's branch refs/heads/branch_8x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2d48bde ]

disable TestInjection in RestartWhileUpdatingTest

work around for SOLR-13189 and SOLR-13212

(cherry picked from commit 956772b7ef6849ba701ecde8610cc0cc523676ff)
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759155#comment-16759155 ]

Mark Miller commented on SOLR-13189:

To try and be extra clear: my patch is intended to prove to you that my theory is correct and that following the system rules allows this test to pass with fails injected.

By coincidence, my patch does something we need to start doing: change our old style clustered verification test methods to work with new style tests, to reduce duplication and move the old style tests to the new style tests.

We should inject random fails, but only in specific tests that check things like my patch does.
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758826#comment-16758826 ]

Mark Miller commented on SOLR-13189:

{quote}i guess i was just hoping for a less complicated {quote}

I gave the least complicated way:

{quote}More practically, the changed behavior mostly affects us injecting fails. That type of test should be isolated and have correct checking. For the rest of the tests, we probably don't expect fails and so failing if we have them seems fine, something likely needs to be fixed or you are checking wrong. {quote}

We should only inject fails on tests specifically designed for that, not generally across tests. That should have worked with the http recovery call, but it doesn't anymore.

Also, while that patch is a hack, it's also in the direction we need to move anyway. We need to change all the old style Solr cloud tests to work how I changed that check consistency method (it just needs to be done in a non-hacky way). Then we can move all those tests to the more modern solrcloud test base class. The main thing stopping that has been our use of those Jetty instance maps - we need to drop that stuff.
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758720#comment-16758720 ]

Hoss Man commented on SOLR-13189:

bq. markmiller: Here is a hack to that test.

yeah, fair enough -- sorry, I wasn't trying to be dismissive of your help, ... i guess i was just hoping for a less complicated (from the perspective of test writers) solution that we could showcase as the gold standard of how to (generically) "wait for recovery" after (potentially) injecting failures

... but i'm not in a rush to re-add TestInjection back into TestStressCloudBlindAtomicUpdates -- it's a "nice to have" but not something I care about enough to get over my general feeling of ickiness at needing to call {{Thread.sleep}} in a loop that much : )
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758625#comment-16758625 ]

Mark Miller commented on SOLR-13189:

{quote} * was bad in real life because if replica was having problems, it might not recognize/respond to LIR appropriately{quote}

It was fine from that perspective when Tim added LIR - the original communication through ZK. The problem was that it was tied to each update before, so if you had lots of fails, you would make tons of http calls and tons of requests to recover (we throttle recoveries now to prevent this type of thing). So that either needed to be removed, or made more efficient by not linking every http call to a document fail. I think it's been removed or else it's broken.

bq. this is good in real life because it's less dependent on healthy network/http requests

We already had ZK based LIR on top of the http request attempt. I think the rewritten improved LIR removed (rather than making efficient) or broke the request attempt.

bq. this is bad in tests because there is an inherent and hard to predict delay before the replica even realizes it needs to go into recovery

It depends on the test. If you don't want flakey tests, all of them should obey the rules of the system when checking things as much as possible.

More practically, the changed behavior mostly affects us injecting fails. That type of test should be isolated and have correct checking. For the rest of the tests, we probably don't expect fails and so failing if we have them seems fine, something likely needs to be fixed or you are checking wrong.

bq. I haven't dug into your patch that deep, but so far it seems really hackish?

markmiller: Here is a hack to that test.

This is just to fix your test.

bq. it makes the test wait (or timeout) until it is consistent

If you want to write a test like that, those are the rules, so that is what it does. Recovery can be re-triggered and stuff can happen that makes reaching a consistent state take longer than you might think it should. So either your test is not creating the env you think it is, or it is, and this is how you properly test it.
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758564#comment-16758564 ]

Hoss Man commented on SOLR-13189:

{quote}In older versions these tests might have worked because before the request returns to the client, the leader would have called to the replica and told it to go into recovery. I believe we no longer make these calls (for good reason, http calls tied to updates was no good). So a replica will only enter recovery when it realizes it should via ZooKeeper communication. {quote}

Ok ... so to re-iterate and make sure i'm following everything:

* OLD LIR:
** LIR was pushed to replica via HTTP immediately after replica returned non-200 status
** was bad in real life because if replica was having problems, it might not recognize/respond to LIR appropriately
** was good in tests because it meant immediately after doing an index update, you could {{waitForRecoveriesToFinish}} and the replica would already be in recovery
* CURRENT LIR:
** LIR status is managed via flags in ZK (this is the "terms" concept correct?)
** replicas monitor ZK to see if/when they need to go into LIR
** this is good in real life because it's less dependent on healthy network/http requests
** this is bad in tests because there is an inherent and hard to predict delay before the replica even realizes it needs to go into recovery
*** ie: {{waitForRecoveriesToFinish}} now seems completely useless?

does that cover it?

{quote}The system will be eventually consistent, but there is no promise it will be consistent even when all replicas are active. You must be willing to wait a short time for consistency and this test does not. {quote}

Right ... i understand that ... the question at the heart of this jira is what a test can/should do to know "the system should now be consistent enough for me to make the assertions I want to make" (and how do we make that as easy as possible for tests to do).

I haven't dug into your patch that deep, but so far it seems really hackish? ... sleep looping until all the replicas are live and the first 1000 docs from a {{*:*}} query to each replica match each other?

If nothing else this creates a (slow) chicken and egg diagnosis problem in tests -- did {{waitForConsistency}} eventually time out because the recovery is broken, or because the code i'm writing a test for (example: distributed atomic updates) is broken?

I'm not saying the {{checkConsistency}} logic is bad -- if anything it seems like something that might be good to have in the tear down of every test -- but I'm concerned that just trying to do a "wait for" on it doesn't really get to the heart of the problem of tests being able to know when the cluster *_should_* be consistent -- it makes the test wait (or timeout) until it *_is_* consistent.

If recovery is driven by these flags in ZK, then why couldn't we re-write {{waitForRecoveriesToFinish}} to check those flags first (in addition to the {{Replica.State}}) to know if recovery is pending (or in progress)?
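The question in the comment above (could a "wait for recoveries" check look at the ZK-side flags in addition to {{Replica.State}}?) can be illustrated with a toy model. This is only a sketch under stated assumptions: it assumes the "flags" are per-replica numeric terms and that a replica whose term is behind the highest term in its shard still has recovery pending. {{RecoveryCheck}}, {{ReplicaView}} and {{recoveryPendingOrInProgress}} are hypothetical names for illustration, not Solr APIs.

```java
import java.util.Map;

/** Toy model only; these types are hypothetical and not part of Solr. */
final class RecoveryCheck {

  enum State { ACTIVE, RECOVERING, DOWN }

  /** What a test could observe about one replica: published state plus its term. */
  static final class ReplicaView {
    final State state;
    final long term;

    ReplicaView(State state, long term) {
      this.state = state;
      this.term = term;
    }
  }

  /**
   * Returns true if any replica in the shard is not ACTIVE, or is ACTIVE but
   * has a term lower than the highest term in the shard (assumed here to mean
   * "recovery is needed but has not started yet").
   */
  static boolean recoveryPendingOrInProgress(Map<String, ReplicaView> shard) {
    long maxTerm = Long.MIN_VALUE;
    for (ReplicaView r : shard.values()) {
      maxTerm = Math.max(maxTerm, r.term);
    }
    for (ReplicaView r : shard.values()) {
      if (r.state != State.ACTIVE || r.term < maxTerm) {
        return true;
      }
    }
    return false;
  }
}
```

In this model a replica that is still published as ACTIVE but is one term behind counts as "recovery pending", which is exactly the window a state-only check would miss.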
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757955#comment-16757955 ]

Mark Miller commented on SOLR-13189:

Whoops, waiting for consistency isn't enough, you also have to wait for the right total doc count. Updated patch.
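To make the point about doc counts concrete, here is a minimal, self-contained sketch of a poll-with-timeout loop that only succeeds once every replica reports the expected total, not merely once the replicas agree with each other. It is not the attached patch; {{ConsistencyWait}} and its methods are hypothetical names, and the replica counts are supplied as plain functions so the example has no Solr dependency.

```java
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical helper; none of these names come from Solr's test framework. */
final class ConsistencyWait {

  /**
   * Polls every replica's doc count until all of them equal expectedDocs, or
   * until the timeout elapses. Returns true on success and false on timeout
   * (or if the waiting thread is interrupted).
   */
  static boolean waitForDocCount(List<LongSupplier> replicaCounts, long expectedDocs,
                                 long timeoutMillis, long pollMillis) {
    final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (allEqual(replicaCounts, expectedDocs)) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }

  /** True only if every replica currently reports exactly expectedDocs. */
  static boolean allEqual(List<LongSupplier> replicaCounts, long expectedDocs) {
    for (LongSupplier count : replicaCounts) {
      if (count.getAsLong() != expectedDocs) {
        return false;
      }
    }
    return true;
  }
}
```

Two replicas that both report 9 docs are consistent with each other, but if the client saw 10 successful adds the wait above still times out, which is the case a consistency-only check would wrongly accept.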
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757947#comment-16757947 ]

Mark Miller commented on SOLR-13189:

Here is a hack to that test. If we want to handle any valid case when checking counts in a test, we have to do like the ChaosMonkey tests have always done and wait for consistency explicitly.
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757921#comment-16757921 ]

Mark Miller commented on SOLR-13189:

Basically another example in a long line of someone introducing or changing a feature and causing massive new instability. I still intend to tackle that problem fully and have concrete plans and work already done, but I've got some side gigs too.
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757918#comment-16757918 ]

Mark Miller commented on SOLR-13189:

And it was just starting to feel good being away again ...

As an aside, that wait for recoveries call should be nixed because it's flakey after a collection create call. We need to use wait calls that specify the shards and replicas to wait for like the SolrCloudTest tests do now.

What I would guess is happening here is that you are hitting the eventual consistency nature of the system.

In older versions these tests might have worked because before the request returns to the client, the leader would have called to the replica and told it to go into recovery. I believe we no longer make these calls (for good reason, http calls tied to updates were no good). So a replica will only enter recovery when it realizes it should via ZooKeeper communication.

The system will be eventually consistent, but there is no promise it will be consistent even when all replicas are active. You must be willing to wait a short time for consistency and this test does not.
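As an illustration of "wait calls that specify the shards and replicas to wait for", the check below is a self-contained sketch of a shape predicate: it passes only when the expected number of shards exists and each shard has the expected number of ACTIVE replicas. {{ClusterShapeCheck}} and {{matchesShape}} are hypothetical names and the cluster state is modeled as a plain map; a real test would evaluate something like this against the published collection state inside a wait-with-timeout loop.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical model of a "cluster shape" check; not a Solr API. */
final class ClusterShapeCheck {

  enum State { ACTIVE, RECOVERING, DOWN }

  /**
   * Returns true only if there are exactly expectedShards shards and every
   * shard has exactly expectedReplicas replicas, all of them ACTIVE.
   */
  static boolean matchesShape(Map<String, List<State>> shards,
                              int expectedShards, int expectedReplicas) {
    if (shards.size() != expectedShards) {
      return false; // e.g. a freshly created collection not fully published yet
    }
    for (List<State> replicas : shards.values()) {
      if (replicas.size() != expectedReplicas) {
        return false;
      }
      for (State state : replicas) {
        if (state != State.ACTIVE) {
          return false;
        }
      }
    }
    return true;
  }
}
```

Unlike a generic "no replica is recovering" check, this cannot pass vacuously right after a collection create call, because a shard or replica that has not shown up yet makes the shape test fail.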
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757551#comment-16757551 ]

Hoss Man commented on SOLR-13189:

[~markrmil...@gmail.com] - any guidance/observations here to help me proceed?
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755663#comment-16755663 ]

ASF subversion and git services commented on SOLR-13189:

Commit 73cfa810c7fcf8e5299a6b9c2fcecceee44d2846 in lucene-solr's branch refs/heads/branch_8x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=73cfa81 ]

disable TestInjection in TestStressCloudBlindAtomicUpdates

work around for SOLR-13189

(cherry picked from commit 0a01b9e12787e56604aab3a0c3792d2aa060ae74)
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755664#comment-16755664 ]

ASF subversion and git services commented on SOLR-13189:

Commit 0a01b9e12787e56604aab3a0c3792d2aa060ae74 in lucene-solr's branch refs/heads/master from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0a01b9e ]

disable TestInjection in TestStressCloudBlindAtomicUpdates

work around for SOLR-13189
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755662#comment-16755662 ]

ASF subversion and git services commented on SOLR-13189:

Commit 21d2b024f4590175f97b82839ff69f96bd022df2 in lucene-solr's branch refs/heads/branch_7x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=21d2b02 ]

disable TestInjection in TestStressCloudBlindAtomicUpdates

work around for SOLR-13189

(cherry picked from commit 0a01b9e12787e56604aab3a0c3792d2aa060ae74)
[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests
[ https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755410#comment-16755410 ]

Hoss Man commented on SOLR-13189:

{quote}As currently written, this test will fail very easily... {quote}

To clarify, the test as _uploaded_ already has the TestInjection line commented out with a {{nocommit}} ... so it should reliably pass for anyone. Remove the nocommit and allow {{TestInjection.failReplicaRequests}} to be set, and it should start failing very easily.