[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-02-28 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781086#comment-16781086
 ] 

ASF subversion and git services commented on SOLR-13189:


Commit 776013c52e58401c517b4bdd388a488520b84eb2 in lucene-solr's branch 
refs/heads/branch_7x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=776013c ]

disable TestInjection in RestartWhileUpdatingTest

work around for SOLR-13189 and SOLR-13212

(cherry picked from commit 956772b7ef6849ba701ecde8610cc0cc523676ff)


> Need reliable example (Test) of how to use TestInjection.failReplicaRequests
> 
>
> Key: SOLR-13189
> URL: https://issues.apache.org/jira/browse/SOLR-13189
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Hoss Man
>Priority: Major
> Attachments: SOLR-13189.logs.tgz, SOLR-13189.patch, SOLR-13189.patch, 
> SOLR-13189.patch
>
>
> We need a test that reliably demonstrates the usage of 
> {{TestInjection.failReplicaRequests}} and shows what steps a test needs to 
> take after issuing updates to reliably "pass" (finding all index updates that 
> succeeded from the client's perspective) even in the event of an (injected) 
> replica failure.
> As things stand now, it does not seem that any test using 
> {{TestInjection.failReplicaRequests}} passes reliably -- *and it's not clear 
> if this is due to poorly designed tests, or an indication of a bug in 
> distributed updates / LIR*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-02-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763221#comment-16763221
 ] 

ASF subversion and git services commented on SOLR-13189:


Commit 956772b7ef6849ba701ecde8610cc0cc523676ff in lucene-solr's branch 
refs/heads/master from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=956772b ]

disable TestInjection in RestartWhileUpdatingTest

work around for SOLR-13189 and SOLR-13212





[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-02-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763219#comment-16763219
 ] 

ASF subversion and git services commented on SOLR-13189:


Commit 2d48bde21bfb69b897632ca2885a61583c659594 in lucene-solr's branch 
refs/heads/branch_8x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2d48bde ]

disable TestInjection in RestartWhileUpdatingTest

work around for SOLR-13189 and SOLR-13212

(cherry picked from commit 956772b7ef6849ba701ecde8610cc0cc523676ff)





[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-02-02 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759155#comment-16759155
 ] 

Mark Miller commented on SOLR-13189:


To try and be extra clear.

My patch is intended to prove to you that my theory is correct and that 
following the system rules allows this test to pass with fails injected.

By coincidence, my patch does something we need to start doing: changing our 
old style clustered verification test methods to work with new style tests, to 
reduce duplication and let us move old style tests over to the new style.

We should inject random fails, but only in specific tests that check things 
like my patch does.




[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-02-01 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758826#comment-16758826
 ] 

Mark Miller commented on SOLR-13189:


{quote}i guess i was just hoping for a less complicated
{quote}
I give the least complicated way:
{quote}More practically, the changed behavior mostly affects us injecting 
fails. That type of test should be isolated and have correct checking. For the 
rest of the tests, we probably don't expect fails and so failing if we have 
them seems fine, something likely needs to be fixed or you are checking wrong.
{quote}
We should only inject fails on tests specifically designed for that, not 
generally across tests. That should have worked with the http recovery call, 
but it doesn't anymore.

Also, while that patch is a hack, it is also a step in the direction we need to 
move anyway. We need to change all the old style Solr cloud tests to work how I 
changed that check consistency method (it just needs to be done in a non-hacky 
way). Then we can move all those tests to the more modern solrcloud test base 
class.

The main thing stopping that has been our use of those Jetty instance maps - we 
need to drop that stuff.




[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-02-01 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758720#comment-16758720
 ] 

Hoss Man commented on SOLR-13189:
-

bq. markmiller: Here is a hack to that test.

yeah, fair enough -- sorry, I wasn't trying to be dismissive of your help ... 
i guess i was just hoping for a less complicated (from the perspective of test 
writers) solution that we could showcase as the gold standard of how to 
(generically) "wait for recovery" after (potentially) injecting failures ... 
but i'm not in a rush to re-add TestInjection to 
TestStressCloudBlindAtomicUpdates -- it's a "nice to have" but not something I 
care about enough to get over my general feeling of ickiness at needing to call 
{{Thread.sleep}} in a loop that much : )





[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-02-01 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758625#comment-16758625
 ] 

Mark Miller commented on SOLR-13189:


{quote} * was bad in real life because if replica was having problems, it might 
not recognize/respond to LIR appropriately{quote}
It was fine from that perspective when Tim added LIR - the original 
communication through ZK. The problem was that it was tied to each update 
before, so if you had lots of fails, you would make tons of http calls and tons 
of requests to recover (we throttle recoveries now to prevent this type of 
thing). So that either needed to be removed, or made more efficient by not 
linking every http call to a document fail. I think it's been removed or else 
it's broken.

bq. this is good in real life because it's less dependent on healthy 
network/http requests

We already had ZK based LIR on top of the http request attempt. I think the 
rewritten improved LIR removed (rather than making efficient) or broke the 
request attempt.

bq. this is bad in tests because there is an inherent and hard to predict delay 
before the replica even realizes it needs to go into recovery

It depends on the test. If you don't want flakey tests, all of them should obey 
the rules of the system when checking things as much as possible. More 
practically, the changed behavior mostly affects us injecting fails. That type 
of test should be isolated and have correct checking. For the rest of the 
tests, we probably don't expect fails and so failing if we have them seems 
fine, something likely needs to be fixed or you are checking wrong.

bq. I haven't dug into your patch that deep, but so far it seems really 
hackish? 

markmiller: Here is a hack to that test.

This is just to fix your test.

bq.  it makes the test wait (or timeout) until it is consistent

If you want to write a test like that, those are the rules, so that is what it 
does. Recovery can be re-triggered and other things can happen that make 
reaching a consistent state take longer than you might think it should. So 
either your test is not creating the env you think it is, or it is, and this is 
how you properly test it.




[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-02-01 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16758564#comment-16758564
 ] 

Hoss Man commented on SOLR-13189:
-

{quote}In older versions these tests might have worked because before the 
request returns to the client, the leader would have called to the replica and 
told it to go into recovery. I believe we no longer make these calls (for good 
reason, http calls tied to updates were no good). So a replica will only enter 
recovery when it realizes it should via ZooKeeper communication.
{quote}
Ok ... so to re-iterate and make sure i'm following everything:
 * OLD LIR:
 ** LIR was pushed to replica via HTTP immediately after replica returned 
non-200 status
 ** was bad in real life because if replica was having problems, it might not 
recognize/respond to LIR appropriately
 ** was good in tests because it meant immediately after doing an index update, 
you could {{waitForRecoveriesToFinish}} and the replica would already be in 
recovery
 * CURRENT LIR:
 ** LIR status is managed via flags in ZK (this is the "terms" concept, correct?)
 ** replicas monitor ZK to see if/when they need to go into LIR
 ** this is good in real life because it's less dependent on healthy 
network/http requests
 ** this is bad in tests because there is an inherent and hard to predict delay 
before the replica even realizes it needs to go into recovery
 *** ie: {{waitForRecoveriesToFinish}} now seems completely useless?

does that cover it?
{quote}The system will be eventually consistent, but there is no promise it 
will be consistent even when all replicas are active. You must be willing to 
wait a short time for consistency and this test does not.
{quote}
Right ... i understand that ... the question at the heart of this jira is what 
a test can/should do to know "the system should now be consistent enough for me 
to make the assertions I want to make" (and how do we make that as easy as 
possible for tests to do).

I haven't dug into your patch that deep, but so far it seems really hackish? 
... sleep looping until all the replicas are live and the first 1000 docs from 
a {{*:*}} query to each replica match each other?

If nothing else this creates a (slow) chicken and egg diagnosis problem in 
tests – did {{waitForConsistency}} eventually time out because the recovery is 
broken, or because the code i'm writing a test for (example: distributed 
atomic updates) is broken?

I'm not saying the {{checkConsistency}} logic is bad – if anything it seems 
like something that might be good to have in the tear down of every test – but 
I'm concerned that just trying to do a "wait for" on it doesn't really get to 
the heart of the problem of tests being able to know when the cluster 
*_should_* be consistent – it makes the test wait (or timeout) until it *_is_* 
consistent.

If recovery is driven by these flags in ZK, then why couldn't we re-write 
{{waitForRecoveriesToFinish}} to check those flags first (in addition to the 
{{Replica.State}}) to know if recovery is pending (or in progress)?
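
A minimal sketch of what such a combined check might look like, assuming a 
per-replica "recovery pending" flag could be read from ZK. Everything here is 
hypothetical: RecoveryCheck, ReplicaView, the local State enum and the 
recoveryPending field are stand-ins for illustration only and do not 
correspond to actual Solr classes or to how the ZK flags are really stored.

```java
import java.util.List;

/** Hypothetical model of the proposed check; these are not Solr classes. */
final class RecoveryCheck {

  /** Local stand-in for a replica state; not Solr's Replica.State. */
  enum State { ACTIVE, RECOVERING, DOWN }

  /** Snapshot of one replica as seen by the test. */
  static final class ReplicaView {
    final State state;
    // Assumption: some ZK-side flag tells us a recovery has been requested
    // even though the replica has not yet noticed and changed its state.
    final boolean recoveryPending;

    ReplicaView(State state, boolean recoveryPending) {
      this.state = state;
      this.recoveryPending = recoveryPending;
    }
  }

  /**
   * Returns true only when no replica has a pending recovery flag AND every
   * replica is ACTIVE. Checking the state alone would report "finished" for
   * a replica that is still ACTIVE but has been flagged for recovery.
   */
  static boolean recoveriesFinished(List<ReplicaView> replicas) {
    for (ReplicaView replica : replicas) {
      if (replica.recoveryPending || replica.state != State.ACTIVE) {
        return false;
      }
    }
    return true;
  }
}
```

The case that matters is the replica that is still ACTIVE but already 
flagged: a state-only check would treat it as settled, while this check would 
keep waiting.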




[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-01-31 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757955#comment-16757955
 ] 

Mark Miller commented on SOLR-13189:


Whoops, waiting for consistency isn't enough, you also have to wait for the 
right total doc count. Updated patch.
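
As a rough illustration of that point (replicas agreeing with each other is 
not sufficient, they must also agree on the expected total), a test-side wait 
could look like the sketch below. It assumes a single shard, so every replica 
should report the full expected count. ConsistencyWait, consistent and 
waitForConsistency are hypothetical names and are not part of Solr's test 
framework or of the attached patch; the per-replica counts are supplied by 
the caller.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not part of Solr's test framework. */
final class ConsistencyWait {

  /**
   * True when there is at least one replica and every replica reports exactly
   * the expected total. Replicas that merely agree with each other on a wrong
   * count do not qualify.
   */
  static boolean consistent(List<Long> replicaCounts, long expectedTotal) {
    if (replicaCounts.isEmpty()) {
      return false;
    }
    for (long count : replicaCounts) {
      if (count != expectedTotal) {
        return false;
      }
    }
    return true;
  }

  /**
   * Polls the supplied per-replica counts until they are consistent or the
   * timeout elapses. Returns true if consistency was observed, false on
   * timeout or if the waiting thread was interrupted.
   */
  static boolean waitForConsistency(Supplier<List<Long>> countsPerReplica,
                                    long expectedTotal,
                                    long timeoutMillis,
                                    long pollMillis) {
    final long start = System.nanoTime();
    final long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (true) {
      if (consistent(countsPerReplica.get(), expectedTotal)) {
        return true;
      }
      if (System.nanoTime() - start >= timeoutNanos) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for callers
        return false;
      }
    }
  }
}
```

A false return only says the expected state was not seen in time; it does not 
say whether recovery or the feature under test is at fault.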




[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-01-31 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757947#comment-16757947
 ] 

Mark Miller commented on SOLR-13189:


Here is a hack to that test.

If we want to handle any valid case when checking counts in a test, we have to 
do like the ChaosMonkey tests have always done and wait for consistency 
explicitly.




[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-01-31 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757921#comment-16757921
 ] 

Mark Miller commented on SOLR-13189:


Basically another example in a long line of someone introducing or changing a 
feature and causing massive new instability.

I still intend to tackle that problem fully, and I have concrete plans and work 
already done, but I've got some side gigs too.




[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-01-31 Thread Mark Miller (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757918#comment-16757918
 ] 

Mark Miller commented on SOLR-13189:


And it was just starting to feel good being away again ...

As an aside, that wait for recoveries call should be nixed because it's flakey 
after a collection create call. We need to use wait calls that specify the 
shards and replicas to wait for like the SolrCloudTest tests do now.

What I would guess is happening here is that you are hitting the eventual 
consistency nature of the system.

In older versions these tests might have worked because before the request 
returns to the client, the leader would have called to the replica and told it 
to go into recovery. I believe we no longer make these calls (for good reason, 
http calls tied to updates were no good). So a replica will only enter recovery 
when it realizes it should via ZooKeeper communication.

The system will be eventually consistent, but there is no promise it will be 
consistent even when all replicas are active. You must be willing to wait a 
short time for consistency and this test does not.




[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-01-31 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757551#comment-16757551
 ] 

Hoss Man commented on SOLR-13189:
-

[~markrmil...@gmail.com] - any guidance/observations here to help me proceed?




[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-01-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755663#comment-16755663
 ] 

ASF subversion and git services commented on SOLR-13189:


Commit 73cfa810c7fcf8e5299a6b9c2fcecceee44d2846 in lucene-solr's branch 
refs/heads/branch_8x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=73cfa81 ]

disable TestInjection in TestStressCloudBlindAtomicUpdates

work around for SOLR-13189

(cherry picked from commit 0a01b9e12787e56604aab3a0c3792d2aa060ae74)





[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-01-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755664#comment-16755664
 ] 

ASF subversion and git services commented on SOLR-13189:


Commit 0a01b9e12787e56604aab3a0c3792d2aa060ae74 in lucene-solr's branch 
refs/heads/master from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0a01b9e ]

disable TestInjection in TestStressCloudBlindAtomicUpdates

work around for SOLR-13189





[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-01-29 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755662#comment-16755662
 ] 

ASF subversion and git services commented on SOLR-13189:


Commit 21d2b024f4590175f97b82839ff69f96bd022df2 in lucene-solr's branch 
refs/heads/branch_7x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=21d2b02 ]

disable TestInjection in TestStressCloudBlindAtomicUpdates

work around for SOLR-13189

(cherry picked from commit 0a01b9e12787e56604aab3a0c3792d2aa060ae74)





[jira] [Commented] (SOLR-13189) Need reliable example (Test) of how to use TestInjection.failReplicaRequests

2019-01-29 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755410#comment-16755410
 ] 

Hoss Man commented on SOLR-13189:
-

{quote}As currently written, this test will fail very easily...
{quote}
To clarify, the test as _uploaded_ already has the TestInjection line commented 
out with a {{nocommit}} ... so it should reliably pass for anyone.  Remove the 
nocommit and allow {{TestInjection.failReplicaRequests}} to be set, and it 
should start failing very easily.
