[
https://issues.apache.org/jira/browse/SOLR-17656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17933472#comment-17933472
]
Houston Putman commented on SOLR-17656:
---------------------------------------
I think this inadvertently introduced a bug: when {{testRealTimeGet}} runs
after the new {{testSkipLeaderRecoveryProperty}}, it can hit:
{quote}{{> org.apache.solr.client.solrj.SolrServerException: IOException occurred when talking to server at: [https://127.0.0.1:42033/solr]}}
{{> at __randomizedtesting.SeedInfo.seed([BBB06E01532385D3:E3DD9B02F1992B1A]:0)}}
{{> at app//org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:736)}}
{{> at app//org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:260)}}
{{> at app//org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:239)}}
{{> at app//org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:279)}}
{{> at app//org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:166)}}
{{> at app//org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:131)}}
{{> at app//org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:147)}}
{{> at app//org.apache.solr.cloud.TestPullReplica.testRealTimeGet(TestPullReplica.java:497)}}
{quote}
I'll see if I can fix it quickly.
> Add expert level option to allow PULL replicas to go ACTIVE w/o RECOVERING
> --------------------------------------------------------------------------
>
> Key: SOLR-17656
> URL: https://issues.apache.org/jira/browse/SOLR-17656
> Project: Solr
> Issue Type: New Feature
> Reporter: Chris M. Hostetter
> Assignee: Chris M. Hostetter
> Priority: Major
> Fix For: main (10.0), 9.9
>
> Attachments: SOLR-17656-1.patch, SOLR-17656.patch
>
>
> In situations where a Solr cluster undergoes a rolling restart (or some other
> "catastrophic" failure situation requiring/causing Solr node restarts) there
> can be a snowball effect of poor performance (or even Solr nodes crashing) due
> to fewer than normal replicas serving query requests while replicas on
> restarting nodes are DOWN or RECOVERING – especially if shard leaders are
> also affected, and (restarting) replicas first must wait for a leader
> election before they can recover (or wait to finish recovery from an
> over-worked leader).
> For NRT type use cases, RECOVERING is really a necessary evil to ensure every
> replica is up to date before handling NRT requests – but in the case of PULL
> replicas, which are expected to routinely "lag" behind their leader, I've
> talked to a lot of Solr users with use cases where they would be happy to have
> PULL replicas back online serving "stale" data ASAP, and let normal
> IndexFetching "catch up" with the leader later.
> I propose we support a new "advanced" replica property that can be set on
> PULL replicas by expert level users, to indicate: on (re)init, these replicas
> may skip RECOVERING and go directly to ACTIVE.