[
https://issues.apache.org/jira/browse/SOLR-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842810#comment-17842810
]
Houston Putman edited comment on SOLR-17088 at 5/2/24 12:50 AM:
----------------------------------------------------------------
Wow, this one was rough to find!
Luckily it's only breaking on main, and the git history gave us some help since
very few commits aren't backported to 9x.
The cause turned out to be a seemingly innocuous change that removed the
solr.xml capabilities from ZK: SOLR-16975
Jan did a great job; however, there is one option for starting nodes after a
cluster has been configured that does not take a solr.xml, so it uses the
default solr.xml instead. For clusters that were set up with their own solr.xml
initially, these new nodes end up with a different solr.xml from the original
nodes. (This did not use to be a problem, since the solr.xml file was in ZK.)
The reason we were seeing these errors is that TestPrepRecovery does not use
the default cloud solr.xml. The default file has options for choosing between
distributedClusterStateUpdate and the overseer, and these flags are randomized
when setting up the cluster. Since TestPrepRecovery was using its own solr.xml,
which didn't support these flags, it was using the overseer. However, when
calling {{cluster.startJettySolrRunner()}}, the new jetty instances used the
default cloud solr.xml (as stated above), which does support the flags. So
these instances thought they were supposed to be using distributed state
processing, hence the cluster state issues.
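To make the mismatch concrete, here is a minimal toy model in plain Java. It does not use any Solr classes: {{SolrXml}}, {{UpdateMode}} and {{modeFor}} are hypothetical names made up for this sketch, and the rule "a node only honors the randomized flag if its solr.xml exposes it" is an assumption based on my reading of the behavior described above.

```java
public class SolrXmlMismatchSketch {
    // The two ways a node can process cluster state updates.
    enum UpdateMode { OVERSEER, DISTRIBUTED }

    // Hypothetical stand-in for a solr.xml: we only track whether it
    // exposes the distributed-state-update flag at all.
    static final class SolrXml {
        final String label;
        final boolean exposesDistributedFlag;

        SolrXml(String label, boolean exposesDistributedFlag) {
            this.label = label;
            this.exposesDistributedFlag = exposesDistributedFlag;
        }
    }

    // Assumption: the randomized flag only takes effect when the node's
    // solr.xml exposes it; otherwise the node falls back to the overseer.
    static UpdateMode modeFor(SolrXml xml, boolean randomizedDistributedFlag) {
        return (xml.exposesDistributedFlag && randomizedDistributedFlag)
                ? UpdateMode.DISTRIBUTED
                : UpdateMode.OVERSEER;
    }

    public static void main(String[] args) {
        SolrXml testOwn = new SolrXml("test's own solr.xml", false);
        SolrXml defaultCloud = new SolrXml("default cloud solr.xml", true);
        boolean randomizedDistributedFlag = true; // one possible randomized outcome

        UpdateMode original = modeFor(testOwn, randomizedDistributedFlag);
        UpdateMode added = modeFor(defaultCloud, randomizedDistributedFlag);

        System.out.println("original nodes: " + original);
        System.out.println("added nodes:    " + added);
        System.out.println("consistent:     " + (original == added));
    }
}
```

In this model the nodes only disagree when the flag is randomized to the distributed setting, which would fit the failure being intermittent.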
To make matters even more complicated, this error cannot be seen when running
only the failing test, "testLeaderNotResponding", because that test does not
add any new jetty runners. The test above it, "testLeaderUnloaded", does, so
the entire test class needs to be run to see any errors. Amazingly, the test
that causes the problems, "testLeaderUnloaded", does not fail, because its
collection creation happens before the new jetty runners are started.
Overall, an easy fix: save the solr.xml after creating a cluster and use it for
all new jetty runners.
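A rough sketch of the idea behind the fix, again as a self-contained toy rather than the actual patch. {{ToyCluster}}, {{startNode}}, {{startNodeWithDefault}} and {{allNodesAgree}} are hypothetical names for illustration only and are not part of the Solr test framework; the solr.xml strings are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

public class SavedSolrXmlSketch {
    // Placeholder for the contents of the default cloud solr.xml.
    static final String DEFAULT_SOLR_XML = "<solr><!-- default cloud --></solr>";

    // Hypothetical toy cluster that records which solr.xml each node got.
    static class ToyCluster {
        private final String savedSolrXml;
        private final List<String> nodeSolrXmls = new ArrayList<>();

        ToyCluster(int initialNodes, String solrXml) {
            // Remember the solr.xml the cluster was created with.
            this.savedSolrXml = solrXml;
            for (int i = 0; i < initialNodes; i++) {
                nodeSolrXmls.add(solrXml);
            }
        }

        // Problematic behavior: a later node silently gets the default file.
        void startNodeWithDefault() {
            nodeSolrXmls.add(DEFAULT_SOLR_XML);
        }

        // Fixed behavior: a later node reuses the saved solr.xml.
        void startNode() {
            nodeSolrXmls.add(savedSolrXml);
        }

        // True when every node was started with the same solr.xml.
        boolean allNodesAgree() {
            return nodeSolrXmls.stream().distinct().count() == 1;
        }
    }

    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster(2, "<solr><!-- test's own --></solr>");
        cluster.startNode();
        System.out.println("after startNode: agree = " + cluster.allNodesAgree());
        cluster.startNodeWithDefault();
        System.out.println("after startNodeWithDefault: agree = " + cluster.allNodesAgree());
    }
}
```

Keeping the saved solr.xml on the cluster object means callers that start extra nodes do not have to know which file the cluster was created with.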
> TestPrepRecovery.testLeaderNotResponding fails much more lately
> ---------------------------------------------------------------
>
> Key: SOLR-17088
> URL: https://issues.apache.org/jira/browse/SOLR-17088
> Project: Solr
> Issue Type: Test
> Reporter: David Smiley
> Priority: Minor
> Attachments: 2023-11-27 fail.log.txt
>
>
> I'll attach logs. I didn't try and root cause. [Increased in test frequency
> lately|http://fucit.org/solr-jenkins-reports/history-trend-of-recent-failures.html#series/org.apache.solr.cloud.TestPrepRecovery.testLeaderNotResponding].
> All recent failures happen on main, not 9x.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)