[
https://issues.apache.org/jira/browse/SOLR-16753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743281#comment-17743281
]
Houston Putman commented on SOLR-16753:
---------------------------------------
So I'm not sure why the test was passing initially, but with DEBUG logging
enabled you can see why this test fails:
{code:java}
2023-07-14 16:26:56 2> 20611 INFO
(recoveryExecutor-28-thread-1-processing-127.0.0.1:37169_solr
coll_NRT_PULL_shard1_0_replica_p1 coll_NRT_PULL shard1_0 core_node10)
[n:127.0.0.1:37169_solr c:coll_NRT_PULL s:shard1_0 r:core_node10
x:coll_NRT_PULL_shard1_0_replica_p1] o.a.s.c.o.SliceMutator Update shard state
shard1_1 to active
2023-07-14 16:26:56 2> 20611 INFO
(recoveryExecutor-28-thread-1-processing-127.0.0.1:37169_solr
coll_NRT_PULL_shard1_0_replica_p1 coll_NRT_PULL shard1_0 core_node10)
[n:127.0.0.1:37169_solr c:coll_NRT_PULL s:shard1_0 r:core_node10
x:coll_NRT_PULL_shard1_0_replica_p1] o.a.s.c.o.SliceMutator Update shard state
shard1_0 to active
2023-07-14 16:26:56 2> 20611 INFO
(recoveryExecutor-28-thread-1-processing-127.0.0.1:37169_solr
coll_NRT_PULL_shard1_0_replica_p1 coll_NRT_PULL shard1_0 core_node10)
[n:127.0.0.1:37169_solr c:coll_NRT_PULL s:shard1_0 r:core_node10
x:coll_NRT_PULL_shard1_0_replica_p1] o.a.s.c.o.SliceMutator Update shard state
shard1 to inactive {code}
...
{code:java}
2023-07-14 16:26:56 2> 20612 DEBUG
(recoveryExecutor-28-thread-1-processing-127.0.0.1:37169_solr
coll_NRT_PULL_shard1_0_replica_p1 coll_NRT_PULL shard1_0 core_node10)
[n:127.0.0.1:37169_solr c:coll_NRT_PULL s:shard1_0 r:core_node10
x:coll_NRT_PULL_shard1_0_replica_p1] o.a.s.c.o.ReplicaMutator state.json is not
persisted slice/replica : shard1_0/core_node10
2023-07-14 16:26:56 2> , old : {
2023-07-14 16:26:56 2> "core":"coll_NRT_PULL_shard1_0_replica_p1",
2023-07-14 16:26:56 2> "node_name":"127.0.0.1:37169_solr",
2023-07-14 16:26:56 2> "base_url":"http://127.0.0.1:37169/solr",
2023-07-14 16:26:56 2> "state":"down",
2023-07-14 16:26:56 2> "type":"PULL",
2023-07-14 16:26:56 2> "force_set_state":"false"},
2023-07-14 16:26:56 2> new {
2023-07-14 16:26:56 2> "core":"coll_NRT_PULL_shard1_0_replica_p1",
2023-07-14 16:26:56 2> "node_name":"127.0.0.1:37169_solr",
2023-07-14 16:26:56 2> "base_url":"http://127.0.0.1:37169/solr",
2023-07-14 16:26:56 2> "state":"active",
2023-07-14 16:26:56 2> "type":"PULL",
2023-07-14 16:26:56 2> "force_set_state":"false"} {code}
Basically, the replica state mutation is what causes the shards to become
active. I don't fully understand {{ReplicaMutator.persistStateJson()}}, but for
some reason it only chooses to update {{state.json}} when the slice's state is
"recovering"... Clearly the state also needs to be updated when it changes from
"recovering" to "active". With this method returning false, {{state.json}} is
never updated with the active slices.
I'm going to make a PR; the fix is pretty simple: update {{state.json}}
whenever the slice's state changes.
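The gist of the fix can be sketched as a change to the persist condition. The
enum and method names below are illustrative stand-ins for the logic described
above, not the actual Solr {{ReplicaMutator}} code:
{code:java}
// Hypothetical, simplified sketch of the persist decision; assumes a
// minimal slice-state enum and is not the real Solr implementation.
enum SliceState { ACTIVE, INACTIVE, RECOVERING, CONSTRUCTION }

final class PersistCheck {
    // Buggy behavior as described: state.json is only persisted while the
    // slice's state is "recovering", so the RECOVERING -> ACTIVE
    // transition is never written out.
    static boolean shouldPersistBuggy(SliceState oldState, SliceState newState) {
        return newState == SliceState.RECOVERING;
    }

    // Proposed behavior: persist whenever the slice's state actually
    // changes, which covers RECOVERING -> ACTIVE (and ACTIVE -> INACTIVE
    // for the parent shard).
    static boolean shouldPersistFixed(SliceState oldState, SliceState newState) {
        return oldState != newState;
    }
}
{code}
With the buggy condition, the transition seen in the log above (sub-shards
going active, parent shard going inactive) returns false and is dropped; with
the fixed condition it is persisted.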
> SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull failures
> -----------------------------------------------------------------------
>
> Key: SOLR-16753
> URL: https://issues.apache.org/jira/browse/SOLR-16753
> Project: Solr
> Issue Type: Test
> Reporter: Chris M. Hostetter
> Assignee: Noble Paul
> Priority: Major
> Attachments: SOLR-16753.txt, Skjermbilde 2023-05-03 kl. 12.24.56.png
>
>
> {{SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull}} was
> added on 2023-03-13, but somewhere between 2023-04-02 and 2023-04-09 it
> started failing 15-20% of the time on jenkins jobs, with seeds that don't
> reliably reproduce.
> At first, this seemed like it might be related to SOLR-16751, but even with
> that fix failures are still happening.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)