[
https://issues.apache.org/jira/browse/SOLR-16753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714353#comment-17714353
]
Chris M. Hostetter commented on SOLR-16753:
-------------------------------------------
FWIW, I found that on my machine, a combination of {{stress --cpu 8}}
concurrently with {{./gradlew -p solr/core beast -Dtests.failfast=true
-Dtests.dups=20 --tests
SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull
-Dtests.seed=7DDC80A84C7DDB0E -Dtests.locale=gsw-CH
-Dtests.timezone=America/Adak -Dtests.asserts=true
-Dtests.file.encoding=UTF-8}} would pretty reliably cause a failure against
{{main/HEAD}} (currently {{b2f7f4ddb48c7086ad639e5b263d17fd4335ec19}}) ... but
did not fail against {{c9d656a8aa7bc2f711d4a9007fc590faa9853fcf}}.
So I did some manual {{git bisect}}'ing and AFAICT the commit that introduced
the problem was in fact what I initially flagged in SOLR-16751:
{noformat}
2ac7ed29563a33d9f9a31737996a1d4cfb0fca0d is the first bad commit
commit 2ac7ed29563a33d9f9a31737996a1d4cfb0fca0d
Author: Noble Paul <[email protected]>
Date: Wed Apr 12 22:07:03 2023 +1000
Avoid unnecessary map creation while serializing DocCollection
:040000 040000 d8b81cae08b995c464c98b9a18496c9b5f5b81b3 6d03a51dbc4f8b183059195ecc50bdea7dc1da6e M solr
{noformat}
Reviewing the change, again, I don't have a concrete explanation for _why_ this
commit introduced the problem, but beasting the test before and after it seems
pretty conclusive.
My best theory:
* Somewhere in the code we have a (pre-existing) situation where
{{DocCollection.write(JSONWriter)}} can be called concurrently with modifying
the properties of the {{DocCollection}}.
* Prior to this commit, this concurrent issue was not (as much of) a problem,
because the very first thing that {{DocCollection.write(JSONWriter)}} did was
duplicate the Map.
* After this commit, there is a longer window of time during the
{{DocCollection.write(JSONWriter)}} call, where concurrent manipulations of the
{{DocCollection.getProperties()}} will impact the data being written by the
{{DocCollection.write(JSONWriter)}} call in a way that may cause
inconsistencies.
...but this is purely a theory.
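To make the theory concrete, here is a minimal sketch of the difference between "copy the map first, then serialize" and "serialize straight from the live map". Everything in it is hypothetical: {{SnapshotVsLiveWrite}}, the {{shardCount}}/{{replicaCount}} keys, and the {{midWrite}} hook are invented for illustration and are not Solr's actual {{DocCollection}} code. The hook stands in for another thread modifying the properties at the worst possible moment, so the effect is deterministic rather than timing-dependent.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a state object with mutable properties; NOT Solr's DocCollection. */
public class SnapshotVsLiveWrite {

    // Mutable properties that other code may change while a write is in progress.
    private final Map<String, Object> props = new LinkedHashMap<>();

    public SnapshotVsLiveWrite() {
        props.put("shardCount", 1);
        props.put("replicaCount", 1);
    }

    /** Stands in for another thread updating both properties together. */
    public void splitShard() {
        props.put("shardCount", 2);
        props.put("replicaCount", 2);
    }

    /** Copy first, then serialize: changes made after the copy cannot affect the output. */
    public String writeSnapshot(Runnable midWrite) {
        Map<String, Object> copy = new LinkedHashMap<>(props);
        return serialize(copy, midWrite);
    }

    /** Serialize straight from the live map: changes made mid-write leak into the output. */
    public String writeLive(Runnable midWrite) {
        return serialize(props, midWrite);
    }

    // Reads the two keys one after the other; midWrite runs in between to
    // simulate a modification landing in the middle of the write.
    private static String serialize(Map<String, Object> source, Runnable midWrite) {
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"shardCount\":").append(source.get("shardCount"));
        midWrite.run();
        sb.append(",\"replicaCount\":").append(source.get("replicaCount"));
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        SnapshotVsLiveWrite a = new SnapshotVsLiveWrite();
        System.out.println(a.writeSnapshot(a::splitShard)); // {"shardCount":1,"replicaCount":1}

        SnapshotVsLiveWrite b = new SnapshotVsLiveWrite();
        System.out.println(b.writeLive(b::splitShard));     // {"shardCount":1,"replicaCount":2}
    }
}
```

The snapshot variant emits a consistent (if slightly stale) view, while the live variant mixes old and new values. Note that copying first only narrows the window, since the copy itself is not atomic, which would fit the "not (as much of) a problem" observation above.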
> SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull failures
> -----------------------------------------------------------------------
>
> Key: SOLR-16753
> URL: https://issues.apache.org/jira/browse/SOLR-16753
> Project: Solr
> Issue Type: Test
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Assignee: Noble Paul
> Priority: Major
> Attachments: SOLR-16753.txt
>
>
> {{SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull}} was
> added on 2023-03-13, but somewhere between 2023-04-02 and 2023-04-09 it
> started failing 15-20% of the time on Jenkins jobs, with seeds that don't
> reliably reproduce.
> At first, this seemed like it might be related to SOLR-16751, but even with
> that fix, failures are still happening.