[ 
https://issues.apache.org/jira/browse/SOLR-16753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714353#comment-17714353
 ] 

Chris M. Hostetter commented on SOLR-16753:
-------------------------------------------

FWIW, I found that on my machine, running {{stress --cpu 8}} 
concurrently with {{./gradlew -p solr/core beast -Dtests.failfast=true 
-Dtests.dups=20 --tests 
SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull 
-Dtests.seed=7DDC80A84C7DDB0E -Dtests.locale=gsw-CH 
-Dtests.timezone=America/Adak -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8}} would pretty reliably cause a failure against 
{{main/HEAD}} (currently {{b2f7f4ddb48c7086ad639e5b263d17fd4335ec19}}) but 
did not fail against {{c9d656a8aa7bc2f711d4a9007fc590faa9853fcf}}.

So I did some manual {{git bisect}}'ing and AFAICT the commit that introduced 
the problem was in fact what I initially flagged in SOLR-16751:

{noformat}
2ac7ed29563a33d9f9a31737996a1d4cfb0fca0d is the first bad commit
commit 2ac7ed29563a33d9f9a31737996a1d4cfb0fca0d
Author: Noble Paul <noble.p...@gmail.com>
Date:   Wed Apr 12 22:07:03 2023 +1000

    Avoid unnecessary map creation while serializing DocCollection

:040000 040000 d8b81cae08b995c464c98b9a18496c9b5f5b81b3 
6d03a51dbc4f8b183059195ecc50bdea7dc1da6e M      solr
{noformat}

Reviewing the change, again, I don't have a concrete explanation for _why_ this 
commit introduced the problem, but beasting the test before and after it seems 
pretty conclusive.

My best theory: 
* Somewhere in the code we have a (pre-existing) situation where 
{{DocCollection.write(JSONWriter)}} can be called concurrently with modifying 
the properties of the {{DocCollection}}.
* Prior to this commit, this concurrency issue was not (as much of) a problem, 
because the very first thing that {{DocCollection.write(JSONWriter)}} did was 
duplicate the Map.
* After this commit, there is a longer window of time during the 
{{DocCollection.write(JSONWriter)}} call where concurrent manipulations of 
{{DocCollection.getProperties()}} will impact the data being written by the 
{{DocCollection.write(JSONWriter)}} call in a way that may cause 
inconsistencies.

...but this is purely a theory.
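
To make the theory above concrete, here is a minimal, deterministic sketch of the difference between "copy the Map first, then write" and "write straight from the live Map". This is NOT the actual {{DocCollection}} code: the class name, the property keys ({{nrtReplicas}}, {{pullReplicas}}) and the {{Runnable}} that stands in for a concurrent modifier are all hypothetical, and the "concurrent" update is simulated on a single thread by running it between the two field writes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Solr.
class SnapshotVsLiveWrite {

    // Fixed key order so the example is deterministic.
    private static final String[] KEYS = {"nrtReplicas", "pullReplicas"};

    /** Pre-commit style (as theorized): duplicate the map, then serialize the copy. */
    static String writeFromSnapshot(Map<String, Object> props, Runnable midWrite) {
        Map<String, Object> snapshot = new LinkedHashMap<>(props);
        return serialize(snapshot, midWrite);
    }

    /** Post-commit style (as theorized): serialize directly from the live map. */
    static String writeFromLiveMap(Map<String, Object> props, Runnable midWrite) {
        return serialize(props, midWrite);
    }

    /** Writes each key in turn; midWrite runs between the first and second key. */
    private static String serialize(Map<String, Object> source, Runnable midWrite) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < KEYS.length; i++) {
            if (i > 0) {
                sb.append(',');
                midWrite.run(); // stand-in for another thread modifying the properties
            }
            sb.append(KEYS[i]).append('=').append(source.get(KEYS[i]));
        }
        return sb.append('}').toString();
    }

    /** Runs one write while an update moves the state from (2, 0) to (1, 1). */
    static String demo(boolean useSnapshot) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("nrtReplicas", 2);
        props.put("pullReplicas", 0);
        Runnable update = () -> {
            props.put("nrtReplicas", 1);
            props.put("pullReplicas", 1);
        };
        return useSnapshot
            ? writeFromSnapshot(props, update)
            : writeFromLiveMap(props, update);
    }

    public static void main(String[] args) {
        System.out.println(demo(true));  // {nrtReplicas=2,pullReplicas=0}
        System.out.println(demo(false)); // {nrtReplicas=2,pullReplicas=1}
    }
}
```

In the snapshot case the output is a state that really existed (2, 0). In the live case the output mixes the old {{nrtReplicas}} with the new {{pullReplicas}}, a combination that never existed in the map at any single point in time. Whether this is what actually happens in the failing test is still unverified.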

> SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull failures
> -----------------------------------------------------------------------
>
>                 Key: SOLR-16753
>                 URL: https://issues.apache.org/jira/browse/SOLR-16753
>             Project: Solr
>          Issue Type: Test
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Assignee: Noble Paul
>            Priority: Major
>         Attachments: SOLR-16753.txt
>
>
> {{SplitShardWithNodeRoleTest.testSolrClusterWithNodeRoleWithPull}} was 
> added on 2023-03-13, but somewhere between 2023-04-02 and 2023-04-09 it 
> started failing 15-20% of the time on Jenkins jobs, with seeds that don't 
> reliably reproduce.
> At first, this seemed like it might be related to SOLR-16751, but even with 
> that fix failures are still happening.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

