[ 
https://issues.apache.org/jira/browse/SOLR-16848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735796#comment-17735796
 ] 

Alex Deparvu commented on SOLR-16848:
-------------------------------------

I think I have an idea of why this happens, but unfortunately I was unable 
to reproduce the failure naturally, even on Docker with constrained resources.
I will post a suggested fix for review, based on reading the code, and we'll 
see whether it helps.

Basically, what I am seeing is a race inside the initialization method 
JettySolrRunner#lifeCycleStarted 
(https://github.com/apache/solr/blob/3e10f8b8901751de78e3c5b93538be133b6336ff/solr/test-framework/src/java/org/apache/solr/embedded/JettySolrRunner.java#LL394C38-L394C54):
 - on one hand, CoreContainerProvider starts its init (line 407) and eventually 
(and asynchronously) calls into the test's `testing_beforeRegisterInZk` 
callback, where the callback attempts to get the CoreContainer. The trouble is 
that the getCoreContainer method relies on 'dispatchFilter' being initialized 
(see below).
 - on the other hand, the dispatchFilter is only initialized afterwards (on 
line 423).

My theory is: because the CoreContainerProvider is async, it can call into the 
callback before the dispatchFilter is there, so getCoreContainer returns null. 
My proposal is to add a wait for 'dispatchFilter != null' inside the callback 
only (this change affects only the test code, nothing else).

Although I could not reproduce the failure naturally, I verified the theory by 
introducing an artificial wait between line 407 and line 423: with that delay, 
the NPE is present consistently. The wait-fix gives the init the opportunity to 
catch up to the callback if needed. If this precondition is not met, the flow 
should not progress anyway.
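
To illustrate the proposed wait, here is a minimal, self-contained sketch in 
plain Java. It is not the actual patch and does not use any Solr classes: 
`WaitForInit`, `awaitNonNull`, the timeout values, and the `AtomicReference` 
standing in for 'dispatchFilter' are all assumptions made for illustration. 
One thread plays the delayed init, and the main thread plays the callback that 
waits rather than dereferencing a possibly-null value.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

final class WaitForInit {

  // Hypothetical helper (not part of Solr): polls the supplier until it
  // returns a non-null value, or throws once the timeout has elapsed.
  static <T> T awaitNonNull(Supplier<T> supplier, long timeoutMs, long pollMs)
      throws InterruptedException, TimeoutException {
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      T value = supplier.get();
      if (value != null) {
        return value;
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("value still null after " + timeoutMs + " ms");
      }
      Thread.sleep(pollMs);
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for the 'dispatchFilter' field; it starts out null.
    AtomicReference<Object> dispatchFilter = new AtomicReference<>();

    // Simulates the slow init path: the field is only set after a delay,
    // similar to the artificial wait used to force the NPE.
    Thread init = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      dispatchFilter.set(new Object());
    });
    init.start();

    // Simulates the callback: wait for the precondition instead of
    // dereferencing the value immediately.
    Object filter = awaitNonNull(dispatchFilter::get, 5_000, 10);
    System.out.println("dispatchFilter initialized: " + (filter != null));
    init.join();
  }
}
```

The bounded timeout matters here: if the precondition is never met, the wait 
fails with a clear error instead of hanging the test indefinitely.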






> Flaky DeleteReplicaTest.raceConditionOnDeleteAndRegisterReplica
> ---------------------------------------------------------------
>
>                 Key: SOLR-16848
>                 URL: https://issues.apache.org/jira/browse/SOLR-16848
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Alex Deparvu
>            Priority: Minor
>
> Some stats first:
> - Past 7 days trend:
> {noformat}
> Class: org.apache.solr.cloud.DeleteReplicaTest
> Method: raceConditionOnDeleteAndRegisterReplica
> Failures: 11.22% (44 / 392)
> {noformat}
> - Test failure is caused by a NullPointerException:
> {noformat}
> ERROR (coreZkRegister-772-thread-1-processing-127.0.0.1:40471_solr) 
> [n:127.0.0.1:40471_solr c:raceDeleteReplicaCollection s:shard1 r:core_node4 
> x:raceDeleteReplicaCollection_shard1_replica_n2] o.a.s.c.DeleteReplicaTest 
> Failed to delete replica
>  => java.lang.NullPointerException: Cannot invoke 
> "org.apache.solr.core.CoreContainer.getZkController()" because the return 
> value of "org.apache.solr.embedded.JettySolrRunner.getCoreContainer()" is null
> {noformat}
> A more complete trace:
> {noformat}
> o.a.s.c.DeleteReplicaTest Failed to delete replica
>   2>           => java.lang.NullPointerException
>   2>    at 
> org.apache.solr.cloud.DeleteReplicaTest.lambda$raceConditionOnDeleteAndRegisterReplica$10(DeleteReplicaTest.java:350)
>   2> java.lang.NullPointerException: null
>   2>    at 
> org.apache.solr.cloud.DeleteReplicaTest.lambda$raceConditionOnDeleteAndRegisterReplica$10(DeleteReplicaTest.java:350)
>   2>    at 
> org.apache.solr.core.ZkContainer.lambda$registerInZk$1(ZkContainer.java:211) 
> [main/:?]
>   2>    at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:289)
>   2>    at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>   2>    at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>   2>    at java.lang.Thread.run(Thread.java:829) [?:?]
> o.a.s.c.u.ExecutorUtil Uncaught exception java.lang.AssertionError: Failed to 
> delete replica thrown by thread: 
> coreZkRegister-1586-thread-1-processing-127.0.0.1:34497_solr
>   2>           => java.lang.Exception: Submitter stack trace
>   2>  at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:256)
> java.lang.Exception: Submitter stack trace
>   2>  at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:256)
>   2>  at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:244) 
> ~[main/:?]
>   2>  at 
> org.apache.solr.core.CoreContainer.lambda$loadInternal$12(CoreContainer.java:1025)
>   2>  at 
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:234)
>   2>  at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>   2>  at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:289)
>   2>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>   2>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>   2>  at java.lang.Thread.run(Thread.java:829) [?:?]
> Caused by: java.lang.Exception: Submitter stack trace
>   2>  at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:256)
>   2>  at 
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:140)
>  ~[?:?]
>   2>  at 
> com.codahale.metrics.InstrumentedExecutorService.submit(InstrumentedExecutorService.java:122)
>   2>  at 
> org.apache.solr.core.CoreContainer.loadInternal(CoreContainer.java:1009) 
> ~[main/:?]
>   2>  at org.apache.solr.core.CoreContainer.load(CoreContainer.java:753)
>   2>  at 
> org.apache.solr.servlet.CoreContainerProvider.createCoreContainer(CoreContainerProvider.java:411)
>   2>  at 
> org.apache.solr.servlet.CoreContainerProvider.init(CoreContainerProvider.java:230)
>   2>  at 
> org.apache.solr.embedded.JettySolrRunner$1.lifeCycleStarted(JettySolrRunner.java:407)
> {noformat}


