[ https://issues.apache.org/jira/browse/SOLR-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151552#comment-16151552 ]
Erick Erickson commented on SOLR-11278:
---------------------------------------

I just ran the patch against 7x in two different modes:
1> the original patch x 100
2> removed the three second wait in the test case x 250

<1> had no errors
<2> had one error, not the same one, and I haven't pursued it yet; excerpt below. I have the full test case.

I'm going to put the 3 second wait back in and try 1,000 iterations to see if this error occurs again.

NOTE: I don't think the sleep is something we _want_ to leave in the code, I'm just seeing if it alters the results for a clue where to look next.

This is good progress!

Oh, I haven't reviewed the patch in detail yet either, just trying to get a sense of what the behavior is before diving in.

[junit4] 2> 49759 INFO (Thread-83) [n:127.0.0.1:59962_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.c.SolrCore [cdcr-target_shard1_replica_n1] CLOSING SolrCore org.apache.solr.core.SolrCore@2fdc6dad
[junit4] 2> 49759 INFO (Thread-83) [n:127.0.0.1:59962_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.core.cdcr-target.shard1.replica_n1, tag=802975149
[junit4] 2> 49759 INFO (Thread-83) [n:127.0.0.1:59962_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.m.r.SolrJmxReporter Closing reporter [org.apache.solr.metrics.reporters.SolrJmxReporter@7b017431: rootName = solr_59962, domain = solr.core.cdcr-target.shard1.replica_n1, service url = null, agent id = null] for registry solr.core.cdcr-target.shard1.replica_n1 / com.codahale.metrics.MetricRegistry@67a56b63
[junit4] 2> 49760 INFO (searcherExecutor-150-thread-1-processing-n:127.0.0.1:59962_solr x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:59962_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.c.SolrCore [cdcr-target_shard1_replica_n1] Registered new searcher Searcher@51a42485[cdcr-target_shard1_replica_n1] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_k(7.1.0):C1900) Uninverting(_l(7.1.0):C100)))}
[junit4] 2> 49774 INFO (Thread-83) [n:127.0.0.1:59962_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.collection.cdcr-target.shard1.leader, tag=802975149
[junit4] 2> 49775 INFO (Thread-83) [n:127.0.0.1:59962_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler Solr core is being closed - shutting down CDCR handler @ cdcr-target:shard1
[junit4] 2> 62525 ERROR (Thread-83) [n:127.0.0.1:59962_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.c.CachingDirectoryFactory Timeout waiting for all directory ref counts to be released - gave up waiting on CachedDir<<refCount=1;path=/Users/Erick/apache/solrJiras/beast/results/beast-tmp/203/J0/temp/solr.cloud.CdcrBootstrapTest_DCBEC103DFB44964-001/cdcr-target-003/node1/./cdcr-target_shard1_replica_n1/data/index.20170902102657976;done=false>>
[junit4] 2> 62526 ERROR (Thread-83) [n:127.0.0.1:59962_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.c.CachingDirectoryFactory Error closing directory:org.apache.solr.common.SolrException: Timeout waiting for all directory ref counts to be released - gave up waiting on CachedDir<<refCount=1;path=/Users/Erick/apache/solrJiras/beast/results/beast-tmp/203/J0/temp/solr.cloud.CdcrBootstrapTest_DCBEC103DFB44964-001/cdcr-target-003/node1/./cdcr-target_shard1_replica_n1/data/index.20170902102657976;done=false>>
[junit4] 2>   at org.apache.solr.core.CachingDirectoryFactory.close(CachingDirectoryFactory.java:178)
[junit4] 2>   at org.apache.solr.core.SolrCore.close(SolrCore.java:1613)
[junit4] 2>   at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:859)
[junit4] 2>   at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1232)
[junit4] 2>   at org.apache.solr.handler.IndexFetcher.lambda$reloadCore$0(IndexFetcher.java:900)
[junit4] 2>   at java.lang.Thread.run(Thread.java:745)
[junit4] 2>
[junit4] 2> 75243 ERROR (Thread-83) [n:127.0.0.1:59962_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.c.CachingDirectoryFactory Timeout waiting for all directory ref counts to be released - gave up waiting on CachedDir<<refCount=1;path=/Users/Erick/apache/solrJiras/beast/results/beast-tmp/203/J0/temp/solr.cloud.CdcrBootstrapTest_DCBEC103DFB44964-001/cdcr-target-003/node1/./cdcr-target_shard1_replica_n1/data/index;done=false>>
[junit4] 2> 75243 ERROR (Thread-83) [n:127.0.0.1:59962_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.c.CachingDirectoryFactory Error closing directory:org.apache.solr.common.SolrException: Timeout waiting for all directory ref counts to be released - gave up waiting on CachedDir<<refCount=1;path=/Users/Erick/apache/solrJiras/beast/results/beast-tmp/203/J0/temp/solr.cloud.CdcrBootstrapTest_DCBEC103DFB44964-001/cdcr-target-003/node1/./cdcr-target_shard1_replica_n1/data/index;done=false>>
[junit4] 2>   at org.apache.solr.core.CachingDirectoryFactory.close(CachingDirectoryFactory.java:178)
[junit4] 2>   at org.apache.solr.core.SolrCore.close(SolrCore.java:1613)
[junit4] 2>   at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:859)
[junit4] 2>   at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1232)
[junit4] 2>   at org.apache.solr.handler.IndexFetcher.lambda$reloadCore$0(IndexFetcher.java:900)
[junit4] 2>   at java.lang.Thread.run(Thread.java:745)
[junit4] 2>
[junit4] 2> 75244 ERROR (Thread-83) [n:127.0.0.1:59962_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.c.SolrCore java.lang.AssertionError: 1
[junit4] 2>   at org.apache.solr.core.CachingDirectoryFactory.close(CachingDirectoryFactory.java:192)
[junit4] 2>   at org.apache.solr.core.SolrCore.close(SolrCore.java:1613)
[junit4] 2>   at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:859)
[junit4] 2>   at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1232)
[junit4] 2>   at org.apache.solr.handler.IndexFetcher.lambda$reloadCore$0(IndexFetcher.java:900)
[junit4] 2>   at java.lang.Thread.run(Thread.java:745)
[junit4] 2>
[junit4] 2> 75245 ERROR (recoveryExecutor-81-thread-1-processing-n:127.0.0.1:59962_solr x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:59962_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.ReplicationHandler Index fetch failed :org.apache.solr.common.SolrException: Index fetch failed :
[junit4] 2>   at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:655)
[junit4] 2>   at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:332)
[junit4] 2>   at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:419)
[junit4] 2>   at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:773)
[junit4] 2>   at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:724)
[junit4] 2>   at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
[junit4] 2>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[junit4] 2>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
[junit4] 2>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[junit4] 2>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[junit4] 2>   at java.lang.Thread.run(Thread.java:745)
[junit4] 2> Caused by: java.lang.NullPointerException
[junit4] 2>   at org.apache.solr.handler.IndexFetcher.openNewSearcherAndUpdateCommitPoint(IndexFetcher.java:888)
[junit4] 2>   at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:632)
[junit4] 2>   ... 10 more
[junit4] 2>
[junit4] 2> 75245 INFO (recoveryExecutor-81-thread-1-processing-n:127.0.0.1:59962_solr x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:59962_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler boostrap cal

> CdcrBootstrapTest failing intermittently
> ----------------------------------------
>
>                 Key: SOLR-11278
>                 URL: https://issues.apache.org/jira/browse/SOLR-11278
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: CDCR
>    Affects Versions: 7.0, 6.6.1
>            Reporter: Amrit Sarkar
>            Assignee: Varun Thacker
>            Priority: Critical
>              Labels: test
>         Attachments: master-bs.patch, SOLR-11278-cancel-bootstrap-on-stop.patch, SOLR-11278.patch, test_results
>
>
> {{CdcrBootstrapTest}} is failing while running beasts for significant iterations.
> The bootstrapping is failing in the test, after the first batch is indexed for each {{testmethod}}, which results in a document mismatch:
> {code}
> [beaster] 2> 39167 ERROR (updateExecutor-39-thread-1-processing-n:127.0.0.1:42155_solr x:cdcr-target_shard1_replica_n1 s:shard1 c:cdcr-target r:core_node2) [n:127.0.0.1:42155_solr c:cdcr-target s:shard1 r:core_node2 x:cdcr-target_shard1_replica_n1] o.a.s.h.CdcrRequestHandler Bootstrap operation failed
> [beaster] 2> java.util.concurrent.ExecutionException: java.lang.AssertionError
> [beaster] 2>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> [beaster] 2>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> [beaster] 2>   at org.apache.solr.handler.CdcrRequestHandler.lambda$handleBootstrapAction$0(CdcrRequestHandler.java:654)
> [beaster] 2>   at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> [beaster] 2>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [beaster] 2>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [beaster] 2>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
> [beaster] 2>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [beaster] 2>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [beaster] 2>   at java.lang.Thread.run(Thread.java:748)
> [beaster] 2> Caused by: java.lang.AssertionError
> [beaster] 2>   at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:813)
> [beaster] 2>   at org.apache.solr.handler.CdcrRequestHandler$BootstrapCallable.call(CdcrRequestHandler.java:724)
> [beaster] 2>   at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
> [beaster] 2>   ... 5 more
> {code}
> {code}
> [beaster] [01:37:16.282] FAILURE 153s | CdcrBootstrapTest.testBootstrapWithSourceCluster <<<
> [beaster] > Throwable #1: java.lang.AssertionError: Document mismatch on target after sync expected:<2000> but was:<1000>
> {code}