Philip Black-Knight created SOLR-6579: -----------------------------------------
Summary: SnapPuller Replication blocks clean shutdown of tomcat Key: SOLR-6579 URL: https://issues.apache.org/jira/browse/SOLR-6579 Project: Solr Issue Type: Bug Affects Versions: 4.10.1 Reporter: Philip Black-Knight main issue was described in the mailing list her: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201409.mbox/browser and here: but also including the quotes: original message from Nick {quote} Hello, I have solr 4.10 running on tomcat 7. I'm doing replication from one master to about 10 slaves, with standard configuration: {code} <requestHandler name="/replication" class="solr.ReplicationHandler" > <lst name="master"> <str name="enable">${enable.master:false}</str> <str name="replicateAfter">commit</str> <str name="replicateAfter">startup</str> <str name="confFiles">schema.xml,stopwords.txt</str> <!-- str name="commitReserveDuration">00:00:10</str --> </lst> <lst name="slave"> <str name="enable">${enable.slave:false}</str> <str name="masterUrl">http://master:8080/solr/mycore</str> <str name="pollInterval">00:00:60</str> </lst> </requestHandler> {code} It appears that if tomcat gets shutdown while solr is replicating, it prevents tomcat from shutting down fully. Immediately after receiving the shutdown command, a thread dump is logged into catalina.out (this may have been turned on by some configuration someone else on my team made). I removed some threads that didn't look related, mostly about tomcat session replication, or with names like "http-bio-8080-exec-10". {code} 62252 [http-bio-8080-exec-1] INFO org.apache.solr.core.SolrCore – [mycore] webapp=/solr path=/replication params={command=details&_=1412014928648&wt=json} status=0 QTime=6 63310 [http-bio-8080-exec-1] INFO org.apache.solr.core.SolrCore – [mycore] webapp=/solr path=/replication params={command=details&_=1412014929699&wt=json} status=0 QTime=6 2014-09-29 14:22:10 Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode): "fsyncService-12-thread-1" prio=10 tid=0x00007f3bd4002000 nid=0x203d waiting on condition [0x00007f3c271f0000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000007e1ff4458> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) "explicit-fetchindex-cmd" daemon prio=10 tid=0x00007f3c0413e800 nid=0x203c runnable [0x00007f3c272f1000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:152) at java.net.SocketInputStream.read(SocketInputStream.java:122) at org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:198) at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:174) at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137) at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:80) at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:114) at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:152) at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1239) at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1187) at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:774) at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:424) at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337) at org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:227) "process reaper" daemon prio=10 tid=0x00007f3c0409c000 nid=0x203b waiting on condition [0x00007f3c984e9000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000007dfbfd890> (a java.util.concurrent.SynchronousQueue$TransferStack) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460) at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359) at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) "process reaper" daemon prio=10 tid=0x00007f3c0409a000 nid=0x2039 waiting on condition [0x00007f3cac040000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000007dfbfd890> (a java.util.concurrent.SynchronousQueue$TransferStack) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460) at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359) at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) "ajp-bio-8009-AsyncTimeout" daemon prio=10 tid=0x00007f3ce013c000 nid=0x202e waiting on condition [0x00007f3c27af9000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:148) at java.lang.Thread.run(Thread.java:745) "ajp-bio-8009-Acceptor-0" daemon prio=10 tid=0x00007f3ce013a800 nid=0x202d runnable [0x00007f3c985ea000] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398) at java.net.ServerSocket.implAccept(ServerSocket.java:530) at java.net.ServerSocket.accept(ServerSocket.java:498) at org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60) at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216) at java.lang.Thread.run(Thread.java:745) "http-bio-8080-exec-1" daemon prio=10 tid=0x00007f3c0c001800 nid=0x202c runnable [0x00007f3c986eb000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:152) at java.net.SocketInputStream.read(SocketInputStream.java:122) at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:516) at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:501) at org.apache.coyote.http11.Http11Processor.setRequestLineReadTimeout(Http11Processor.java:173) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:927) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312) - locked <0x00000007df0874b0> (a org.apache.tomcat.util.net.SocketWrapper) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) "http-bio-8080-AsyncTimeout" daemon prio=10 tid=0x00007f3ce0139000 nid=0x202b waiting on condition [0x00007f3cac1aa000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:148) at java.lang.Thread.run(Thread.java:745) "http-bio-8080-Acceptor-0" daemon prio=10 tid=0x00007f3ce0138000 nid=0x202a runnable [0x00007f3cac2ab000] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398) at java.net.ServerSocket.implAccept(ServerSocket.java:530) at java.net.ServerSocket.accept(ServerSocket.java:498) at org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60) at org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216) at java.lang.Thread.run(Thread.java:745) "ContainerBackgroundProcessor[StandardEngine[Catalina]]" daemon prio=10 tid=0x00007f3ce05b4000 nid=0x2029 waiting on condition [0x00007f3cacbb7000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1508) at java.lang.Thread.run(Thread.java:745) "clock:10000" daemon prio=10 tid=0x00007f3c5cafc000 nid=0x2028 in Object.wait() [0x00007f3c27bfa000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000007dc6cf7e8> (a org.webmacro.util.Clock$ClockThread) at org.webmacro.util.Clock$ClockThread.run(Clock.java:108) - locked <0x00000007dc6cf7e8> (a org.webmacro.util.Clock$ClockThread) "TimeLoop(1000,600)" daemon prio=10 tid=0x00007f3c5cac6000 nid=0x2027 waiting on condition [0x00007f3c27cfb000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.webmacro.util.TimeLoop$TimeLoopThread.run(TimeLoop.java:237) "pool-1-thread-2" daemon prio=10 tid=0x00007f3c5d1b3800 nid=0x2022 waiting on condition [0x00007f3c98246000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000070b90b4a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) "Java2D Disposer" daemon prio=10 tid=0x00007f3c5cfeb800 nid=0x2021 in Object.wait() [0x00007f3c9835d000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000007e6030140> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135) - locked <0x00000007e6030140> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151) at sun.java2d.Disposer.run(Disposer.java:145) at java.lang.Thread.run(Thread.java:745) "MySQL Statement Cancellation Timer" daemon prio=10 tid=0x00007f3c5ca0c000 nid=0x2020 in Object.wait() [0x00007f3c9845e000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000007e5f45ad8> (a java.util.TaskQueue) at java.lang.Object.wait(Object.java:503) at java.util.TimerThread.mainLoop(Timer.java:526) - locked <0x00000007e5f45ad8> (a java.util.TaskQueue) at java.util.TimerThread.run(Timer.java:505) "Timer-0" daemon prio=10 tid=0x00007f3c5c9f7800 nid=0x201f in Object.wait() [0x00007f3cad0bc000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00000007e605a8d8> (a java.util.TaskQueue) at java.lang.Object.wait(Object.java:503) at java.util.TimerThread.mainLoop(Timer.java:526) - locked <0x00000007e605a8d8> (a java.util.TaskQueue) at java.util.TimerThread.run(Timer.java:505) "Thread-5" daemon prio=10 tid=0x00007f3c5c00f800 nid=0x201e in Object.wait() [0x00007f3cac3ac000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x000000070b92b230> (a java.lang.Object) at java.lang.Object.wait(Object.java:503) at org.apache.solr.core.CloserThread.run(CoreContainer.java:905) - locked <0x000000070b92b230> (a java.lang.Object) "snapPuller-10-thread-1" prio=10 tid=0x00007f3c6c26e800 nid=0x201d waiting on condition [0x00007f3cac555000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000070b92b2a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) "searcherExecutor-6-thread-1" prio=10 tid=0x00007f3c6c1ea000 nid=0x201c waiting on condition [0x00007f3cac810000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000070b92b440> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) "GC Daemon" daemon prio=10 tid=0x00007f3ce0536000 nid=0x200d in Object.wait() [0x00007f3cd8137000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x000000070bbee5b0> (a sun.misc.GC$LatencyLock) at sun.misc.GC$Daemon.run(GC.java:117) - locked <0x000000070bbee5b0> (a sun.misc.GC$LatencyLock) "Service Thread" daemon prio=10 tid=0x00007f3ce00a6000 nid=0x200b runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE "C2 CompilerThread1" daemon prio=10 tid=0x00007f3ce00a4000 nid=0x200a waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "C2 CompilerThread0" daemon prio=10 tid=0x00007f3ce00a1000 nid=0x2009 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "Signal Dispatcher" daemon prio=10 tid=0x00007f3ce009f800 nid=0x2008 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "Finalizer" daemon prio=10 tid=0x00007f3ce0077800 nid=0x2007 in Object.wait() [0x00007f3cd8c1b000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x000000070bbee828> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135) - locked <0x000000070bbee828> (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209) "Reference Handler" daemon prio=10 tid=0x00007f3ce0075800 nid=0x2006 in Object.wait() [0x00007f3cadffe000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x000000070b8b23b8> (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:503) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133) - locked <0x000000070b8b23b8> (a java.lang.ref.Reference$Lock) "main" prio=10 tid=0x00007f3ce0009000 nid=0x1ffc runnable [0x00007f3ce675c000] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398) at java.net.ServerSocket.implAccept(ServerSocket.java:530) at java.net.ServerSocket.accept(ServerSocket.java:498) at org.apache.catalina.core.StandardServer.await(StandardServer.java:452) at org.apache.catalina.startup.Catalina.await(Catalina.java:766) at org.apache.catalina.startup.Catalina.start(Catalina.java:712) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:322) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:456) JNI global references: 422 Heap PSYoungGen total 1184256K, used 1031985K [0x00000007aaa80000, 0x0000000800000000, 0x0000000800000000) eden space 970240K, 94% used [0x00000007aaa80000,0x00000007e259e740,0x00000007e5e00000) from space 214016K, 55% used [0x00000007e5e00000,0x00000007ed2adfd0,0x00000007f2f00000) to space 212480K, 0% used [0x00000007f3080000,0x00000007f3080000,0x0000000800000000) ParOldGen total 1398272K, used 209681K [0x0000000700000000, 0x0000000755580000, 0x00000007aaa80000) object space 1398272K, 14% used [0x0000000700000000,0x000000070ccc4700,0x0000000755580000) PSPermGen total 78848K, used 78659K [0x00000006c0000000, 0x00000006c4d00000, 0x0000000700000000) object space 78848K, 99% used [0x00000006c0000000,0x00000006c4cd0ca8,0x00000006c4d00000) right after this, here comes the shutdown: Sep 29, 2014 2:22:10 PM org.apache.catalina.core.StandardServer await INFO: A valid shutdown command was received via the shutdown port. Stopping the Server instance. Sep 29, 2014 2:22:10 PM org.apache.coyote.AbstractProtocol pause INFO: Pausing ProtocolHandler ["http-bio-8080"] Sep 29, 2014 2:22:10 PM org.apache.coyote.AbstractProtocol pause INFO: Pausing ProtocolHandler ["ajp-bio-8009"] Sep 29, 2014 2:22:11 PM org.apache.catalina.core.StandardService stopInternal INFO: Stopping service Catalina Sep 29, 2014 2:22:11 PM org.apache.catalina.ha.session.JvmRouteBinderValve stopInternal INFO: JvmRouteBinderValve stopped Sep 29, 2014 2:22:11 PM org.apache.catalina.ha.session.DeltaManager stopInternal INFO: Manager [localhost#/dy] expiring sessions upon shutdown (snip some stuff about an unrelated webapp, then the solr errors) 64876 [localhost-startStop-2] INFO org.apache.solr.core.CoreContainer – Shutting down CoreContainer instance=1359815787 64877 [localhost-startStop-2] INFO org.apache.solr.core.SolrCore – [mycore] CLOSING SolrCore org.apache.solr.core.SolrCore@1ade5d50 64882 [localhost-startStop-2] INFO org.apache.solr.update.UpdateHandler – closing DirectUpdateHandler2{commits=0,autocommit maxTime=15000ms,autocommits=0,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,transaction_logs_total_size=0,transaction_logs_total_number=0} 64883 [localhost-startStop-2] INFO org.apache.solr.update.SolrCoreState – Closing SolrCoreState 64883 [localhost-startStop-2] INFO org.apache.solr.update.DefaultSolrCoreState – SolrCoreState ref count has reached 0 - closing IndexWriter 64883 [localhost-startStop-2] INFO org.apache.solr.update.DefaultSolrCoreState – closing IndexWriter with IndexWriterCloser 64885 [localhost-startStop-2] ERROR org.apache.solr.update.DefaultSolrCoreState – Error during shutdown of writer. org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:700) at org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2384) at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:795) at org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:70) at org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:371) at org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:72) at org.apache.solr.core.SolrCore.close(SolrCore.java:1073) at org.apache.solr.core.SolrCores.close(SolrCores.java:117) at org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:347) at org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:200) at org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:315) at org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:4782) at org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5565) at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232) at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1575) at org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1564) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-09-29T14:22:11.384-0400: [GC 1299401K->384944K(2543104K), 0.0427280 secs] 64929 [localhost-startStop-2] INFO org.apache.solr.core.SolrCore – [mycore] Closing main searcher on request. 64935 [localhost-startStop-2] INFO org.apache.solr.core.CachingDirectoryFactory – Closing NRTCachingDirectoryFactory - 3 directories currently being tracked 69937 [explicit-fetchindex-cmd] INFO org.apache.solr.update.DefaultSolrCoreState – Creating new IndexWriter... 69938 [explicit-fetchindex-cmd] ERROR org.apache.solr.handler.ReplicationHandler – SnapPull failed :org.apache.lucene.store.AlreadyClosedException: Already closed at org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:341) at org.apache.solr.handler.ReplicationHandler.loadReplicationProperties(ReplicationHandler.java:827) at org.apache.solr.handler.SnapPuller.logReplicationTimeAndConfFiles(SnapPuller.java:569) at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:511) at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337) at org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:227) 69942 [localhost-startStop-2] INFO org.apache.solr.core.CachingDirectoryFactory – looking to close /usr/local/apache-tomcat-7.0.39/solr/mycore/data [CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data;done=false>>] 69942 [localhost-startStop-2] INFO org.apache.solr.core.CachingDirectoryFactory – Closing directory: /usr/local/apache-tomcat-7.0.39/solr/mycore/data 69942 [localhost-startStop-2] INFO org.apache.solr.core.CachingDirectoryFactory – looking to close /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index [CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index;done=false>>] 69942 [localhost-startStop-2] INFO org.apache.solr.core.CachingDirectoryFactory – Closing directory: /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index 69942 [localhost-startStop-2] INFO org.apache.solr.core.CachingDirectoryFactory – looking to close /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447 [CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447;done=true>>] 69942 [localhost-startStop-2] INFO org.apache.solr.core.CachingDirectoryFactory – Closing directory: /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447 69943 [localhost-startStop-2] INFO org.apache.solr.core.CachingDirectoryFactory – Removing directory before core close: /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447 Sep 29, 2014 2:22:16 PM org.apache.catalina.ha.session.DeltaManager stopInternal INFO: Manager [localhost#/solr] expiring sessions upon shutdown Sep 29, 2014 2:22:16 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads SEVERE: The web application [/solr] appears to have started a thread named [fsyncService-12-thread-1] but has failed to stop it. This is very likely to create a memory leak. Sep 29, 2014 2:22:16 PM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks SEVERE: The web application [/solr] created a ThreadLocal with key of type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value [org.apache.solr.schema.DateField$ThreadLocalDateFormat@66271f62]) and a value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak. Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol stop INFO: Stopping ProtocolHandler ["http-bio-8080"] Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol stop INFO: Stopping ProtocolHandler ["ajp-bio-8009"] Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol destroy INFO: Destroying ProtocolHandler ["http-bio-8080"] Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol destroy INFO: Destroying ProtocolHandler ["ajp-bio-8009"] {code} At this point it looks like tomcat has shutdown, but the process is still running, as checked by ps. If I start a new tomcat and repeat this process a few times, we end up with several zombie instances of tomcat running. Is this designed behavior? What is supposed to happen when the container is shutdown while replication is running? How are others handling this? Please let me know if I can provide any more information. Thanks in advance, Nick {quote} My followup {quote} I was helping to look into this with Nick & I think we may have figured out the core of the problem... The problem is easily reproducible by starting replication on the slave and then sending a shutdown command to tomcat (e.g. catalina.sh stop). With a debugger attached, it looks like the fsyncService thread is blocking VM shutdown because it is created as a non-daemon thread. Essentially what seems to be happening is that the fsyncService thread is running when 'catalina.sh stop' is executed. This goes in and calls SnapPuller.destroy() which aborts the current sync. Around line 517 of the SnapPuller, there is code that is supposed to cleanup the fsyncService thread, but I don't think it is getting executed because the thread that called SnapPuller.fetchLatestIndex() is configured as a daemon Thread, so the JVM ends up shutting that down before it can cleanup the fysncService... So... it seems like: {code} if (fsyncService != null) ExecutorUtil.shutdownNowAndAwaitTermination(fsyncService); {code} could be added around line 1706 of SnapPuller.java, or {code} // since otherwise the JVM may kill the thread before any cleanup // code can be called puller.setDaemon(false); {code} could be added around line 230 of ReplicationHandler.java, however this needs some additional work (and I think it might need to be added regardless) since the cleanup code in SnapPuller(around 517) that shuts down the fsync thread never gets execute since logReplicationTimeAndConfFiles() can throw IO exceptions bypassing the rest of the finally block...So the call to {code}logReplicationTimeAndConfFiles(){code} around line 512 would need to get wrapped with a try/catch block to catch the IO exception... {quote} I will attach a patch as well that I believe fixes the issues. Is there a reason to *not* perform a hard shutdown of the fysnc thread? It seems sane enough to me to do a hard shutdown of the fsyncService if the action is aborted... -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org