Philip Black-Knight created SOLR-6579:
-----------------------------------------

             Summary: SnapPuller Replication blocks clean shutdown of tomcat
                 Key: SOLR-6579
                 URL: https://issues.apache.org/jira/browse/SOLR-6579
             Project: Solr
          Issue Type: Bug
    Affects Versions: 4.10.1
            Reporter: Philip Black-Knight


main issue was described in the mailing list her:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201409.mbox/browser
and 
here:

but also including the quotes:
original message from Nick
{quote}
Hello,
I have solr 4.10 running on tomcat 7. I'm doing replication from one master
to about 10 slaves, with standard configuration:
{code}
      <requestHandler name="/replication" class="solr.ReplicationHandler" >
           <lst name="master">
             <str name="enable">${enable.master:false}</str>
             <str name="replicateAfter">commit</str>
             <str name="replicateAfter">startup</str>
             <str name="confFiles">schema.xml,stopwords.txt</str>
             <!-- str name="commitReserveDuration">00:00:10</str -->
           </lst>
           <lst name="slave">
             <str name="enable">${enable.slave:false}</str>
             <str name="masterUrl">http://master:8080/solr/mycore</str>
             <str name="pollInterval">00:00:60</str>
           </lst>
      </requestHandler>
{code}
It appears that if tomcat gets shutdown while solr is replicating, it
prevents tomcat from shutting down fully. Immediately after receiving the
shutdown command, a thread dump is logged into catalina.out (this may have
been turned on by some configuration someone else on my team made). I
removed some threads that didn't look related, mostly about tomcat session
replication, or with names like "http-bio-8080-exec-10".

{code}
    62252 [http-bio-8080-exec-1] INFO  org.apache.solr.core.SolrCore  –
[mycore] webapp=/solr path=/replication
params={command=details&_=1412014928648&wt=json} status=0 QTime=6
    63310 [http-bio-8080-exec-1] INFO  org.apache.solr.core.SolrCore  –
[mycore] webapp=/solr path=/replication
params={command=details&_=1412014929699&wt=json} status=0 QTime=6
    2014-09-29 14:22:10
    Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed
mode):

    "fsyncService-12-thread-1" prio=10 tid=0x00007f3bd4002000 nid=0x203d
waiting on condition [0x00007f3c271f0000]
       java.lang.Thread.State: WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for  <0x00000007e1ff4458> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
            at
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
            at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
            at
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
            at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
            at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
            at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)

    "explicit-fetchindex-cmd" daemon prio=10 tid=0x00007f3c0413e800
nid=0x203c runnable [0x00007f3c272f1000]
       java.lang.Thread.State: RUNNABLE
            at java.net.SocketInputStream.socketRead0(Native Method)
            at java.net.SocketInputStream.read(SocketInputStream.java:152)
            at java.net.SocketInputStream.read(SocketInputStream.java:122)
            at
org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:198)
            at
org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:174)
            at
org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
            at
org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:80)
            at
org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:114)
            at
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:152)
            at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1239)
            at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1187)
            at
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:774)
            at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:424)
            at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337)
            at
org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:227)

    "process reaper" daemon prio=10 tid=0x00007f3c0409c000 nid=0x203b
waiting on condition [0x00007f3c984e9000]
       java.lang.Thread.State: TIMED_WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for  <0x00000007dfbfd890> (a
java.util.concurrent.SynchronousQueue$TransferStack)
            at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
            at
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
            at
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
            at
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
            at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
            at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
            at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)

    "process reaper" daemon prio=10 tid=0x00007f3c0409a000 nid=0x2039
waiting on condition [0x00007f3cac040000]
       java.lang.Thread.State: TIMED_WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for  <0x00000007dfbfd890> (a
java.util.concurrent.SynchronousQueue$TransferStack)
            at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
            at
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
            at
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
            at
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
            at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
            at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
            at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)

    "ajp-bio-8009-AsyncTimeout" daemon prio=10 tid=0x00007f3ce013c000
nid=0x202e waiting on condition [0x00007f3c27af9000]
       java.lang.Thread.State: TIMED_WAITING (sleeping)
            at java.lang.Thread.sleep(Native Method)
            at
org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:148)
            at java.lang.Thread.run(Thread.java:745)

    "ajp-bio-8009-Acceptor-0" daemon prio=10 tid=0x00007f3ce013a800
nid=0x202d runnable [0x00007f3c985ea000]
       java.lang.Thread.State: RUNNABLE
            at java.net.PlainSocketImpl.socketAccept(Native Method)
            at
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
            at java.net.ServerSocket.implAccept(ServerSocket.java:530)
            at java.net.ServerSocket.accept(ServerSocket.java:498)
            at
org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
            at
org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
            at java.lang.Thread.run(Thread.java:745)

    "http-bio-8080-exec-1" daemon prio=10 tid=0x00007f3c0c001800 nid=0x202c
runnable [0x00007f3c986eb000]
       java.lang.Thread.State: RUNNABLE
            at java.net.SocketInputStream.socketRead0(Native Method)
            at java.net.SocketInputStream.read(SocketInputStream.java:152)
            at java.net.SocketInputStream.read(SocketInputStream.java:122)
            at
org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:516)
            at
org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:501)
            at
org.apache.coyote.http11.Http11Processor.setRequestLineReadTimeout(Http11Processor.java:173)
            at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:927)
            at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
            at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
            - locked <0x00000007df0874b0> (a
org.apache.tomcat.util.net.SocketWrapper)
            at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)

    "http-bio-8080-AsyncTimeout" daemon prio=10 tid=0x00007f3ce0139000
nid=0x202b waiting on condition [0x00007f3cac1aa000]
       java.lang.Thread.State: TIMED_WAITING (sleeping)
            at java.lang.Thread.sleep(Native Method)
            at
org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:148)
            at java.lang.Thread.run(Thread.java:745)

    "http-bio-8080-Acceptor-0" daemon prio=10 tid=0x00007f3ce0138000
nid=0x202a runnable [0x00007f3cac2ab000]
       java.lang.Thread.State: RUNNABLE
            at java.net.PlainSocketImpl.socketAccept(Native Method)
            at
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
            at java.net.ServerSocket.implAccept(ServerSocket.java:530)
            at java.net.ServerSocket.accept(ServerSocket.java:498)
            at
org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
            at
org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
            at java.lang.Thread.run(Thread.java:745)

    "ContainerBackgroundProcessor[StandardEngine[Catalina]]" daemon prio=10
tid=0x00007f3ce05b4000 nid=0x2029 waiting on condition [0x00007f3cacbb7000]
       java.lang.Thread.State: TIMED_WAITING (sleeping)
            at java.lang.Thread.sleep(Native Method)
            at
org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1508)
            at java.lang.Thread.run(Thread.java:745)

    "clock:10000" daemon prio=10 tid=0x00007f3c5cafc000 nid=0x2028 in
Object.wait() [0x00007f3c27bfa000]
       java.lang.Thread.State: TIMED_WAITING (on object monitor)
            at java.lang.Object.wait(Native Method)
            - waiting on <0x00000007dc6cf7e8> (a
org.webmacro.util.Clock$ClockThread)
            at org.webmacro.util.Clock$ClockThread.run(Clock.java:108)
            - locked <0x00000007dc6cf7e8> (a
org.webmacro.util.Clock$ClockThread)

    "TimeLoop(1000,600)" daemon prio=10 tid=0x00007f3c5cac6000 nid=0x2027
waiting on condition [0x00007f3c27cfb000]
       java.lang.Thread.State: TIMED_WAITING (sleeping)
            at java.lang.Thread.sleep(Native Method)
            at
org.webmacro.util.TimeLoop$TimeLoopThread.run(TimeLoop.java:237)

    "pool-1-thread-2" daemon prio=10 tid=0x00007f3c5d1b3800 nid=0x2022
waiting on condition [0x00007f3c98246000]
       java.lang.Thread.State: WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for  <0x000000070b90b4a0> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
            at
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
            at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
            at
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
            at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
            at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
            at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)

    "Java2D Disposer" daemon prio=10 tid=0x00007f3c5cfeb800 nid=0x2021 in
Object.wait() [0x00007f3c9835d000]
       java.lang.Thread.State: WAITING (on object monitor)
            at java.lang.Object.wait(Native Method)
            - waiting on <0x00000007e6030140> (a
java.lang.ref.ReferenceQueue$Lock)
            at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
            - locked <0x00000007e6030140> (a
java.lang.ref.ReferenceQueue$Lock)
            at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
            at sun.java2d.Disposer.run(Disposer.java:145)
            at java.lang.Thread.run(Thread.java:745)

    "MySQL Statement Cancellation Timer" daemon prio=10
tid=0x00007f3c5ca0c000 nid=0x2020 in Object.wait() [0x00007f3c9845e000]
       java.lang.Thread.State: WAITING (on object monitor)
            at java.lang.Object.wait(Native Method)
            - waiting on <0x00000007e5f45ad8> (a java.util.TaskQueue)
            at java.lang.Object.wait(Object.java:503)
            at java.util.TimerThread.mainLoop(Timer.java:526)
            - locked <0x00000007e5f45ad8> (a java.util.TaskQueue)
            at java.util.TimerThread.run(Timer.java:505)

    "Timer-0" daemon prio=10 tid=0x00007f3c5c9f7800 nid=0x201f in
Object.wait() [0x00007f3cad0bc000]
       java.lang.Thread.State: WAITING (on object monitor)
            at java.lang.Object.wait(Native Method)
            - waiting on <0x00000007e605a8d8> (a java.util.TaskQueue)
            at java.lang.Object.wait(Object.java:503)
            at java.util.TimerThread.mainLoop(Timer.java:526)
            - locked <0x00000007e605a8d8> (a java.util.TaskQueue)
            at java.util.TimerThread.run(Timer.java:505)

    "Thread-5" daemon prio=10 tid=0x00007f3c5c00f800 nid=0x201e in
Object.wait() [0x00007f3cac3ac000]
       java.lang.Thread.State: WAITING (on object monitor)
            at java.lang.Object.wait(Native Method)
            - waiting on <0x000000070b92b230> (a java.lang.Object)
            at java.lang.Object.wait(Object.java:503)
            at org.apache.solr.core.CloserThread.run(CoreContainer.java:905)
            - locked <0x000000070b92b230> (a java.lang.Object)

    "snapPuller-10-thread-1" prio=10 tid=0x00007f3c6c26e800 nid=0x201d
waiting on condition [0x00007f3cac555000]
       java.lang.Thread.State: TIMED_WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for  <0x000000070b92b2a8> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
            at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
            at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
            at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
            at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
            at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
            at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
            at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)

    "searcherExecutor-6-thread-1" prio=10 tid=0x00007f3c6c1ea000 nid=0x201c
waiting on condition [0x00007f3cac810000]
       java.lang.Thread.State: WAITING (parking)
            at sun.misc.Unsafe.park(Native Method)
            - parking to wait for  <0x000000070b92b440> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
            at
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
            at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
            at
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
            at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
            at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
            at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)

    "GC Daemon" daemon prio=10 tid=0x00007f3ce0536000 nid=0x200d in
Object.wait() [0x00007f3cd8137000]
       java.lang.Thread.State: TIMED_WAITING (on object monitor)
            at java.lang.Object.wait(Native Method)
            - waiting on <0x000000070bbee5b0> (a sun.misc.GC$LatencyLock)
            at sun.misc.GC$Daemon.run(GC.java:117)
            - locked <0x000000070bbee5b0> (a sun.misc.GC$LatencyLock)

    "Service Thread" daemon prio=10 tid=0x00007f3ce00a6000 nid=0x200b
runnable [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE

    "C2 CompilerThread1" daemon prio=10 tid=0x00007f3ce00a4000 nid=0x200a
waiting on condition [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE

    "C2 CompilerThread0" daemon prio=10 tid=0x00007f3ce00a1000 nid=0x2009
waiting on condition [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x00007f3ce009f800 nid=0x2008
waiting on condition [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x00007f3ce0077800 nid=0x2007 in
Object.wait() [0x00007f3cd8c1b000]
       java.lang.Thread.State: WAITING (on object monitor)
            at java.lang.Object.wait(Native Method)
            - waiting on <0x000000070bbee828> (a
java.lang.ref.ReferenceQueue$Lock)
            at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
            - locked <0x000000070bbee828> (a
java.lang.ref.ReferenceQueue$Lock)
            at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
            at
java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

    "Reference Handler" daemon prio=10 tid=0x00007f3ce0075800 nid=0x2006 in
Object.wait() [0x00007f3cadffe000]
       java.lang.Thread.State: WAITING (on object monitor)
            at java.lang.Object.wait(Native Method)
            - waiting on <0x000000070b8b23b8> (a
java.lang.ref.Reference$Lock)
            at java.lang.Object.wait(Object.java:503)
            at
java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
            - locked <0x000000070b8b23b8> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x00007f3ce0009000 nid=0x1ffc runnable
[0x00007f3ce675c000]
       java.lang.Thread.State: RUNNABLE
            at java.net.PlainSocketImpl.socketAccept(Native Method)
            at
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
            at java.net.ServerSocket.implAccept(ServerSocket.java:530)
            at java.net.ServerSocket.accept(ServerSocket.java:498)
            at
org.apache.catalina.core.StandardServer.await(StandardServer.java:452)
            at org.apache.catalina.startup.Catalina.await(Catalina.java:766)
            at org.apache.catalina.startup.Catalina.start(Catalina.java:712)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at
org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:322)
            at
org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:456)

    JNI global references: 422

    Heap
     PSYoungGen      total 1184256K, used 1031985K [0x00000007aaa80000,
0x0000000800000000, 0x0000000800000000)
      eden space 970240K, 94% used
[0x00000007aaa80000,0x00000007e259e740,0x00000007e5e00000)
      from space 214016K, 55% used
[0x00000007e5e00000,0x00000007ed2adfd0,0x00000007f2f00000)
      to   space 212480K, 0% used
[0x00000007f3080000,0x00000007f3080000,0x0000000800000000)
     ParOldGen       total 1398272K, used 209681K [0x0000000700000000,
0x0000000755580000, 0x00000007aaa80000)
      object space 1398272K, 14% used
[0x0000000700000000,0x000000070ccc4700,0x0000000755580000)
     PSPermGen       total 78848K, used 78659K [0x00000006c0000000,
0x00000006c4d00000, 0x0000000700000000)
      object space 78848K, 99% used
[0x00000006c0000000,0x00000006c4cd0ca8,0x00000006c4d00000)

right after this, here comes the shutdown:

    Sep 29, 2014 2:22:10 PM org.apache.catalina.core.StandardServer await
    INFO: A valid shutdown command was received via the shutdown port.
Stopping the Server instance.
    Sep 29, 2014 2:22:10 PM org.apache.coyote.AbstractProtocol pause
    INFO: Pausing ProtocolHandler ["http-bio-8080"]
    Sep 29, 2014 2:22:10 PM org.apache.coyote.AbstractProtocol pause
    INFO: Pausing ProtocolHandler ["ajp-bio-8009"]
    Sep 29, 2014 2:22:11 PM org.apache.catalina.core.StandardService
stopInternal
    INFO: Stopping service Catalina
    Sep 29, 2014 2:22:11 PM
org.apache.catalina.ha.session.JvmRouteBinderValve stopInternal
    INFO: JvmRouteBinderValve stopped
    Sep 29, 2014 2:22:11 PM org.apache.catalina.ha.session.DeltaManager
stopInternal
    INFO: Manager [localhost#/dy] expiring sessions upon shutdown

(snip some stuff about an unrelated webapp, then the solr errors)

    64876 [localhost-startStop-2] INFO  org.apache.solr.core.CoreContainer
 – Shutting down CoreContainer instance=1359815787
    64877 [localhost-startStop-2] INFO  org.apache.solr.core.SolrCore  –
[mycore]  CLOSING SolrCore org.apache.solr.core.SolrCore@1ade5d50
    64882 [localhost-startStop-2] INFO
 org.apache.solr.update.UpdateHandler  – closing
DirectUpdateHandler2{commits=0,autocommit
maxTime=15000ms,autocommits=0,soft
autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,transaction_logs_total_size=0,transaction_logs_total_number=0}
    64883 [localhost-startStop-2] INFO
 org.apache.solr.update.SolrCoreState  – Closing SolrCoreState
    64883 [localhost-startStop-2] INFO
 org.apache.solr.update.DefaultSolrCoreState  – SolrCoreState ref count has
reached 0 - closing IndexWriter
    64883 [localhost-startStop-2] INFO
 org.apache.solr.update.DefaultSolrCoreState  – closing IndexWriter with
IndexWriterCloser
    64885 [localhost-startStop-2] ERROR
org.apache.solr.update.DefaultSolrCoreState  – Error during shutdown of
writer.
    org.apache.lucene.store.AlreadyClosedException: this IndexWriter is
closed
            at
org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:700)
            at
org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2384)
            at
org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:795)
            at
org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:70)
            at
org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:371)
            at
org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:72)
            at org.apache.solr.core.SolrCore.close(SolrCore.java:1073)
            at org.apache.solr.core.SolrCores.close(SolrCores.java:117)
            at
org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:347)
            at
org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:200)
            at
org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:315)
            at
org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:4782)
            at
org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5565)
            at
org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
            at
org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1575)
            at
org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1564)
            at java.util.concurrent.FutureTask.run(FutureTask.java:262)
            at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)
    2014-09-29T14:22:11.384-0400: [GC 1299401K->384944K(2543104K),
0.0427280 secs]
    64929 [localhost-startStop-2] INFO  org.apache.solr.core.SolrCore  –
[mycore] Closing main searcher on request.
    64935 [localhost-startStop-2] INFO
 org.apache.solr.core.CachingDirectoryFactory  – Closing
NRTCachingDirectoryFactory - 3 directories currently being tracked
    69937 [explicit-fetchindex-cmd] INFO
 org.apache.solr.update.DefaultSolrCoreState  – Creating new IndexWriter...
    69938 [explicit-fetchindex-cmd] ERROR
org.apache.solr.handler.ReplicationHandler  – SnapPull failed
:org.apache.lucene.store.AlreadyClosedException: Already closed
            at
org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:341)
            at
org.apache.solr.handler.ReplicationHandler.loadReplicationProperties(ReplicationHandler.java:827)
            at
org.apache.solr.handler.SnapPuller.logReplicationTimeAndConfFiles(SnapPuller.java:569)
            at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:511)
            at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337)
            at
org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:227)

    69942 [localhost-startStop-2] INFO
 org.apache.solr.core.CachingDirectoryFactory  – looking to close
/usr/local/apache-tomcat-7.0.39/solr/mycore/data
[CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data;done=false>>]
    69942 [localhost-startStop-2] INFO
 org.apache.solr.core.CachingDirectoryFactory  – Closing directory:
/usr/local/apache-tomcat-7.0.39/solr/mycore/data
    69942 [localhost-startStop-2] INFO
 org.apache.solr.core.CachingDirectoryFactory  – looking to close
/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index
[CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index;done=false>>]
    69942 [localhost-startStop-2] INFO
 org.apache.solr.core.CachingDirectoryFactory  – Closing directory:
/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index
    69942 [localhost-startStop-2] INFO
 org.apache.solr.core.CachingDirectoryFactory  – looking to close
/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
[CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447;done=true>>]
    69942 [localhost-startStop-2] INFO
 org.apache.solr.core.CachingDirectoryFactory  – Closing directory:
/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
    69943 [localhost-startStop-2] INFO
 org.apache.solr.core.CachingDirectoryFactory  – Removing directory before
core close:
/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
    Sep 29, 2014 2:22:16 PM org.apache.catalina.ha.session.DeltaManager
stopInternal
    INFO: Manager [localhost#/solr] expiring sessions upon shutdown
    Sep 29, 2014 2:22:16 PM org.apache.catalina.loader.WebappClassLoader
clearReferencesThreads
    SEVERE: The web application [/solr] appears to have started a thread
named [fsyncService-12-thread-1] but has failed to stop it. This is very
likely to create a memory leak.
    Sep 29, 2014 2:22:16 PM org.apache.catalina.loader.WebappClassLoader
checkThreadLocalMapForLeaks
    SEVERE: The web application [/solr] created a ThreadLocal with key of
type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@66271f62]) and a
value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
(value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
but failed to remove it when the web application was stopped. Threads are
going to be renewed over time to try and avoid a probable memory leak.
    Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol stop
    INFO: Stopping ProtocolHandler ["http-bio-8080"]
    Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol stop
    INFO: Stopping ProtocolHandler ["ajp-bio-8009"]
    Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol destroy
    INFO: Destroying ProtocolHandler ["http-bio-8080"]
    Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol destroy
    INFO: Destroying ProtocolHandler ["ajp-bio-8009"]
{code}
At this point it looks like tomcat has shutdown, but the process is still
running, as checked by ps. If I start a new tomcat and repeat this process
a few times, we end up with several zombie instances of tomcat running.

Is this designed behavior? What is supposed to happen when the container is
shutdown while replication is running? How are others handling this?

Please let me know if I can provide any more information.

Thanks in advance,
Nick
{quote}


My followup
{quote}
I was helping to look into this with Nick & I think we may have figured out the 
core of the problem...

The problem is easily reproducible by starting replication on the slave and 
then sending a shutdown command to tomcat (e.g. catalina.sh stop).

With a debugger attached, it looks like the fsyncService thread is blocking VM 
shutdown because it is created as a non-daemon thread.

Essentially what seems to be happening is that the fsyncService thread is 
running when 'catalina.sh stop' is executed. This goes in and calls 
SnapPuller.destroy() which aborts the current sync. Around line 517 of the 
SnapPuller, there is code that is supposed to cleanup the fsyncService thread, 
but I don't think it is getting executed because the thread that called 
SnapPuller.fetchLatestIndex() is configured as a daemon Thread, so the JVM ends 
up shutting that down before it can cleanup the fysncService...

So... it seems like:
{code}
    if (fsyncService != null) 
ExecutorUtil.shutdownNowAndAwaitTermination(fsyncService);
{code}
could be added around line 1706 of SnapPuller.java,  or
{code}
           // since otherwise the JVM may kill the thread before any cleanup
           // code can be called
          puller.setDaemon(false);
{code}
could be added around line 230 of ReplicationHandler.java, however this needs 
some additional work (and I think it might need to be added regardless) since 
the cleanup code in SnapPuller(around 517) that shuts down the fsync thread 
never gets execute since logReplicationTimeAndConfFiles() can throw IO 
exceptions bypassing the rest of the finally block...So the call to 
{code}logReplicationTimeAndConfFiles(){code} around line 512 would need to get 
wrapped with a try/catch block to catch the IO exception...
{quote}

I will attach a patch as well that I believe fixes the issues. Is there a 
reason to *not* perform a hard shutdown of the fysnc thread? It seems sane 
enough to me to do a hard shutdown of the fsyncService if the action is 
aborted...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to