We have analyzed the exit patterns of all the other request processors and
could not find this pattern in any of them.

We found that this was introduced as part of ZOOKEEPER-121.
Calling System.exit and Thread.join on the same thread is causing this hang.

I've also gone through Ted's earlier response on the disk-full scenario:
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201106.mbox/%3CBANLkTimzQjXZvDKnP6xQLF9jHfsaz6JstA@mail.gmail.com%3E

We feel that even when one cluster member's disk is full, the whole service
should not be interrupted.

So we have raised a new JIRA for this issue:
https://issues.apache.org/jira/browse/ZOOKEEPER-1109


-----Original Message-----
From: Laxman [mailto:[email protected]] 
Sent: Wednesday, June 22, 2011 1:54 PM
To: [email protected]
Subject: Zookeeper service is down when Leader disk is full

Hi Everyone,

  

We have found an issue while testing the disk-full scenario. Please
validate our observations. We will log an issue if this is found to be
valid.

 

Problem: Zookeeper does not shut down completely when the dataDir disk is
full, and the ZK cluster goes into an unserviceable state.
Version: Zookeeper 3.3.3

 

Scenario:
If the leader's disk is filled up, Zookeeper tries to shut down, but it
waits indefinitely while shutting down the SyncRequestProcessor thread.

Root Cause: this.join() is invoked on the same thread that triggered
System.exit(11).
When the disk fills up, the SyncRequestProcessor thread gets the exception
'No space left on device' and invokes System.exit(11) (the logs below show
this). Before the JVM exits, it runs the shutdown hook registered by
QuorumPeerMain, and the flow reaches SyncRequestProcessor.shutdown(). There,
this.join() waits for the very thread that invoked System.exit(11), while
that thread is itself waiting for the shutdown hook to finish, so neither
can proceed.
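
The cycle described above can be illustrated in isolation. The following is
only a minimal sketch, not ZooKeeper code: the class name CircularJoinDemo
and the method syncThreadStillAliveAfterJoin are hypothetical, and the
System.exit call is simulated by a thread that starts a stand-in hook and
joins it, so that the example terminates. It also suggests that a bounded
join would let the hook give up instead of hanging; that is one possible
mitigation, not necessarily the right fix for ZooKeeper.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CircularJoinDemo {

    /**
     * Simulates the cycle without calling System.exit. The "SyncThread"
     * starts a stand-in shutdown hook and joins it (roughly what the JVM
     * does inside System.exit), while the hook joins the SyncThread (what
     * shutdown() does via this.join()). Returns true if the SyncThread was
     * still alive when the hook's bounded join gave up. A timeout of 0
     * means "wait forever" and would hang, as in the thread dumps.
     */
    public static boolean syncThreadStillAliveAfterJoin(long joinTimeoutMs)
            throws InterruptedException {
        AtomicBoolean stillAlive = new AtomicBoolean(false);
        Thread[] sync = new Thread[1];

        // Stand-in for the shutdown hook: waits for the sync thread.
        Thread hook = new Thread(() -> {
            try {
                sync[0].join(joinTimeoutMs);
                stillAlive.set(sync[0].isAlive());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "hook");

        // Stand-in for the thread that called System.exit: it cannot
        // finish until the hook has finished.
        sync[0] = new Thread(() -> {
            hook.start();
            try {
                hook.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "SyncThread");

        sync[0].start();
        sync[0].join();
        return stillAlive.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("sync thread still alive after bounded join: "
                + syncThreadStillAliveAfterJoin(200));
    }
}
```

With an unbounded join in the hook, neither thread in this sketch would
ever finish, which matches the two WAITING threads in the dumps below.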



Thread dumps: 

The following thread dump shows that the QuorumPeerMain shutdown hook thread
is waiting indefinitely inside SyncRequestProcessor.shutdown().

"Thread-2" prio=10 tid=0x0810a400 nid=0x1695 in Object.wait() [0xac783000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor)
        at java.lang.Thread.join(Thread.java:1143)
        - locked <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor)
        at java.lang.Thread.join(Thread.java:1196)
        at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:171)
        at org.apache.zookeeper.server.quorum.ProposalRequestProcessor.shutdown(ProposalRequestProcessor.java:79)
        at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:513)
        at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:413)
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:411)
        at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126)

"SyncThread:2" prio=10 tid=0xad7fd800 nid=0x4acb in Object.wait() [0xac9ba000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb8030d00> (a org.apache.zookeeper.server.quorum.QuorumPeerMain$1)
        at java.lang.Thread.join(Thread.java:1143)
        - locked <0xb8030d00> (a org.apache.zookeeper.server.quorum.QuorumPeerMain$1)
        at java.lang.Thread.join(Thread.java:1196)
        at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79)
        at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:24)
        at java.lang.Shutdown.runHooks(Shutdown.java:79)
        at java.lang.Shutdown.sequence(Shutdown.java:123)
        at java.lang.Shutdown.exit(Shutdown.java:168)
        - locked <0xf01ff3b0> (a java.lang.Class for java.lang.Shutdown)
        at java.lang.Runtime.exit(Runtime.java:90)
        at java.lang.System.exit(System.java:904)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:149)



Logs:

2011-06-21 10:09:59,730 - FATAL [SyncThread:2:SyncRequestProcessor@148] - Severe unrecoverable error, exiting
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:305)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:324)
        at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:158)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
2011-06-21 10:09:59,732 - INFO  [Thread-2:QuorumPeer@691] - The Quorum server is going for shutdown
2011-06-21 10:09:59,732 - INFO  [Thread-2:Leader@393] - Shutdown called
java.lang.Exception: shutdown Leader! reason: quorum Peer shutdown
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:393)
        at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126)
2011-06-21 10:09:59,733 - INFO  [Thread-6:Leader$LearnerCnxAcceptor@243] - exception while shutting down acceptor: java.net.SocketException: Socket closed
2011-06-21 10:09:59,758 - INFO  [ProcessThread:-1:PrepRequestProcessor@120] - PrepRequestProcessor exited loop!
2011-06-21 10:09:59,758 - INFO  [CommitProcessor:2:CommitProcessor@150] - CommitProcessor exited loop!
2011-06-21 10:09:59,759 - INFO  [Thread-2:FinalRequestProcessor@379] - shutdown of request processor complete
2011-06-21 10:10:00,000 - INFO  [SessionTracker:SessionTrackerImpl@165] - SessionTrackerImpl exited loop!





 

--

Thanks
Laxman



