[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113849#comment-15113849
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
---------------------------------------------

I can't convince myself that this is the right way to fix this issue. In fact, 
the critical thread addition might not have been done in the way I'd expect. If 
you check the run method in QuorumPeer, around line 1093, I'd expect that we 
deal with the problem reported here by simply returning from 
{{observeLeader()}}, {{followLeader()}}, and {{lead()}}. Instead, it sounds 
like the {{handleException()}} call is taking a different call path and the 
finally blocks for {{observeLeader()}}, {{followLeader()}}, and {{lead()}} 
aren't being executed, which would have avoided this present issue in the first 
place. Could anyone explain to me why we aren't simply relying on the finally 
blocks? If we can do it, I'd much rather have this option implemented rather 
than multiple code paths that change the state of the server.

> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> --------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2247
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.5.0
>            Reporter: Arshad Mohammad
>            Assignee: Arshad Mohammad
>            Priority: Critical
>             Fix For: 3.4.8, 3.5.2
>
>         Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Bellow are the exceptions
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>       at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>       at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>       at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>       at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>       at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>       at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>       at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>       at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception Leader server still remains leader. After this non 
> recoverable exception the leader should go down and let other followers 
> become leader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to