[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

Raul Gutierrez Segales (JIRA) Sun, 23 Aug 2015 12:48:06 -0700

    [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708503#comment-14708503
 ]


Raul Gutierrez Segales commented on ZOOKEEPER-2247:
---------------------------------------------------

Generally, lgtm.

A few nits:

* In 
src/java/systest/org/apache/zookeeper/test/system/NonRecoverableErrorTests.java,
 testZookeeperSericeShouldBeAvailableEvenAfterNonRecoverableErrorOnLeader could 
probably just be named testZooKeeperServiceAvailableOnLeader (or something 
along that length...)

* In TestUtils, there's a typo in deleteFileRecusively

* also about TestUtils.deleteFileRecursively - if you grep for recursiveDelete 
you'll see that a few tests have their own (repeated) definition as well:

{code}
src/java/systest/org/apache/zookeeper/test/system/QuorumPeerInstance.java
src/java/test/org/apache/zookeeper/test/ClientBase.java
src/java/test/org/apache/zookeeper/server/quorum/LearnerTest.java
src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java
{code}

* and that these are used across multiple test files... Can we have everyone 
using TestUtils.deleteFileRecursively while we are at it? It would be easier to 
clean it now than to do it in another patch... 

* given that QuorumPeer.setConfigFileName is protected, maybe it's 
easier/cleaner to extend QuorumPeer and add a public setter for configFileName. 
The problem with using reflection if it breaks, we'd only know after running 
CI.. whereas otherwise compilation would fail and we'd know sooner. 



> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> --------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2247
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.5.0
>            Reporter: Arshad Mohammad
>            Assignee: Arshad Mohammad
>            Priority: Critical
>             Fix For: 3.5.2
>
>         Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Bellow are the exceptions
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>       at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>       at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>       at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>       at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>       at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>       at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>       at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>       at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception Leader server still remains leader. After this non 
> recoverable exception the leader should go down and let other followers 
> become leader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (ZOOKEEPER-2247) Zookeeper service becomes unavailable when leader fails to write transaction log

Reply via email to