[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131246#comment-15131246
 ] 

Chris Nauroth commented on ZOOKEEPER-2247:
------------------------------------------

[~rakeshr], patch v12 looks good to me.  I just have one comment.

{code}
            // Watch status of ZooKeeper server. If there is an internal error
            // then will do a graceful shutdown.
            while (zkServer.isStateRunning()) {
                try {
                    Thread.sleep(1000); // watch interval
                } catch (InterruptedException ie) {
                    LOG.info("Thread interrupted");
                }
            }
{code}

Swallowing {{InterruptedException}} is generally an anti-pattern, even though 
a lot of existing code in ZooKeeper and other codebases does it.  
In this specific case, the catch block leaves the interrupted status cleared, 
which could affect later code such as {{ServerCnxnFactory#join}} that calls 
interruptible methods.  Let's restore the interrupted status in the catch block 
by calling {{Thread.currentThread().interrupt()}}.
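
For illustration, here is a minimal standalone sketch of the restore-the-flag idiom. It is not the patch itself: the class {{WatchLoopSketch}}, the method {{watch}}, and the {{BooleanSupplier}} standing in for {{zkServer.isStateRunning()}} are all hypothetical names. Leaving the loop after restoring the flag is also an assumption of this sketch; with the flag set, {{Thread.sleep}} would throw again immediately on the next iteration.

```java
import java.util.function.BooleanSupplier;

public class WatchLoopSketch {

    // Hypothetical helper, not ZooKeeper code: polls isRunning until it
    // returns false or the calling thread is interrupted.
    // Returns true if the loop ended because of an interrupt.
    public static boolean watch(BooleanSupplier isRunning, long intervalMillis) {
        while (isRunning.getAsBoolean()) {
            try {
                Thread.sleep(intervalMillis); // watch interval
            } catch (InterruptedException ie) {
                // Thread.sleep cleared the interrupted status when it threw.
                // Set it again so later blocking calls still observe it.
                Thread.currentThread().interrupt();
                // Assumption of this sketch: stop watching, because sleep()
                // would throw again immediately while the flag is set.
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulate an interrupt request arriving before the loop sleeps.
        Thread.currentThread().interrupt();
        boolean interrupted = watch(() -> true, 1000L);
        System.out.println("interrupted=" + interrupted
                + ", flag=" + Thread.currentThread().isInterrupted());
    }
}
```

With the flag restored, a later interruptible call on the same thread (a {{join}}, {{wait}}, or {{sleep}}) throws {{InterruptedException}} right away instead of blocking as if no interrupt had been requested.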

I'll be +1 after that change.  Thanks again for your diligence on this one!


> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> --------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2247
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.5.0
>            Reporter: Arshad Mohammad
>            Assignee: Arshad Mohammad
>            Priority: Critical
>             Fix For: 3.4.9, 3.5.2, 3.6.0
>
>         Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-12.patch, 
> ZOOKEEPER-2247-b3.5.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from thread : SyncThread:100
> java.io.IOException: Input/output error
>       at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>       at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>       at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>       at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>       at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>       at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>       at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>       at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception the leader server still remains leader. After such an 
> unrecoverable exception, the leader should go down and let one of the 
> followers become leader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
