[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411587#comment-15411587
 ] 

Flavio Junqueira commented on ZOOKEEPER-2247:
---------------------------------------------

The patch looks good, it is pretty much ready to go. Since you'll have to 
generate at least the 3.4 patch, I'll ask you to fix a couple of small things, 
hope you don't mind:

I have extended the javadoc of `setState`, would you mind using this one 
instead:

{noformat}
/**
+     * Sets the state of ZooKeeper server. After changing the state, it
+     * notifies the server state change to a registered shutdown handler,
+     * if any.
+     * <p>
+     * The following are the server state transitions:
+     * <li>During startup the server will be in the INITIAL state.</li>
+     * <li>After successfully starting, the server sets the state to
+     * RUNNING.</li>
+     * <li> The server transitions to the ERROR state if it hits an internal 
error.
+     * {@link ZooKeeperServerListenerImpl} notifies any critical resource error
+     * events, e.g., SyncRequestProcessor not being able to write a txn to 
disk.</li>
+     * <li>During shutdown the server sets the state to SHUTDOWN, which
+     * corresponds to the server not running.</li>
+     *
+     * @param state new server state.
+     */
{noformat}

The same for the javadoc of {{ZooKeeperServerShutdownHandler}}:

{noformat}
+/**
+ * ZooKeeper server shutdown handler which will be used to handle ERROR or
+ * SHUTDOWN server state transitions, which in turn releases the associated 
shutdown
+ * latch.
+ */
{noformat}

Finally, in {{waitForNewLeaderElection}}, would you mind setting the 
{{Thread.sleep}} to sleep for only 100ms each time? Not sure if we need to 
increase the counter, but I'd rather reduce the duration of each iteration.

Otherwise, it looks very good, thanks for bearing with all the comments and 
working hard to get it in good shape. If you make these changes and generate 
the branch patches, I'll check this one in.


> Zookeeper service becomes unavailable when leader fails to write transaction 
> log
> --------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2247
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.5.0
>            Reporter: Arshad Mohammad
>            Assignee: Rakesh R
>            Priority: Critical
>             Fix For: 3.4.9, 3.5.3, 3.6.0
>
>         Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch, 
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch, 
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch, 
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-12.patch, 
> ZOOKEEPER-2247-13.patch, ZOOKEEPER-2247-14.patch, ZOOKEEPER-2247-15.patch, 
> ZOOKEEPER-2247-16.patch, ZOOKEEPER-2247-17.patch, ZOOKEEPER-2247-18.patch, 
> ZOOKEEPER-2247-19.patch, ZOOKEEPER-2247-20.patch, ZOOKEEPER-2247-21.patch, 
> ZOOKEEPER-2247-b3.5.patch, ZOOKEEPER-2247-br-3.4.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction 
> log. Bellow are the exceptions
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR 
> [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, 
> from thread : SyncThread:100
> java.io.IOException: Input/output error
>       at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>       at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
>       at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
>       at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
>       at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
>       at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
>       at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
>       at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread 
> SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  
> [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  
> [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  
> [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor 
> complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  
> [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 
> cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception Leader server still remains leader. After this non 
> recoverable exception the leader should go down and let other followers 
> become leader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to