[ https://issues.apache.org/jira/browse/ZOOKEEPER-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256098#comment-16256098 ]

ASF GitHub Bot commented on ZOOKEEPER-2849:
-------------------------------------------

Github user afine commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/419#discussion_r151560170
  
    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/ExponentialBackoffStrategyTest.java ---
    @@ -0,0 +1,180 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.zookeeper.server.quorum;
    +
    +import static org.junit.Assert.assertEquals;
    +import static org.junit.Assert.assertNotEquals;
    +import static org.junit.Assert.assertTrue;
    +
    +import org.junit.Test;
    +
    +/**
    + * Unit tests for {@link ExponentialBackoffStrategy}.
    + */
    +public class ExponentialBackoffStrategyTest {
    --- End diff --
    
    This test class should extend ZKTestCase.


> Quorum port binding needs exponential back-off retry
> ----------------------------------------------------
>
>                 Key: ZOOKEEPER-2849
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2849
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: quorum
>    Affects Versions: 3.4.6, 3.5.3
>            Reporter: Brian Lininger
>            Assignee: Brian Lininger
>            Priority: Minor
>
> Recently we upgraded the AWS instance type we use for running our ZooKeeper 
> nodes, and by doing so we're intermittently hitting an issue where ZooKeeper 
> cannot bind to the server election port because the IP is incorrect.  This is 
> due to name resolution in Route53 not being in sync when ZooKeeper starts on 
> the more powerful EC2 instances.  Currently in QuorumCnxManager.Listener, we 
> only attempt to bind 3 times with a 1s sleep between retries, which is not 
> long enough.  
> I'm proposing to change this to follow an exponential back-off type strategy 
> where each failed attempt causes a longer sleep between retry attempts.  This 
> would allow ZooKeeper to gracefully recover when the host is 
> misconfigured, and subsequently corrected, without requiring the process to 
> be restarted while also minimizing the impact to the running instance.
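
For illustration only, the retry behaviour proposed above could look roughly like the sketch below. It is not the code from the pull request: the class name BindBackoffSketch, both method names, and all parameters are hypothetical, and the doubling-with-cap schedule is an assumed choice. The host name is resolved again on every attempt, on the assumption that a stale DNS answer is what makes the first bind attempts fail.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch; not the ExponentialBackoffStrategy from the pull request.
public class BindBackoffSketch {

    /** Delay before retry number 'attempt' (0-based): base doubled 'attempt' times, capped at maxMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 0 || baseMillis <= 0 || maxMillis < baseMillis) {
            throw new IllegalArgumentException("invalid back-off parameters");
        }
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            // Doubling only when delay <= maxMillis / 2 avoids overflow.
            delay = (delay > maxMillis / 2) ? maxMillis : delay * 2;
        }
        return delay;
    }

    /** Tries to bind up to maxAttempts times, sleeping longer after each failure. */
    static ServerSocket bindWithBackoff(String host, int port, int maxAttempts,
                                        long baseMillis, long maxMillis)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            ServerSocket socket = new ServerSocket();
            try {
                socket.setReuseAddress(true);
                // Constructing the address here re-resolves the host name on each attempt.
                socket.bind(new InetSocketAddress(host, port));
                return socket;
            } catch (IOException e) {
                last = e;
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // Nothing useful to do if closing the unbound socket fails.
                }
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
                }
            }
        }
        throw last != null ? last : new IOException("no bind attempts were made");
    }
}
```

With a 1000 ms base and a 60000 ms cap, this schedule sleeps 1 s, 2 s, 4 s, and so on up to 60 s, so a handful of attempts covers a much longer window than three attempts spaced 1 s apart.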



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
