[ https://issues.apache.org/jira/browse/ZOOKEEPER-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256098#comment-16256098 ]
ASF GitHub Bot commented on ZOOKEEPER-2849:
-------------------------------------------

Github user afine commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/419#discussion_r151560170

    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/ExponentialBackoffStrategyTest.java ---
    @@ -0,0 +1,180 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.zookeeper.server.quorum;
    +
    +import static org.junit.Assert.assertEquals;
    +import static org.junit.Assert.assertNotEquals;
    +import static org.junit.Assert.assertTrue;
    +
    +import org.junit.Test;
    +
    +/**
    + * Unit tests for {@link ExponentialBackoffStrategy}.
    + */
    +public class ExponentialBackoffStrategyTest {
    --- End diff --

    This should extend ZKTestCase


> Quorum port binding needs exponential back-off retry
> ----------------------------------------------------
>
>                 Key: ZOOKEEPER-2849
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2849
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: quorum
>    Affects Versions: 3.4.6, 3.5.3
>            Reporter: Brian Lininger
>            Assignee: Brian Lininger
>            Priority: Minor
>
> Recently we upgraded the AWS instance type we use for running our ZooKeeper
> nodes, and by doing so we're intermittently hitting an issue where ZooKeeper
> cannot bind to the server election port because the IP is incorrect.  This is
> due to name resolution in Route53 not being in sync when ZooKeeper starts on
> the more powerful EC2 instances.  Currently in QuorumCnxManager.Listener, we
> only attempt to bind 3 times with a 1s sleep between retries, which is not
> long enough.
> I'm proposing to change this to follow an exponential back-off type strategy
> where each failed attempt causes a longer sleep between retry attempts.  This
> would allow for ZooKeeper to gracefully recover when the host is
> misconfigured, and subsequently corrected, without requiring the process to
> be restarted while also minimizing the impact to the running instance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
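
For illustration, the retry behaviour proposed in the issue description could be sketched roughly as below. This is not the ExponentialBackoffStrategy class from the pull request: the class name BindBackoffSketch, the method names, and every parameter are hypothetical. The doubling-with-cap schedule is only one assumed reading of "exponential back-off".

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch only; names and parameters are not taken from the pull request.
class BindBackoffSketch {

    // Delay before retry number `attempt` (0-based):
    // initialMillis doubled `attempt` times, never exceeding maxMillis.
    static long delayMillis(int attempt, long initialMillis, long maxMillis) {
        long delay = initialMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            // Jump straight to the cap instead of doubling past it (also avoids overflow).
            delay = (delay > maxMillis / 2) ? maxMillis : delay * 2;
        }
        return Math.min(delay, maxMillis);
    }

    // Tries to bind up to maxAttempts times, sleeping longer after each failure.
    static ServerSocket bindWithBackoff(String host, int port, int maxAttempts,
                                        long initialMillis, long maxMillis)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            ServerSocket socket = new ServerSocket();
            try {
                socket.setReuseAddress(true);
                // A new InetSocketAddress per attempt looks the host name up again
                // (subject to the JVM's DNS cache settings).
                socket.bind(new InetSocketAddress(host, port));
                return socket;
            } catch (IOException e) {
                last = e;
                socket.close();
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(delayMillis(attempt, initialMillis, maxMillis));
                }
            }
        }
        throw (last != null) ? last : new IOException("no bind attempts were made");
    }
}
```

With an assumed 1 s initial delay and a 60 s cap, the sleeps between attempts would be 1 s, 2 s, 4 s and so on, levelling off at 60 s, in contrast to the fixed 1 s sleep over 3 attempts described in the issue.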