[
https://issues.apache.org/jira/browse/GEODE-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238865#comment-17238865
]
ASF GitHub Bot commented on GEODE-8623:
---------------------------------------
jinmeiliao commented on a change in pull request #5743:
URL: https://github.com/apache/geode/pull/5743#discussion_r530576020
##########
File path: geode-common/src/main/java/org/apache/geode/internal/Retry.java
##########
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
+ * agreements. See the NOTICE file distributed with this work for additional information regarding
+ * copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
+ * or implied. See the License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.geode.internal;
+
+import static java.util.concurrent.TimeUnit.NANOSECONDS;
+
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.function.Predicate;
+import java.util.function.Supplier;
+
+import org.apache.geode.annotations.VisibleForTesting;
+
+/**
+ * Utility class for retrying operations.
+ */
+public class Retry {
+
+ interface Timer {
+ long nanoTime();
+
+ void sleep(long sleepTimeInNano) throws InterruptedException;
+ }
+
+ static class SteadyTimer implements Timer {
+ @Override
+ public long nanoTime() {
+ return System.nanoTime();
+ }
+
+ @Override
+ public void sleep(long sleepTimeInNano) throws InterruptedException {
+ long millis = NANOSECONDS.toMillis(sleepTimeInNano);
+ // avoid throwing IllegalArgumentException
+ if (millis > 0) {
+ Thread.sleep(millis);
+ }
+ }
+ }
+
+ private static final SteadyTimer steadyClock = new SteadyTimer();
+
+ /**
+ * Try the supplier function until the predicate is true or timeout occurs.
+ *
+ * @param timeout to retry for
+ * @param timeoutUnit the unit for timeout
+ * @param interval time between each try
+ * @param intervalUnit the unit for interval
+ * @param supplier to execute until predicate is true or times out
+ * @param predicate to test for retry
+ * @param <T> type of return value
+ * @return value from supplier after it passes predicate or times out.
+ */
+ public static <T> T tryFor(long timeout, TimeUnit timeoutUnit,
+ long interval, TimeUnit intervalUnit,
+ Supplier<T> supplier,
+ Predicate<T> predicate) throws TimeoutException, InterruptedException {
+ return tryFor(timeout, timeoutUnit, interval, intervalUnit, supplier, predicate, steadyClock);
+ }
+
+ @VisibleForTesting
+ static <T> T tryFor(long timeout, TimeUnit timeoutUnit,
+ long interval, TimeUnit intervalUnit,
+ Supplier<T> supplier,
+ Predicate<T> predicate,
+ Timer timer) throws TimeoutException, InterruptedException {
+ long until = timer.nanoTime() + NANOSECONDS.convert(timeout, timeoutUnit);
+ long intervalNano = NANOSECONDS.convert(interval, intervalUnit);
+
+ T value;
+ for (;;) {
+ value = supplier.get();
+ if (predicate.test(value)) {
+ return value;
+ } else {
+ // if there is still more time left after we sleep for interval period, then sleep and retry
+ // otherwise break out and throw TimeoutException
+ if ((timer.nanoTime() + intervalNano) < until) {
Review comment:
> In cases where the predicate returns `false` forever, our final
> attempt is made as close to the timeout as possible (given the limitations of
> Java thread scheduling). This is exactly what a user expects.
In your proposal, on the final attempt you calculate the time left until the
timeout and sleep for exactly that long. We differ on what happens next: since
we have already reached the timeout, I think we should throw the exception,
whereas you think we should try once more. I don't think "trying again" is
"exactly what a user expects"; I certainly didn't expect it.
Let's list the choices and decide which is the most reasonable. Say our
interval is 3 seconds, and only 2 seconds are left after our last call to the
supplier, with the predicate still false. At this point, we can do any of the
following:
1. throw the timeout immediately (my implementation)
2. sleep for 2 seconds until the timeout, then throw the timeout (since 2
seconds is what is left)
3. sleep for 3 seconds, then throw the timeout (since 3 seconds is the interval
the user specified)
4. sleep for 2 seconds until the timeout, then try again (your suggestion)
Of these four, I think 4 is the least reasonable. Among the first three, 1 is
the most efficient, because it skips a sleep that cannot lead to another attempt.
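To make the four choices concrete, here is a minimal sketch that replays them on a fake clock. It is not code from this pull request: the class `RetryTimeoutChoices`, the `Choice` enum, and `simulate` are hypothetical names used for illustration only. It assumes whole seconds, attempts that take no time, and a timeout of 8 seconds with a 3 second interval, so that 2 seconds remain after the third attempt. The "enough time left" check mirrors the `(timer.nanoTime() + intervalNano) < until` condition in the diff.

```java
import java.util.function.BooleanSupplier;

/** Replays the four end-of-timeout choices on a fake clock (no real sleeping). */
class RetryTimeoutChoices {

  /** Hypothetical names for the four choices listed in the comment. */
  enum Choice {
    THROW_IMMEDIATELY, // choice 1
    SLEEP_REMAINDER_THEN_THROW, // choice 2
    SLEEP_INTERVAL_THEN_THROW, // choice 3
    SLEEP_REMAINDER_THEN_RETRY // choice 4
  }

  /** Result of one simulated run. */
  static final class Outcome {
    final int attempts; // how many times the attempt was invoked
    final long finishedAt; // fake-clock time (seconds) when the run ended
    final boolean timedOut; // true if the run ended in a timeout

    Outcome(int attempts, long finishedAt, boolean timedOut) {
      this.attempts = attempts;
      this.finishedAt = finishedAt;
      this.timedOut = timedOut;
    }
  }

  static Outcome simulate(Choice choice, long timeout, long interval, BooleanSupplier attempt) {
    long now = 0; // fake clock; assumption: attempts themselves take zero time
    int attempts = 0;
    for (;;) {
      attempts++;
      if (attempt.getAsBoolean()) {
        return new Outcome(attempts, now, false);
      }
      if (now + interval < timeout) {
        now += interval; // a full interval still fits: "sleep" and retry
        continue;
      }
      // Less than a full interval is left; this is where the choices diverge.
      long remaining = Math.max(0, timeout - now);
      switch (choice) {
        case THROW_IMMEDIATELY:
          return new Outcome(attempts, now, true);
        case SLEEP_REMAINDER_THEN_THROW:
          return new Outcome(attempts, now + remaining, true);
        case SLEEP_INTERVAL_THEN_THROW:
          return new Outcome(attempts, now + interval, true);
        default: // SLEEP_REMAINDER_THEN_RETRY
          now += remaining;
          attempts++;
          boolean succeeded = attempt.getAsBoolean();
          return new Outcome(attempts, now, !succeeded);
      }
    }
  }

  public static void main(String[] args) {
    // Predicate never becomes true, so every choice ends in a timeout.
    for (Choice c : Choice.values()) {
      Outcome o = simulate(c, 8, 3, () -> false);
      System.out.println(c + ": attempts=" + o.attempts + ", finishedAt=" + o.finishedAt + "s");
    }
  }
}
```

Under these assumptions, choices 1 to 3 all make three attempts and differ only in when the timeout is reported (6, 8, and 9 seconds on the fake clock), while choice 4 makes a fourth attempt at the 8 second mark.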
----------------------------------------------------------------
> Timing between DNS and Geode startup can result in permanent unknown host
> exceptions.
> -------------------------------------------------------------------------------------
>
> Key: GEODE-8623
> URL: https://issues.apache.org/jira/browse/GEODE-8623
> Project: Geode
> Issue Type: Bug
> Affects Versions: 1.9.0, 1.9.1, 1.10.0, 1.9.2, 1.11.0, 1.12.0, 1.13.0,
> 1.14.0, 1.13.1
> Reporter: Jacob Barrett
> Priority: Minor
> Labels: pull-request-available
>
> In a managed environment where local host name DNS entries and the startup of
> Geode happen concurrently, it is possible for Geode to fail name resolution in
> the local hostname caching. If it fails to resolve the local hostname when
> loading the caching utility class, then any service dependent on this name
> will fail without chance for recovery.
> {code}
> [error 2020/09/30 19:50:21.644 UTC <main> tid=0x1] Jmx manager could not be
> started because java.net.UnknownHostException
> org.apache.geode.management.ManagementException: java.net.UnknownHostException
> at
> org.apache.geode.management.internal.ManagementAgent.startAgent(ManagementAgent.java:133)
> at
> org.apache.geode.management.internal.SystemManagementService.startManager(SystemManagementService.java:432)
> at
> org.apache.geode.management.internal.beans.ManagementAdapter.handleCacheCreation(ManagementAdapter.java:181)
> at
> org.apache.geode.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:127)
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2063)
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:606)
> at
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1239)
> at
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:219)
> at
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:171)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:142)
> at
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:887)
> at
> org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:803)
> at
> org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:732)
> at
> org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:251)
> Caused by: java.net.UnknownHostException
> at
> org.apache.geode.internal.net.SocketCreator.getLocalHost(SocketCreator.java:285)
> at
> org.apache.geode.management.internal.ManagementAgent.configureAndStart(ManagementAgent.java:310)
> at
> org.apache.geode.management.internal.ManagementAgent.startAgent(ManagementAgent.java:131)
> ... 14 more
> [error 2020/09/30 19:50:21.724 UTC <main> tid=0x1]
> org.apache.geode.management.ManagementException: java.net.UnknownHostException
> Exception in thread "main" org.apache.geode.management.ManagementException:
> java.net.UnknownHostException
> at
> org.apache.geode.management.internal.ManagementAgent.startAgent(ManagementAgent.java:133)
> at
> org.apache.geode.management.internal.SystemManagementService.startManager(SystemManagementService.java:432)
> at
> org.apache.geode.management.internal.beans.ManagementAdapter.handleCacheCreation(ManagementAdapter.java:181)
> at
> org.apache.geode.management.internal.beans.ManagementListener.handleEvent(ManagementListener.java:127)
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem.notifyResourceEventListeners(InternalDistributedSystem.java:2063)
> at
> org.apache.geode.distributed.internal.InternalDistributedSystem.handleResourceEvent(InternalDistributedSystem.java:606)
> at
> org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1239)
> at
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:219)
> at
> org.apache.geode.internal.cache.InternalCacheBuilder.create(InternalCacheBuilder.java:171)
> at org.apache.geode.cache.CacheFactory.create(CacheFactory.java:142)
> at
> org.apache.geode.distributed.internal.DefaultServerLauncherCacheProvider.createCache(DefaultServerLauncherCacheProvider.java:52)
> at
> org.apache.geode.distributed.ServerLauncher.createCache(ServerLauncher.java:887)
> at
> org.apache.geode.distributed.ServerLauncher.start(ServerLauncher.java:803)
> at
> org.apache.geode.distributed.ServerLauncher.run(ServerLauncher.java:732)
> at
> org.apache.geode.distributed.ServerLauncher.main(ServerLauncher.java:251)
> Caused by: java.net.UnknownHostException
> at
> org.apache.geode.internal.net.SocketCreator.getLocalHost(SocketCreator.java:285)
> at
> org.apache.geode.management.internal.ManagementAgent.configureAndStart(ManagementAgent.java:310)
> at
> org.apache.geode.management.internal.ManagementAgent.startAgent(ManagementAgent.java:131)
> ... 14 more
> {code}