> On Dec. 12, 2017, 3:57 a.m., Bill Farner wrote: > > I should have looked more closely at the last patch, and when preparing my > > patch before that. This test is bound to be flaky since there's nothing > > ensuring a happens-before releationship between issuing the request and > > checking the result. > > > > I can repro consistently with the patch below, slowing down the > > `onThrowable` handler: > > ```console > > $ git diff > > diff --git a/src/main/java/org/apache/aurora/scheduler/events/Webhook.java > > b/src/main/java/org/apache/aurora/scheduler/events/Webhook.java > > index 95e3f52..d4519ee 100644 > > --- a/src/main/java/org/apache/aurora/scheduler/events/Webhook.java > > +++ b/src/main/java/org/apache/aurora/scheduler/events/Webhook.java > > @@ -102,6 +102,10 @@ public class Webhook extends AbstractIdleService > > implements EventSubscriber { > > createRequest(stateChange).execute(new > > AsyncCompletionHandler<Integer>() { > > @Override > > public void onThrowable(Throwable t) { > > + try { > > + Thread.sleep(1000); > > + } catch (InterruptedException e) { > > + } > > errorsCounter.incrementAndGet(); > > LOG.error("Error sending a Webhook event", t); > > ``` > > > > If i get some spare time, i may take a stab at a fix; but the > > happens-before is what we'll want to seek. > > Jordan Ly wrote: > Interesting -- I would imagine this client would force the future to > complete. > > ``` > /** > * Wrap the DefaultAsyncHttpClient for testing so we can complete > futures synchronously before > * validating assertions. Otherwise, we would have to call > `Thread.sleep` in our tests after > * each TaskStateChange. > */ > static class WebhookAsyncHttpClientWrapper extends > DefaultAsyncHttpClient { > > WebhookAsyncHttpClientWrapper(DefaultAsyncHttpClientConfig config) { > super(config); > } > > @Override > public <T> ListenableFuture<T> > executeRequest(org.asynchttpclient.Request request, > AsyncHandler<T> > handler) { > ListenableFuture<T> future = super.executeRequest(request, handler); > try { > future.get(); > future.done(); > } catch (InterruptedException | ExecutionException e) { > // Ignore, future should not fail to resolve. > } > return future; > } > } > ``` > > Must be something flawed in the implementation. > > Bill Farner wrote: > I re-remembered this code after posting the above reply; and i > agree...pretty surprising! > > Bill Farner wrote: > This patch fixes things for me; causing the `Thread.sleep()` to no longer > affect test results: > > ```console > diff --git > a/src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java > b/src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java > index 1b5d2d0..cdcf50a 100644 > --- a/src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java > +++ b/src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java > @@ -17,7 +17,6 @@ import java.io.IOException; > import java.net.URI; > import java.util.List; > import java.util.Map; > -import java.util.concurrent.ExecutionException; > import java.util.stream.Collectors; > > import javax.servlet.ServletException; > @@ -107,12 +106,7 @@ public class WebhookTest { > public <T> ListenableFuture<T> > executeRequest(org.asynchttpclient.Request request, > AsyncHandler<T> > handler) { > ListenableFuture<T> future = super.executeRequest(request, > handler); > - try { > - future.get(); > - future.done(); > - } catch (InterruptedException | ExecutionException e) { > - // Ignore, future should not fail to resolve. > - } > + future.toCompletableFuture().join(); > return future; > } > } > ``` > > The `future.get()` alone does not work due to this code in > `NettyResponseFuture`: > ```java > public final void abort(final Throwable t) { > > if (terminateAndExit()) > return; > > future.completeExceptionally(t); > > if (onThrowableCalledField.compareAndSet(this, 0, 1)) { > try { > asyncHandler.onThrowable(t); > } catch (Throwable te) { > LOGGER.debug("asyncHandler.onThrowable", te); > } > } > } > ``` > > The `future.completeExceptionally(t)` call causes `future.get()` above to > throw `ExecutionException`, and our main test thread continues. Netty's > thread separately proceeds to call `asyncHandler.onThrowable(t)`, which races > with the main thread. > > I have not looked into the internals of `CompletableFuture#join()` to say > why it differs, but it does pass consistently for me.
Jordan, will you update your PR? Or do you plan to file one, Bill? - Stephan ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/64523/#review193491 ----------------------------------------------------------- On Dec. 12, 2017, 3:29 a.m., Jordan Ly wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/64523/ > ----------------------------------------------------------- > > (Updated Dec. 12, 2017, 3:29 a.m.) > > > Review request for Aurora and Stephan Erb. > > > Bugs: AURORA-1961 > https://issues.apache.org/jira/browse/AURORA-1961 > > > Repository: aurora > > > Description > ------- > > Attempt #2 to fix the flaky Webhook test. > > Along the same lines of Stephan's change > (https://reviews.apache.org/r/64482/), but opting to never start the Jetty > server and forcing a connect exception. > > Overall, this flakiness has seemed to increase in volume over the past month. > I've been running into it with a noticable fraction of internal/test builds. > > > Diffs > ----- > > src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java > 1b5d2d02535345edfe6cf04d18d00434393f800b > > > Diff: https://reviews.apache.org/r/64523/diff/1/ > > > Testing > ------- > > This change seems pretty hard to test considering the differences in > environment and the unknown cause of the actual errors. Maybe we run the > reviewbot on this repo repeatedly? Obviously not the most scientifically > sound solution. > > > Thanks, > > Jordan Ly > >