> On Dec. 12, 2017, 3:57 a.m., Bill Farner wrote:
> > I should have looked more closely at the last patch, and when preparing my 
> > patch before that.  This test is bound to be flaky since there's nothing 
> > ensuring a happens-before releationship between issuing the request and 
> > checking the result.
> > 
> > I can repro consistently with the patch below, slowing down the 
> > `onThrowable` handler:
> > ```console
> > $ git diff
> > diff --git a/src/main/java/org/apache/aurora/scheduler/events/Webhook.java 
> > b/src/main/java/org/apache/aurora/scheduler/events/Webhook.java
> > index 95e3f52..d4519ee 100644
> > --- a/src/main/java/org/apache/aurora/scheduler/events/Webhook.java
> > +++ b/src/main/java/org/apache/aurora/scheduler/events/Webhook.java
> > @@ -102,6 +102,10 @@ public class Webhook extends AbstractIdleService 
> > implements EventSubscriber {
> >          createRequest(stateChange).execute(new 
> > AsyncCompletionHandler<Integer>() {
> >            @Override
> >            public void onThrowable(Throwable t) {
> > +            try {
> > +              Thread.sleep(1000);
> > +            } catch (InterruptedException e) {
> > +            }
> >              errorsCounter.incrementAndGet();
> >              LOG.error("Error sending a Webhook event", t);
> > ```
> > 
> > If i get some spare time, i may take a stab at a fix; but the 
> > happens-before is what we'll want to seek.
> 
> Jordan Ly wrote:
>     Interesting -- I would imagine this client would force the future to 
> complete.
>     
>     ```
>       /**
>        * Wrap the DefaultAsyncHttpClient for testing so we can complete 
> futures synchronously before
>        * validating assertions. Otherwise, we would have to call 
> `Thread.sleep` in our tests after
>        * each TaskStateChange.
>        */
>       static class WebhookAsyncHttpClientWrapper extends 
> DefaultAsyncHttpClient {
>     
>         WebhookAsyncHttpClientWrapper(DefaultAsyncHttpClientConfig config) {
>           super(config);
>         }
>     
>         @Override
>         public <T> ListenableFuture<T> 
> executeRequest(org.asynchttpclient.Request request,
>                                                       AsyncHandler<T> 
> handler) {
>           ListenableFuture<T> future = super.executeRequest(request, handler);
>           try {
>             future.get();
>             future.done();
>           } catch (InterruptedException | ExecutionException e) {
>             // Ignore, future should not fail to resolve.
>           }
>           return future;
>         }
>       }
>     ```
>     
>     Must be something flawed in the implementation.
> 
> Bill Farner wrote:
>     I re-remembered this code after posting the above reply; and i 
> agree...pretty surprising!
> 
> Bill Farner wrote:
>     This patch fixes things for me; causing the `Thread.sleep()` to no longer 
> affect test results:
>     
>     ```console
>     diff --git 
> a/src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> b/src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java
>     index 1b5d2d0..cdcf50a 100644
>     --- a/src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java
>     +++ b/src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java
>     @@ -17,7 +17,6 @@ import java.io.IOException;
>      import java.net.URI;
>      import java.util.List;
>      import java.util.Map;
>     -import java.util.concurrent.ExecutionException;
>      import java.util.stream.Collectors;
>     
>      import javax.servlet.ServletException;
>     @@ -107,12 +106,7 @@ public class WebhookTest {
>          public <T> ListenableFuture<T> 
> executeRequest(org.asynchttpclient.Request request,
>                                                        AsyncHandler<T> 
> handler) {
>            ListenableFuture<T> future = super.executeRequest(request, 
> handler);
>     -      try {
>     -        future.get();
>     -        future.done();
>     -      } catch (InterruptedException | ExecutionException e) {
>     -        // Ignore, future should not fail to resolve.
>     -      }
>     +      future.toCompletableFuture().join();
>            return future;
>          }
>        }
>     ```
>     
>     The `future.get()` alone does not work due to this code in 
> `NettyResponseFuture`:
>     ```java
>         public final void abort(final Throwable t) {
>     
>             if (terminateAndExit())
>                 return;
>     
>             future.completeExceptionally(t);
>     
>             if (onThrowableCalledField.compareAndSet(this, 0, 1)) {
>                 try {
>                     asyncHandler.onThrowable(t);
>                 } catch (Throwable te) {
>                     LOGGER.debug("asyncHandler.onThrowable", te);
>                 }
>             }
>         }
>     ```
>     
>     The `future.completeExceptionally(t)` call causes `future.get()` above to 
> throw `ExecutionException`, and our main test thread continues.  Netty's 
> thread separately proceeds to call `asyncHandler.onThrowable(t)`, which races 
> with the main thread.
>     
>     I have not looked into the internals of `CompletableFuture#join()` to say 
> why it differs, but it does pass consistently for me.

Jordan, will you update your PR? Or do you plan to file one, Bill?


- Stephan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64523/#review193491
-----------------------------------------------------------


On Dec. 12, 2017, 3:29 a.m., Jordan Ly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64523/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2017, 3:29 a.m.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Bugs: AURORA-1961
>     https://issues.apache.org/jira/browse/AURORA-1961
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Attempt #2 to fix the flaky Webhook test.
> 
> Along the same lines of Stephan's change 
> (https://reviews.apache.org/r/64482/), but opting to never start the Jetty 
> server and forcing a connect exception.
> 
> Overall, this flakiness has seemed to increase in volume over the past month. 
> I've been running into it with a noticable fraction of internal/test builds.
> 
> 
> Diffs
> -----
> 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> 1b5d2d02535345edfe6cf04d18d00434393f800b 
> 
> 
> Diff: https://reviews.apache.org/r/64523/diff/1/
> 
> 
> Testing
> -------
> 
> This change seems pretty hard to test considering the differences in 
> environment and the unknown cause of the actual errors. Maybe we run the 
> reviewbot on this repo repeatedly? Obviously not the most scientifically 
> sound solution.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>

Reply via email to