mateczagany commented on code in PR #27407:
URL: https://github.com/apache/flink/pull/27407#discussion_r2693277077
##########
flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/AbstractRestHandler.java:
##########
@@ -85,14 +85,15 @@ protected CompletableFuture<Void> respondToRequest(
response = FutureUtils.completedExceptionally(e);
}
- return response.thenAccept(
+ return response.thenAcceptAsync(
resp ->
HandlerUtils.sendResponse(
ctx,
httpRequest,
resp,
messageHeaders.getResponseStatusCode(),
- responseHeaders));
+ responseHeaders),
+ ctx.executor());
Review Comment:
<del>@snuyanzin the issue was this stuck thread:</del>
```
Jan 14 21:15:14 at
org.apache.flink.core.testutils.BlockerSync.awaitBlocker(BlockerSync.java:59)
Jan 14 21:15:14 - locked <0x00000000aa141c48> (a java.lang.Object)
Jan 14 21:15:14 at
org.apache.flink.runtime.rest.RestServerEndpointITCase.testShouldWaitForHandlersWhenClosing(RestServerEndpointITCase.java:597)
```
<del>Something has changed internally in Netty, and this test consistently
times out on Azure CI. The issue arises when the response is not written from
the event loop, and Netty enqueues the write operations on the event loop
instead, and then we close the REST server itself (which is the point of the
test), the writes don't always go through properly.</del>
<del>By running the write operations on the event loop, this issue goes
away. I think it's better to do it this way anyways, so that Netty does not
have to create 3 write tasks to eventually execute on the event loop
thread.</del>
Nevermind, unfortunately this only solved the issue on my side own computer,
but the CI still failed at the exact same steps. I'll continue digging...
##########
flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/AbstractRestHandler.java:
##########
@@ -85,14 +85,15 @@ protected CompletableFuture<Void> respondToRequest(
response = FutureUtils.completedExceptionally(e);
}
- return response.thenAccept(
+ return response.thenAcceptAsync(
resp ->
HandlerUtils.sendResponse(
ctx,
httpRequest,
resp,
messageHeaders.getResponseStatusCode(),
- responseHeaders));
+ responseHeaders),
+ ctx.executor());
Review Comment:
<del>@snuyanzin the issue was this stuck thread:</del>
```
Jan 14 21:15:14 at
org.apache.flink.core.testutils.BlockerSync.awaitBlocker(BlockerSync.java:59)
Jan 14 21:15:14 - locked <0x00000000aa141c48> (a java.lang.Object)
Jan 14 21:15:14 at
org.apache.flink.runtime.rest.RestServerEndpointITCase.testShouldWaitForHandlersWhenClosing(RestServerEndpointITCase.java:597)
```
<del>Something has changed internally in Netty, and this test consistently
times out on Azure CI. The issue arises when the response is not written from
the event loop, and Netty enqueues the write operations on the event loop
instead, and then we close the REST server itself (which is the point of the
test), the writes don't always go through properly.</del>
<del>By running the write operations on the event loop, this issue goes
away. I think it's better to do it this way anyways, so that Netty does not
have to create 3 write tasks to eventually execute on the event loop
thread.</del>
Nevermind, unfortunately this only solved the issue on my own computer, but
the CI still failed at the exact same steps. I'll continue digging...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]