damccorm commented on code in PR #35621:
URL: https://github.com/apache/beam/pull/35621#discussion_r2260715275
##########
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java:
##########
@@ -276,8 +276,8 @@ public static void main(
IdGenerator idGenerator = IdGenerators.decrementingLongs();
ShortIdMap metricsShortIds = new ShortIdMap();
- ExecutorService executorService =
Review Comment:
Thanks for calling this out - was in the PR description originally, and then
I lost it when I updated to include information for folks coming from CHANGES.
From the original description:
This fixes a thread safety issue which was causing tests to time out and
fail.
The race condition comes from spinning up Java environments - when we do
this today (before this PR), we:
1. Get a singleton ExecutorService object -
https://github.com/apache/beam/blob/d0a0e8eba2d48ca3051fa82e2d572cc9863ce5da/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java#L280
1. Execute a pipeline
1. Shutdown the ExecutorService -
https://github.com/apache/beam/blob/d0a0e8eba2d48ca3051fa82e2d572cc9863ce5da/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java#L430
When this is all handled in different processes (as the current FnApiRunner
implementation does), its not a problem since the ExecutorService doesn't need
to outlive the pipeline. In multi-pipeline processes, however, the same
ExecutorService object gets reused and teardown can get called while it is
executing or before it starts. This leads to a long execution and an eventual
timeout (see comments in PR below for details on error thrown).
This PR fixes the issue by creating a new ExecutorService object per worker.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]