damccorm commented on code in PR #35621: URL: https://github.com/apache/beam/pull/35621#discussion_r2260715275
########## sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java: ########## @@ -276,8 +276,8 @@ public static void main( IdGenerator idGenerator = IdGenerators.decrementingLongs(); ShortIdMap metricsShortIds = new ShortIdMap(); - ExecutorService executorService = Review Comment: Thanks for calling this out - was in the PR description originally, and then I lost it when I updated to include information for folks coming from CHANGES. From the original description: This fixes a thread safety issue which was causing tests to time out and fail. The race condition comes from spinning up Java environments - when we do this today (before this PR), we: 1. Get a singleton ExecutorService object - https://github.com/apache/beam/blob/d0a0e8eba2d48ca3051fa82e2d572cc9863ce5da/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java#L280 1. Execute a pipeline 1. Shutdown the ExecutorService - https://github.com/apache/beam/blob/d0a0e8eba2d48ca3051fa82e2d572cc9863ce5da/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnHarness.java#L430 When this is all handled in different processes (as the current FnApiRunner implementation does), its not a problem since the ExecutorService doesn't need to outlive the pipeline. In multi-pipeline processes, however, the same ExecutorService object gets reused and teardown can get called while it is executing or before it starts. This leads to a long execution and an eventual timeout (see comments in PR below for details on error thrown). This PR fixes the issue by creating a new ExecutorService object per worker. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org