pvillard31 opened a new pull request, #11002: URL: https://github.com/apache/nifi/pull/11002
# Summary

NIFI-15698 - Fix Python bridge hang during startup with many Python processors

This one was challenging to work on, and I was unable to write a system test that reliably reproduces the issue. However, the problem was very easy to reproduce by following the steps in the repository shared by the reporter: https://github.com/distroitt/nifi-bug

I confirmed the issue on the latest release. I also confirmed that this change fixes it by building the 2.9.0-SNAPSHOT Docker image and running the same tests.

When loading a flow with many Python processors, NiFi can hang during startup or restart and never reach "Started Application".

The root cause is virtual thread pinning in `NiFiPythonGateway`. The four methods that guard the `activeInvocations` list (`beginInvocation`, `endInvocation`, `putNewObject`, `putObject`) are `synchronized`. On JDK 21, a virtual thread that blocks while entering or holding a `synchronized` monitor cannot unmount, so it pins its carrier thread. During flow synchronization, the main thread and many processor-initialization virtual threads all contend for this single intrinsic lock. Because each waiting virtual thread pins its carrier, the ForkJoinPool carrier threads are quickly exhausted and no thread can make progress, including the one holding the lock.

This change replaces the `synchronized` methods with a `ReentrantLock`, which is virtual-thread-friendly: a virtual thread blocked on the lock unmounts and yields its carrier thread instead of pinning it.

The `PythonProcess` lifecycle has been updated so that a process is handed out to callers only after `discoverExtensions()` completes. New `isReady()`, `waitUntilReady()`, and `markReadyAndNotify()` methods prevent the main thread and initialization threads from calling into a Python process that is still loading extensions, which was another source of hangs on first start.
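A minimal sketch of the `synchronized`-to-`ReentrantLock` change, assuming simplified bookkeeping. The class name `InvocationTracker`, the method signatures, and the object-ID scheme are hypothetical and are not the actual `NiFiPythonGateway` or Py4J API. Each guarded method takes the lock inside `try`/`finally`, so a virtual thread that has to wait for the lock unmounts rather than pinning its carrier. The `main` method pushes 10,000 virtual threads through the same lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified stand-in for the bookkeeping in NiFiPythonGateway.
// Method names follow the PR description; signatures and bodies are assumptions.
public class InvocationTracker {
    // Explicit lock instead of synchronized methods: on JDK 21 a virtual thread
    // waiting in lock() unmounts from its carrier instead of pinning it.
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> activeInvocations = new ArrayList<>();
    private final Map<String, Object> objects = new HashMap<>();
    private long nextId = 0;

    public void beginInvocation(String invocation) {
        lock.lock();
        try {
            activeInvocations.add(invocation);
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    public void endInvocation(String invocation) {
        lock.lock();
        try {
            activeInvocations.remove(invocation);
        } finally {
            lock.unlock();
        }
    }

    // Registers an object under a freshly generated ID and returns that ID.
    public String putNewObject(Object object) {
        lock.lock();
        try {
            String id = "o" + nextId++;
            objects.put(id, object);
            return id;
        } finally {
            lock.unlock();
        }
    }

    // Registers an object under a caller-supplied ID.
    public void putObject(String id, Object object) {
        lock.lock();
        try {
            objects.put(id, object);
        } finally {
            lock.unlock();
        }
    }

    public int activeCount() {
        lock.lock();
        try {
            return activeInvocations.size();
        } finally {
            lock.unlock();
        }
    }

    public int objectCount() {
        lock.lock();
        try {
            return objects.size();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        InvocationTracker tracker = new InvocationTracker();
        // Many virtual threads contend for one lock, similar to processor
        // initialization threads during flow synchronization.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                String invocation = "invocation-" + i;
                executor.submit(() -> {
                    tracker.beginInvocation(invocation);
                    tracker.putNewObject(new Object());
                    tracker.endInvocation(invocation);
                });
            }
        } // close() waits for every submitted task to finish
        // prints "0 active, 10000 objects"
        System.out.println(tracker.activeCount() + " active, " + tracker.objectCount() + " objects");
    }
}
```

Unlike `synchronized`, nothing releases a `ReentrantLock` automatically when the guarded code throws, which is why every `lock()` in the sketch is paired with `unlock()` in a `finally` block.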
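The readiness gate on `PythonProcess` can be illustrated the same way. This is a sketch under stated assumptions: `ReadinessGate` is a hypothetical class, and the `long`/`TimeUnit` timeout on `waitUntilReady` is an assumption, since the description names the three methods but not their signatures or internals. It uses a `ReentrantLock` with a `Condition` rather than `Object.wait()`/`notifyAll()`, because the latter require a `synchronized` block and would reintroduce pinning on JDK 21.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical readiness gate modeled on the isReady()/waitUntilReady()/
// markReadyAndNotify() methods named in the PR; the real PythonProcess code may differ.
public class ReadinessGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition readyCondition = lock.newCondition();
    private boolean ready = false;

    public boolean isReady() {
        lock.lock();
        try {
            return ready;
        } finally {
            lock.unlock();
        }
    }

    // Blocks until markReadyAndNotify() is called or the timeout elapses.
    // Returns true if the gate is ready, false on timeout.
    public boolean waitUntilReady(long timeout, TimeUnit unit) throws InterruptedException {
        long remainingNanos = unit.toNanos(timeout);
        lock.lock();
        try {
            // Re-check in a loop to tolerate spurious wakeups.
            while (!ready) {
                if (remainingNanos <= 0) {
                    return false;
                }
                remainingNanos = readyCondition.awaitNanos(remainingNanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Called once extension discovery has finished; wakes every waiter.
    public void markReadyAndNotify() {
        lock.lock();
        try {
            ready = true;
            readyCondition.signalAll();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        ReadinessGate gate = new ReadinessGate();
        Thread loader = Thread.ofVirtual().start(() -> {
            try {
                Thread.sleep(100); // stand-in for a slow discoverExtensions() call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            gate.markReadyAndNotify();
        });
        boolean ready = gate.waitUntilReady(5, TimeUnit.SECONDS);
        System.out.println("ready after wait: " + ready);
        loader.join();
    }
}
```

In this sketch, the thread that runs extension discovery calls `markReadyAndNotify()` when it finishes, and any caller handed a process that may still be loading calls `waitUntilReady(...)` before invoking it.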
The `getProcessForNextComponent` method in `StandardPythonBridge` has been restructured to hold the bridge lock only for the decision phase (picking an existing process or creating a new one) and to release it before performing blocking operations such as `start()` and `discoverExtensions()`. Previously, the entire method was `synchronized`, so every other processor-creation thread was blocked for the duration of these slow operations.

The `createProcessorBridge` method now receives the already-resolved `PythonProcessorDetails` from its caller instead of calling `getProcessorTypes()` again. This eliminates two redundant Python proxy round-trips per processor creation, reducing gateway lock contention during startup.

A workaround has been added in `ProcessorInspection.py` for a CPython 3.11+ bug (gh-95185) where `ast.parse()` can raise `SystemError: AST constructor recursion depth mismatch` under concurrent load. The error is caught and the file is treated as a non-processor module so that extension loading continues.

# Tracking

Please complete the following tracking steps prior to pull request creation.

### Issue Tracking

- [ ] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue created

### Pull Request Tracking

- [ ] Pull Request title starts with Apache NiFi Jira issue number, such as `NIFI-00000`
- [ ] Pull Request commit message starts with Apache NiFi Jira issue number, such as `NIFI-00000`
- [ ] Pull Request contains [commits signed](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits) with a registered key indicating `Verified` status

### Pull Request Formatting

- [ ] Pull Request based on current revision of the `main` branch
- [ ] Pull Request refers to a feature branch with one commit containing changes

# Verification

Please indicate the verification steps performed prior to pull request creation.
### Build

- [ ] Build completed using `./mvnw clean install -P contrib-check`
  - [ ] JDK 21
  - [ ] JDK 25

### Licensing

- [ ] New dependencies are compatible with the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License Policy](https://www.apache.org/legal/resolved.html)
- [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` files

### Documentation

- [ ] Documentation formatting appears as expected in rendered files
