pvillard31 opened a new pull request, #11002:
URL: https://github.com/apache/nifi/pull/11002

   # Summary
   
   NIFI-15698 - Fix Python bridge hang during startup with many Python 
processors
   
   Reproducing this in an automated system test proved difficult, and I was 
unable to write one that triggers the issue reliably. It is, however, very easy 
to reproduce manually by following the steps in the repository shared by the 
reporter: https://github.com/distroitt/nifi-bug
   
   I confirmed the issue on the latest release, and confirmed that the fix 
resolves it by building the 2.9.0-SNAPSHOT Docker image and re-running the same 
reproduction steps.
   
   When loading a flow with many Python processors, NiFi can hang during 
startup or restart and never reach "Started Application". The root cause is 
virtual thread pinning in `NiFiPythonGateway`. The four methods that guard the 
`activeInvocations` list (`beginInvocation`, `endInvocation`, `putNewObject`, 
`putObject`) use `synchronized`, which pins virtual threads to their carrier 
threads in JDK 21. During flow synchronization, the main thread and many 
processor-initialization virtual threads all contend for this single intrinsic 
lock. Because each waiting virtual thread pins its carrier, the ForkJoinPool 
carrier threads are quickly exhausted, and no thread can make progress - 
including the one holding the lock. This change replaces the `synchronized` 
methods with a `ReentrantLock`, which is virtual-thread-friendly: blocked 
virtual threads yield their carrier thread instead of pinning it.
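
   As an illustration of the locking change described above, here is a minimal 
sketch of swapping `synchronized` methods for an explicit `ReentrantLock`. The 
`InvocationRegistry` class, its `String` invocation ids, and the `activeCount()` 
helper are hypothetical stand-ins, not the actual `NiFiPythonGateway` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the gateway's invocation bookkeeping (not NiFi's real class).
class InvocationRegistry {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> activeInvocations = new ArrayList<>();

    // Previously this would have been declared `synchronized`.
    void beginInvocation(final String invocationId) {
        lock.lock();
        try {
            activeInvocations.add(invocationId);
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    void endInvocation(final String invocationId) {
        lock.lock();
        try {
            activeInvocations.remove(invocationId);
        } finally {
            lock.unlock();
        }
    }

    // Illustrative helper (an assumption) to observe the guarded state.
    int activeCount() {
        lock.lock();
        try {
            return activeInvocations.size();
        } finally {
            lock.unlock();
        }
    }
}
```

   The guarded state and the critical sections are unchanged; only the lock 
primitive differs, so a virtual thread that blocks in `lock()` can be 
unmounted from its carrier instead of holding it.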
   
   The `PythonProcess` lifecycle has been updated so that a process is only 
handed out to callers after `discoverExtensions()` completes. New `isReady()`, 
`waitUntilReady()`, and `markReadyAndNotify()` methods prevent the main thread 
or initialization threads from calling into a Python process that is still 
loading extensions, which was another source of hangs on first start.
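
   One way such a readiness gate could look is sketched below. The method names 
come from the description above, but the `ReadinessGate` class, the use of a 
`CountDownLatch`, and the timeout parameters are assumptions made for 
illustration and may differ from the actual `PythonProcess` implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical readiness gate; the real PythonProcess may be implemented differently.
class ReadinessGate {
    private final CountDownLatch ready = new CountDownLatch(1);

    // True once extension discovery has been marked complete.
    boolean isReady() {
        return ready.getCount() == 0;
    }

    // Blocks until ready or until the timeout elapses; returns whether the gate is open.
    // The timeout arguments are an assumption of this sketch.
    boolean waitUntilReady(final long timeout, final TimeUnit unit) {
        try {
            return ready.await(timeout, unit);
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    // Called once, after discoverExtensions() has completed; wakes all waiters.
    void markReadyAndNotify() {
        ready.countDown();
    }
}
```

   A latch fits here because readiness is a one-way transition: once opened, 
every current and future waiter returns immediately.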
   
   The `getProcessForNextComponent` method in `StandardPythonBridge` has been 
restructured to hold the bridge lock only for the decision phase (picking or 
creating a process), then release it before performing blocking operations like 
`start()` and `discoverExtensions()`. Previously the entire method was 
`synchronized`, blocking all other processor creation threads during these slow 
operations.
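
   The restructuring described above follows a common pattern: decide under the 
lock, then do slow work outside it. The sketch below shows that shape only. 
`ProcessPool`, `Worker`, the `maxWorkers` limit, and the trivial reuse policy 
are all hypothetical and do not mirror the real `StandardPythonBridge` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical pool illustrating "lock for the decision, not for the slow work".
class ProcessPool {
    // Placeholder for a process whose start() is slow in real life.
    static class Worker {
        private volatile boolean started;

        void start() {
            started = true; // stands in for launching a process and discovering extensions
        }

        boolean isStarted() {
            return started;
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final List<Worker> workers = new ArrayList<>();
    private final int maxWorkers;

    ProcessPool(final int maxWorkers) {
        this.maxWorkers = maxWorkers;
    }

    Worker acquire() {
        final Worker chosen;
        final boolean created;

        // Decision phase: short critical section, no blocking calls.
        lock.lock();
        try {
            if (workers.size() < maxWorkers) {
                chosen = new Worker();
                workers.add(chosen);
                created = true;
            } else {
                chosen = workers.get(0); // simplistic reuse policy, for illustration only
                created = false;
            }
        } finally {
            lock.unlock();
        }

        // Slow phase: runs without the lock, so other callers are not blocked.
        if (created) {
            chosen.start();
        }
        return chosen;
    }
}
```

   Note that once the lock is released early, a second caller can be handed a 
worker that is still starting, which is why a readiness check such as 
`waitUntilReady()` is needed before a reused process is called.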
   
   The `createProcessorBridge` method now receives the already-resolved 
`PythonProcessorDetails` from its caller instead of calling 
`getProcessorTypes()` again. This eliminates two redundant Python proxy 
round-trips per processor creation, reducing gateway lock contention during 
startup.
   
   A workaround has been added in `ProcessorInspection.py` for a CPython 3.11+ 
bug (gh-95185) where `ast.parse()` can raise `SystemError: AST constructor 
recursion depth mismatch` under concurrent load. The error is caught and the 
file is treated as a non-processor module so that extension loading continues.
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [ ] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue 
created
   
   ### Pull Request Tracking
   
   - [ ] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-00000`
   - [ ] Pull Request commit message starts with Apache NiFi Jira issue number, 
such as `NIFI-00000`
   - [ ] Pull request contains [commits 
signed](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits)
 with a registered key indicating `Verified` status
   
   ### Pull Request Formatting
   
   - [ ] Pull Request based on current revision of the `main` branch
   - [ ] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [ ] Build completed using `./mvnw clean install -P contrib-check`
     - [ ] JDK 21
     - [ ] JDK 25
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   

