markap14 commented on PR #11325:
URL: https://github.com/apache/nifi/pull/11325#issuecomment-4682319218

   ## Hypothesis confirmed: Jetty 12.1.10 introduced the HTTP/2 RST_STREAM 
regression
   
   Workflow run 
[27353150066](https://github.com/apache/nifi/actions/runs/27353150066) with 
commit `451d60b` (Jetty pinned back to 12.1.9):
   
   | Job                       | Result   | Notes                               
                                                                 |
   
|---------------------------|----------|------------------------------------------------------------------------------------------------------|
   | `ubuntu-24.04 Java 21`    | PASS     | All system tests green.             
                                                                 |
   | `macos-15   Java 21`      | PASS     | All system tests green.             
                                                                 |
   | `ubuntu-24.04 Java 25`    | FAIL     | One test failed, but not the 
RST_STREAM pattern — see below.                                         |
   | `macos-15   Java 25`      | FAIL     | Two tests failed, neither is 
RST_STREAM — see below.                                                 |
   
   ### Primary finding
   
   `rg -l "RST_STREAM" /tmp/nifi-pr11325-jetty129-logs/` returns **zero hits** 
across both Java 25 troubleshooting archives, and the previously affected tests 
(`LoadBalanceIT`, `ClusteredStatelessFlowIT`, `ClusteredRegistryClientIT`, 
`OffloadContentClaimTruncationIT`, `FlowSynchronizationIT`) all passed on every 
OS/JDK combination. The `RST_STREAM Stream cancelled` / 
`incompleteRequestBodyReset` failure mode that was reproducing on every recent 
main run is gone on Jetty 12.1.9.
   
   ### Remaining Java 25 failures (not Jetty)
   
   These look like pre-existing flakes in different code paths:
   
   - 
**`ClusteredConnectorDrainIT.testDrainWithNodeCompletingAtDifferentTimes`** 
(ubuntu-24.04 Java 25): the test's `@BeforeEach` `waitForAllNodesConnected` 
timed out after 60s with node-2 still `DISCONNECTED`. Cluster join, not 
replication.
   - **`OffloadIT.testOffload`** (macos-15 Java 25): `TimeoutException: 
testOffload() timed out after 10 minutes`. Test hang.
   - **`ClusteredReplayProvenanceIT[2].testReplayLastEvent`** (macos-15 Java 
25): `AssertionFailedError: expected: <2> but was: <1>`. Looks like a real 
test/assertion issue.
   
   None of these involve HTTP/2 RST_STREAM or cluster replication failures.
   
   ### Proposed next step
   
   Suggest we proceed in this order:
   
   1. **Pin `<jetty.version>` to 12.1.9** in `main` (separate PR) as the 
immediate fix for the RST_STREAM regression, with a TODO/comment referencing 
the upstream Jetty bug.
   2. **File an upstream Jetty bug** with a minimal repro (HTTP/2 client+server 
in same JVM over loopback, mTLS, many short POSTs ⇒ RST_STREAM(CANCEL) during 
request body upload). I can put that together.
   3. **Treat the three remaining Java 25 flakes as separate Jiras** and triage 
them independently — they were almost certainly there before, just masked by 
the all-red RST_STREAM noise.
   4. Once 12.1.9 is pinned in `main`, this PR's `LoadBalanceIT` batch-size 
reduction stands on its own as a defense-in-depth load reduction and can be 
reviewed/merged on its merits.
   
   Waiting on direction before proceeding.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to