[ 
https://issues.apache.org/jira/browse/NIFI-16011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Payne updated NIFI-16011:
------------------------------
    Description: 
We are consistently seeing system test failures. Looking at the logs from 
GitHub Actions, it appears that LoadBalanceIT is always the first one to fail, 
with the issue then cascading. It seems that the end of the 
LoadBalanceIT.testPartitionByAttribute test is performing a queue listing for 
each of the 100 expected FlowFiles, and this then gets replicated across the 
cluster.

This, in turn, causes connection pool exhaustion, resulting in:
{code:java}
IOException: RST_STREAM received {code}
This surfaces as an HTTP 500 error.

That test can be tightened up by producing 20 FlowFiles instead of 100. This 
will reduce the number of requests by 5x, giving us much more breathing room.
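
As a rough sketch of the 5x arithmetic: the model below assumes one queue listing per FlowFile and one replicated request per node for each listing. The class name, method name, and the two-node cluster size are illustrative assumptions and are not taken from the test itself.

```java
public class ListingFanOut {

    // Hypothetical model: every queue listing is replicated once to each node.
    public static long replicatedRequests(int flowFiles, int nodes) {
        return (long) flowFiles * nodes;
    }

    public static void main(String[] args) {
        int nodes = 2; // assumed cluster size, not stated in this issue

        long before = replicatedRequests(100, nodes); // 100 expected FlowFiles
        long after = replicatedRequests(20, nodes);   // 20 expected FlowFiles

        System.out.println("before=" + before
                + " after=" + after
                + " ratio=" + (before / after));
    }
}
```

Under this model the ratio is 5 for any node count, since the node count cancels out.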

 

After digging in, the reduction from 100 FlowFiles to 20 did not provide the 
resilience I was looking for. The issue appears to stem from changes made in 
the latest version of Jetty. It appears that they explicitly 

  was:
We are consistently seeing system test failures. Looking at the logs from 
GitHub Actions, it appears that LoadBalanceIT is always the first one to fail, 
with the issue then cascading. It seems that the end of the 
LoadBalanceIT.testPartitionByAttribute test is performing a queue listing for 
each of the 100 expected FlowFiles, and this then gets replicated across the 
cluster.

This, in turn, causes connection pool exhaustion, resulting in:
{code:java}
IOException: RST_STREAM received {code}
This surfaces as an HTTP 500 error.

That test can be tightened up by producing 20 FlowFiles instead of 100. This 
will reduce the number of requests by 5x, giving us much more breathing room.


> Repeated system test failures caused by LoadBalanceIT
> -----------------------------------------------------
>
>                 Key: NIFI-16011
>                 URL: https://issues.apache.org/jira/browse/NIFI-16011
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> We are consistently seeing system test failures. Looking at the logs from 
> GitHub Actions, it appears that LoadBalanceIT is always the first one to 
> fail, with the issue then cascading. It seems that the end of the 
> LoadBalanceIT.testPartitionByAttribute test is performing a queue listing for 
> each of the 100 expected FlowFiles, and this then gets replicated across the 
> cluster.
> This, in turn, causes connection pool exhaustion, resulting in:
> {code:java}
> IOException: RST_STREAM received {code}
> This surfaces as an HTTP 500 error.
> That test can be tightened up by producing 20 FlowFiles instead of 100. This 
> will reduce the number of requests by 5x, giving us much more breathing room.
>  
> After digging in, the reduction from 100 FlowFiles to 20 did not provide the 
> resilience I was looking for. The issue appears to stem from changes made in 
> the latest version of Jetty. It appears that they explicitly 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
