[ 
https://issues.apache.org/jira/browse/NIFI-16011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Payne updated NIFI-16011:
------------------------------
    Description: 
We are consistently seeing system test failures. Judging by the logs from 
GitHub Actions, LoadBalanceIT is always the first test to fail, with the 
failures then cascading to other tests. It appears that, at the end of 
LoadBalanceIT.testPartitionByAttribute, the test performs a separate queue 
listing for each of the 100 expected FlowFiles, and each of these requests is 
then replicated across the cluster.

This, in turn, causes connection pool exhaustion, resulting in
{code:java}
IOException: RST_STREAM received {code}
which is returned as an HTTP 500 error.

That test can be tightened up by producing 20 FlowFiles instead of 100. This 
reduces the number of listing requests fivefold, giving us much more breathing room.
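As a rough illustration of the 5x figure, the following Java sketch counts the replicated listing requests. It assumes one listing per expected FlowFile, with each listing replicated to every node in the cluster. The class and method names and the 2-node cluster size are hypothetical and are not taken from the NiFi code base.

```java
// Hypothetical helper, not part of NiFi: estimates how many replicated listing
// requests the end of the test issues, ASSUMING one queue listing per expected
// FlowFile and that each listing is replicated to every node in the cluster.
public class ListingRequestEstimate {

    static int replicatedRequests(int expectedFlowFiles, int clusterNodes) {
        // One listing per FlowFile, fanned out to each node.
        return expectedFlowFiles * clusterNodes;
    }

    public static void main(String[] args) {
        int nodes = 2; // assumed cluster size, for illustration only
        int before = replicatedRequests(100, nodes);
        int after = replicatedRequests(20, nodes);
        // The node count multiplies both values, so the ratio does not depend on it.
        System.out.println("before=" + before + ", after=" + after + ", ratio=" + (before / after));
    }
}
```

Because the cluster size scales both counts equally, the reduction stays at 5x whatever the number of nodes.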

  was:
We are consistently seeing system test failures. Looking at the logs from 
GitHub Actions, it appears that LoadBalanceIT is always the first one to fail,
with the issue then cascading. It seems that the end of the 
LoadBalanceIT.testPartitionByAttribute test is performing a queue listing for 
each of the 100 expected FlowFiles, and this then gets replicated across the 
cluster.

This, in turn, causes connection pool exhaustion, resulting in
{code:java}
IOException: RST_STREAM received {code}
which is returned as an HTTP 500 error.

That test can be tightened up to perform a listing once and just look at 
FlowFile Summaries instead of fetching the full FlowFile each time.


> Repeated system test failures caused by LoadBalanceIT
> -----------------------------------------------------
>
>                 Key: NIFI-16011
>                 URL: https://issues.apache.org/jira/browse/NIFI-16011
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>
> We are consistently seeing system test failures. Looking at the logs from 
> GitHub Actions, it appears that LoadBalanceIT is always the first one to 
> fail, with the issue then cascading. It seems that the end of the 
> LoadBalanceIT.testPartitionByAttribute test is performing a queue listing for 
> each of the 100 expected FlowFiles, and this then gets replicated across the 
> cluster.
> This, in turn, causes connection pool exhaustion, resulting in
> {code:java}
> IOException: RST_STREAM received {code}
> which is returned as an HTTP 500 error.
> That test can be tightened up by producing 20 FlowFiles instead of 100. This 
> will reduce the number of requests by 5x, giving us much more breathing room.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
