[ 
https://issues.apache.org/jira/browse/NIFI-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17745199#comment-17745199
 ] 

ASF subversion and git services commented on NIFI-11837:
--------------------------------------------------------

Commit 8f5392dd114ab5daeed12c6792d8dd2eb75bc362 in nifi's branch 
refs/heads/main from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=8f5392dd11 ]

NIFI-11837: When determining whether or not a queue should exit 'swap mode' we 
need to look at the updated size of the queue after migrating data from swapped 
queue to active queue. Previously, we were looking at the size variable that 
was obtained before the migration happened.

This closes #7506

Signed-off-by: David Handermann <exceptionfact...@apache.org>
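
The stale-size problem described in the commit message can be sketched as a toy model. This is not the NiFi implementation: the class name, method names, the active-queue cap, and the exit condition (leave swap mode once nothing remains swapped out) are all assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model; names and exit condition are assumptions, not NiFi code.
public class SwapModeSketch {
    private final Deque<Integer> activeQueue = new ArrayDeque<>();
    private final Deque<Integer> swapQueue = new ArrayDeque<>();
    private final int activeCapacity; // assumed cap on the active queue
    private boolean swapMode = true;  // the model starts out in swap mode

    SwapModeSketch(int activeCapacity, int swappedItems) {
        this.activeCapacity = activeCapacity;
        for (int i = 0; i < swappedItems; i++) {
            swapQueue.add(i);
        }
    }

    // Move items from the swap queue to the active queue while there is room.
    private void migrateSwapToActive() {
        while (!swapQueue.isEmpty() && activeQueue.size() < activeCapacity) {
            activeQueue.add(swapQueue.poll());
        }
    }

    // Pattern described as the bug: the size is read BEFORE the migration,
    // so the exit decision is based on an outdated value.
    void migrateWithStaleCheck() {
        int swappedBefore = swapQueue.size();
        migrateSwapToActive();
        if (swappedBefore == 0) {
            swapMode = false;
        }
    }

    // Pattern described as the fix: the size is read AFTER the migration.
    void migrateWithUpdatedCheck() {
        migrateSwapToActive();
        if (swapQueue.isEmpty()) {
            swapMode = false;
        }
    }

    boolean isSwapMode() {
        return swapMode;
    }

    public static void main(String[] args) {
        SwapModeSketch stale = new SwapModeSketch(10, 5);
        stale.migrateWithStaleCheck();
        SwapModeSketch updated = new SwapModeSketch(10, 5);
        updated.migrateWithUpdatedCheck();
        System.out.println("stale check, still in swap mode: " + stale.isSwapMode());     // true
        System.out.println("updated check, still in swap mode: " + updated.isSwapMode()); // false
    }
}
```

In this toy model the stale check only lags by one call; it is meant to show why the decision must use the post-migration size, not to reproduce the exact conditions under which NiFi never left swap mode.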


> When a queue starts swapping out data, it never stops
> -----------------------------------------------------
>
>                 Key: NIFI-11837
>                 URL: https://issues.apache.org/jira/browse/NIFI-11837
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>             Fix For: 2.0.0, 1.23.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a queue reaches the swap threshold (defined in nifi.properties as 
> {{nifi.queue.swap.threshold}} and defaulted to 20,000 FlowFiles), it enters 
> 'swap mode'. However, it never exits swap mode.
> This means that even if the queue is completely emptied, the data that does 
> enter the queue will be swapped out if the queue reaches 10K FlowFiles. 
> Additionally, there is significant overhead under the covers in handling this.
> To replicate, create a simple flow:
>   GenerateFlowFile -> UpdateAttribute.
> Set GenerateFlowFile to run with 6 threads, Run Schedule of "0 secs" and a 
> Run Duration of "100 ms". Auto-terminate the 'success' relationship of 
> UpdateAttribute.
> This will quickly fill the queue beyond 20K FlowFiles.
> Now, stop GenerateFlowFile. Lower it to 4 threads and a Run Duration of "10 ms".
> Start both processors. Watch the logs indicating that data is constantly being 
> swapped in and out.
> This can have a very significant impact on performance. In my testing on my 
> laptop, once this flow started swapping, its 5-minute stats dropped from 14.5 
> MM FlowFiles per 5 minutes down to 11 MM FlowFiles (roughly a 24% decline).
> In addition to lower throughput, it causes much higher resource utilization, 
> which affects all flows.
> This defect may affect anyone using a large number of small FlowFiles, 
> especially those where data may be bursty enough to exceed the 20,000 FlowFile 
> swapping limit or flows that have Backpressure Threshold set beyond 10,000.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
