Pierre Villard created NIFI-14872:
-------------------------------------
Summary: Enable redistribution of flow files in round robin
load-balanced connections when cluster scales
Key: NIFI-14872
URL: https://issues.apache.org/jira/browse/NIFI-14872
Project: Apache NiFi
Issue Type: Improvement
Components: Core Framework
Reporter: Pierre Villard
Assignee: Pierre Villard
h3. Summary
Currently, with round-robin load balancing on a NiFi connection, FlowFiles
are distributed across nodes only at the moment they enter the connection. If
new nodes join the cluster after FlowFiles have been queued, the existing
FlowFiles are not redistributed to take advantage of the additional processing
capacity. This change proposes setting the rebalance flag to {{true}} in
{{RoundRobinPartitioner}} so that queued FlowFiles are redistributed
automatically when the cluster topology changes.
h3. Current Behavior
* FlowFiles in load-balanced connections remain on their originally assigned
nodes even after cluster scale-up
* New nodes receive only FlowFiles that enter the connection after they join, not the existing backlog
* Large queues of data may continue processing on original nodes while new
nodes remain underutilized
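The behavior described above can be illustrated with a small simulation. This is a toy model, not NiFi code: the class {{EntryTimeRoundRobin}} and its methods are hypothetical, and each node is modeled as a simple counter of queued FlowFiles.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not NiFi code): a FlowFile is assigned to a node only when it enters the connection. */
public class EntryTimeRoundRobin {
    private final List<Integer> queuedPerNode = new ArrayList<>();
    private long counter = 0;

    public EntryTimeRoundRobin(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            queuedPerNode.add(0);
        }
    }

    /** Assigns one incoming FlowFile to the next node in round-robin order. */
    public void enqueue() {
        int index = (int) (counter++ % queuedPerNode.size());
        queuedPerNode.set(index, queuedPerNode.get(index) + 1);
    }

    /** Adds a node to the cluster; FlowFiles that are already queued stay where they are. */
    public void addNode() {
        queuedPerNode.add(0);
    }

    /** Returns a copy of the per-node queue sizes. */
    public List<Integer> queuedPerNode() {
        return new ArrayList<>(queuedPerNode);
    }

    public static void main(String[] args) {
        EntryTimeRoundRobin connection = new EntryTimeRoundRobin(2);
        for (int i = 0; i < 1000; i++) {
            connection.enqueue();
        }
        connection.addNode();
        System.out.println(connection.queuedPerNode()); // [500, 500, 0]

        // Only FlowFiles arriving after the scale-up reach the new node.
        for (int i = 0; i < 3; i++) {
            connection.enqueue();
        }
        System.out.println(connection.queuedPerNode()); // [501, 501, 1]
    }
}
```

In this model the third node starts with an empty queue and picks up only a third of the new arrivals, while the original backlog of 1000 FlowFiles stays on the first two nodes.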
h3. Proposed Change
Set the rebalance flag to {{true}} in {{RoundRobinPartitioner}} to trigger
redistribution of queued FlowFiles when cluster membership changes.
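A minimal sketch of the idea, under stated assumptions: the {{Partitioner}} interface, the {{RoundRobin}} class, and {{sizesAfterScaleUp}} below are hypothetical simplifications written for illustration and are not NiFi's actual classes or signatures. The sketch only shows how a boolean rebalance flag on the partitioner could decide whether the backlog is re-partitioned when a node joins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class RebalanceSketch {

    /** Hypothetical, simplified partitioner contract; not NiFi's actual interface. */
    public interface Partitioner {
        int partitionFor(int partitionCount);

        boolean rebalanceOnClusterResize();
    }

    /** Round-robin partitioner whose rebalance flag is configurable for comparison. */
    public static final class RoundRobin implements Partitioner {
        private final AtomicLong counter = new AtomicLong();
        private final boolean rebalance;

        public RoundRobin(boolean rebalance) {
            this.rebalance = rebalance;
        }

        @Override
        public int partitionFor(int partitionCount) {
            return (int) (counter.getAndIncrement() % partitionCount);
        }

        @Override
        public boolean rebalanceOnClusterResize() {
            return rebalance;
        }
    }

    /** Returns per-node queue sizes after one node joins a cluster holding the given backlog. */
    public static List<Integer> sizesAfterScaleUp(int[] backlog, Partitioner partitioner) {
        int newCount = backlog.length + 1;
        int[] sizes = new int[newCount];
        if (partitioner.rebalanceOnClusterResize()) {
            // Flag set: every queued FlowFile is run through the partitioner again.
            int total = 0;
            for (int queued : backlog) {
                total += queued;
            }
            for (int i = 0; i < total; i++) {
                sizes[partitioner.partitionFor(newCount)]++;
            }
        } else {
            // Flag not set: the backlog stays put and the new node starts empty.
            System.arraycopy(backlog, 0, sizes, 0, backlog.length);
        }
        List<Integer> result = new ArrayList<>();
        for (int size : sizes) {
            result.add(size);
        }
        return result;
    }
}
```

With a backlog of 600 FlowFiles on each of two nodes, calling {{sizesAfterScaleUp}} with the flag set to {{false}} leaves the new node empty, while the flag set to {{true}} spreads the 1200 FlowFiles evenly over three nodes in this model.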
h3. Benefits
* {*}Improved scalability{*}: When users scale up the cluster to handle large
backlogs, existing queued data will be redistributed to utilize the new
processing capacity
* {*}Better resource utilization{*}: Prevents scenarios where original nodes
are overloaded while new nodes sit idle
* {*}More intuitive behavior{*}: Aligns with user expectations that scaling up
will help process existing workloads faster
h3. Potential Drawbacks
* {*}Unnecessary data movement{*}: In cases where downstream processors are
very fast (e.g., UpdateAttribute), redistribution could move large amounts of
data across the network unnecessarily
* {*}Network overhead{*}: Redistribution of large volumes of data could
temporarily impact cluster network performance
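To get a rough sense of the data movement involved, the following back-of-the-envelope estimate may help. It is an assumption-laden sketch, not a measurement: the helper {{minBytesMoved}} is hypothetical, it assumes the backlog is spread evenly over the original nodes, and it computes only the minimum volume needed to even out the queues. An actual rebalance may move more than this lower bound.

```java
public class RebalanceCost {

    /**
     * Minimum volume that must cross the network to even out a backlog when a
     * cluster grows from oldNodes to newNodes. Assumes the backlog is evenly
     * spread over the old nodes; the result is in the same unit as backlog.
     */
    public static long minBytesMoved(long backlog, int oldNodes, int newNodes) {
        if (oldNodes <= 0 || newNodes < oldNodes) {
            throw new IllegalArgumentException("expected 0 < oldNodes <= newNodes");
        }
        // Each added node must end up holding backlog / newNodes.
        return backlog * (newNodes - oldNodes) / newNodes;
    }

    public static void main(String[] args) {
        // Growing from 3 to 4 nodes with a 1200 GB backlog moves at least a quarter of it.
        System.out.println(minBytesMoved(1200, 3, 4)); // 300
    }
}
```

Under these assumptions the share of the backlog that moves is at least (newNodes - oldNodes) / newNodes, so adding one node to a large cluster moves proportionally less data than doubling a small one.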
h3. Justification
While there are scenarios where redistribution could cause unnecessary data
movement, the impact of NOT redistributing is potentially worse. The primary
use case for cluster scale-up is to handle increased processing demands, making
automatic redistribution the more intuitive and beneficial default behavior.
Users who scale their clusters expect the new capacity to help with existing
workloads, not just future data.