[ 
https://issues.apache.org/jira/browse/NIFI-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715970#comment-16715970
 ] 

Koji Kawamura commented on NIFI-5882:
-------------------------------------

[~jzahner] Thanks for confirming the doc is correct.

Now, I'm a bit confused by the above comment. To me, the two-processor and 
one-processor queue examples are the same. The FlowFiles are sorted within 
each queue only, not cluster-wide.
 * qL: p1, p4
 * qR1: p2, p5
 * qR2: p3, p6

This doesn't guarantee that p4 is processed after p3. A node processes 
FlowFiles from its local queue and sends FlowFiles to other nodes concurrently. 
And once a FlowFile has been sent to another node, the sending node no longer 
has any control over its processing order.

If you would like to process those data in an exactly synchronized order, then 
you either can't distribute the load, or distributing it doesn't add any benefit.

If you need to process p1, p2, p3, p4, p5, p6 one by one in order, then I 
suggest not using load-balancing.
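
To illustrate the point above, here is a minimal sketch in plain Python. It is 
a toy model, not NiFi code: the functions round_robin and simulate are 
hypothetical names made up for this example, the integers 1..6 stand for the 
"priority" values of p1..p6, and the random choice of the next node is only an 
assumption standing in for nodes working independently of each other.

```python
import random

def round_robin(priorities, n_queues):
    """Spread items over n_queues in round-robin order."""
    queues = [[] for _ in range(n_queues)]
    for i, p in enumerate(priorities):
        queues[i % n_queues].append(p)
    return queues

def simulate(queues, rng):
    """Drain all queues: each queue is sorted locally, nodes interleave at random."""
    pending = [sorted(q) for q in queues]  # the prioritizer acts per queue only
    processed = []
    while any(pending):
        # Pick any node that still has work; this models independent nodes.
        node = rng.choice([i for i, q in enumerate(pending) if q])
        processed.append(pending[node].pop(0))
    return processed

# qL = [1, 4], qR1 = [2, 5], qR2 = [3, 6]
queues = round_robin([1, 2, 3, 4, 5, 6], 3)

# Replay the same queues with different interleavings of the three nodes.
runs = [simulate(queues, random.Random(seed)) for seed in range(100)]
```

In every run, the order within a single queue is kept (1 before 4, 2 before 5, 
3 before 6), but the overall order across the three queues differs from run to 
run, so nothing ensures that 3 comes before 4.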

> Connector Prioritizers doesn't work together with Load Balance Strategy
> -----------------------------------------------------------------------
>
>                 Key: NIFI-5882
>                 URL: https://issues.apache.org/jira/browse/NIFI-5882
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.8.0
>         Environment: Centos 7.5, Secured 8 Node NiFi Cluster
>            Reporter: Josef Zahner
>            Priority: Major
>         Attachments: connector_config.png, queue_with_one_processor.png, 
> queue_with_two_processors.png, template_overview.png
>
>
> For my template, please see the picture "template_overview.png". The 
> left-hand side shows the working example (two processors) and the right-hand 
> side the non-working one (one processor).
> I have a ListSFTP processor which reads files from 4 different folders. Each 
> filename contains a number (epoch time), which I parse and set as the 
> "priority" attribute. We have a cluster, so what I want to achieve for 
> FetchSFTP is that the files are fetched in order and are equally 
> distributed over our 8-node cluster.
> However, it seems that if I set the "priority" attribute in an 
> UpdateAttribute processor and use the following features on the directly 
> attached connector:
>  * Load Balance Strategy: Round Robin
>  * Select Prioritizers: PriorityAttributePrioritizer
> the prioritizer doesn't seem to have any impact. 
> If I set the priority attribute in an extra processor and use only the 
> prioritizer there, all files are in order but still on the primary node. 
> On the next processor I then set the load balancing strategy for the 
> cluster (and add another attribute, which doesn't matter) together with the 
> prioritizer. That way it works. A picture of the queue for both examples is 
> attached (queue_with_one_processor.png & queue_with_two_processors.png).
> *To sum up*, it seems that if I set the "priority" attribute in an 
> UpdateAttribute processor and try to use it directly on the attached 
> connector with a load balancing strategy and the prioritizer 
> (PriorityAttributePrioritizer), then the priority attribute doesn't work as 
> expected. If I set the "priority" attribute in a separate processor and 
> then configure the load balancing strategy together with the prioritizer on 
> an additional processor, then it works. 
> Cheers



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
