Hi Josef,

The prioritizers provide a weak ordering to the data, not an absolute sorting.
What I mean by that is that if you are prioritizing a FlowFile with attribute
A = 123 over a FlowFile with attribute A = 125, then the first one will likely
go first, but it's not guaranteed. For example, when you have Load Balanced
connections, that Connection between your Funnel and FetchSFTP actually
consists of 8 different queues: one for each node in your cluster. Within each
of those queues, the FlowFiles are prioritized according to your configured
Prioritizers, but nothing orders the FlowFiles across the queues. So you're not
guaranteed to process everything sequentially according to the Prioritizer.
Data that is swapped out can also change the 'absolute ordering' of FlowFiles.
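
To make the per-node queue point concrete, here is a toy Python model. It is
not NiFi code: `simulate_cluster`, `node_count` and `node_speeds` are names
made up for this illustration, and it assumes that a smaller value means
"goes first" and that every node pulls from its own queue in fixed rounds.

```python
import heapq


def simulate_cluster(priorities, node_count=8, node_speeds=None):
    """Toy model: round-robin FlowFiles to per-node queues, sort each locally.

    Returns the cluster-wide order in which the values get processed.
    `node_speeds[i]` is how many items node i handles per round (assumption).
    """
    if node_speeds is None:
        node_speeds = [1] * node_count

    # One priority queue per node, filled round-robin like a load-balanced
    # connection would distribute incoming FlowFiles.
    queues = [[] for _ in range(node_count)]
    for index, priority in enumerate(priorities):
        heapq.heappush(queues[index % node_count], priority)

    processed = []
    while any(queues):
        for node, queue in enumerate(queues):
            for _ in range(node_speeds[node]):
                if queue:
                    # Each node only ever sees the best item in ITS queue.
                    processed.append(heapq.heappop(queue))
    return processed


if __name__ == "__main__":
    # Values arrive newest-first; two nodes, equal speed.
    print(simulate_cluster(list(range(15, -1, -1)), node_count=2))
```

With two nodes and the values 15 down to 0 arriving in that order, the result
starts 1, 0, 3, 2: every node's own output is sorted, but the combined order is
only roughly sorted. Giving the nodes different speeds pushes the combined
order further from a strict sort.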

Now, that being said, you should get a 'rough ordering' close to what you would
expect. From what you have shown here, though, I think only the Connection
between the Funnel and FetchSFTP is using Prioritizers. This means that it will
sort the data that it has according to your Prioritizer - but the Funnel is
feeding in the data from its incoming Connections, and those are not
prioritized. So you'll want to ensure that the Connections between
UpdateAttribute and the Funnel are also configured with Prioritizers.
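
A second toy sketch shows why the upstream Connections matter. Again this is
not NiFi's implementation: `drain`, `threshold` and `upstream_prioritized` are
hypothetical names, a single node is assumed, smaller values are assumed to go
first, and the downstream queue is modeled as holding at most `threshold` items
(standing in for the Object Threshold) while the rest waits upstream.

```python
import heapq
from collections import deque


def drain(arrivals, threshold, upstream_prioritized):
    """Toy model of an upstream queue feeding a small prioritized queue.

    The downstream queue is always prioritized but holds at most `threshold`
    items; the upstream queue is either FIFO or prioritized.
    Returns the order in which items leave the downstream queue.
    """
    if upstream_prioritized:
        upstream = list(arrivals)
        heapq.heapify(upstream)

        def take():
            return heapq.heappop(upstream)
    else:
        upstream = deque(arrivals)
        take = upstream.popleft  # plain first-in, first-out

    downstream = []
    output = []
    while upstream or downstream:
        # Back pressure: refill downstream only up to the threshold.
        while upstream and len(downstream) < threshold:
            heapq.heappush(downstream, take())
        # The consumer takes the best item currently visible downstream.
        output.append(heapq.heappop(downstream))
    return output


if __name__ == "__main__":
    arrivals = [5, 9, 1, 8, 0, 7, 3, 2, 6, 4]
    print(drain(arrivals, threshold=3, upstream_prioritized=False))
    print(drain(arrivals, threshold=3, upstream_prioritized=True))
```

In this model the FIFO upstream yields 1, 5, 0, 7, 3, 2, 6, 4, 8, 9, because
the prioritized queue can only sort the few items it currently holds. With the
upstream queue prioritized as well, the same arrivals come out as 0 through 9.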

Sorry for the wordiness. Hopefully this makes sense. If not, please let us know.

Thanks
-Mark



On Nov 8, 2018, at 2:55 AM, <josef.zahn...@swisscom.com> wrote:

Hi guys

We have an 8-node NiFi cluster and run a ListSFTP on the primary node. After
the ListSFTP we add some attributes and send the FlowFiles through a funnel to
the FetchSFTP. On the connection between the funnel and the FetchSFTP we have
an “Object Threshold” of 100, some “Prioritizer” and round-robin load balancing
to get the files in a sorted order. Right after start we had about 800 files in
the queue between the funnel and the FetchSFTP (the expected value with 8
nodes), but after a few hours (we get about 200k-250k files from each ListSFTP
processor) the number of files decreased to the number below. However, it seems
that all nodes get load, because after the FetchSFTP we see a more or less
evenly distributed load.
The next issue, or maybe misunderstanding, is that we would like to have all
the ListSFTP files from the four folders in sorted order. So we added the
priority attribute, to which we assign the epoch time in seconds extracted from
the filename. However, there seems to be no humanly understandable logic to how
the files get sorted in the queue between the funnel and the FetchSFTP, because
after a few hours I see files with nearly the oldest and the newest possible
timestamps in our DB (which shouldn’t be possible, as we have the priority
attribute with epoch time). Is there a flaw in our understanding of how NiFi
works here? Should we remove the funnel and connect the UpdateAttribute
processor directly to the FetchSFTP? Or how can we overcome the ordering issue?

Thanks in advance,
Josef


<image001.png>


<image002.png>
