Joe,

I agree it is a lot of work, which is why I was thinking of starting with a 
processor that could do some of these operations before looking further. If the 
processor could move flowfiles between nodes in the cluster, it would be a good 
first step. Data comes in from a queue on any node, but gets written out to a 
queue on only the desired node, or gets distributed round-robin across the 
nodes for a distribute scenario.
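
A minimal sketch of the two routing modes described above, assuming the list of cluster node identifiers is already known to the caller. `TargetNodeSelector` and its methods are hypothetical names for illustration only and are not part of the NiFi API; how a flowfile would actually be transferred to the chosen node is left out.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not a NiFi API: picks the node that should receive a flowfile.
final class TargetNodeSelector {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    TargetNodeSelector(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = nodes;
    }

    // "Desired node" mode: every flowfile goes to the same, known node.
    String single(String desiredNode) {
        if (!nodes.contains(desiredNode)) {
            throw new IllegalArgumentException("unknown node: " + desiredNode);
        }
        return desiredNode;
    }

    // "Distribute" mode: cycle through the nodes in order.
    // floorMod keeps the index valid even after the counter overflows.
    String roundRobin() {
        int index = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }
}
```

With three nodes, successive `roundRobin()` calls walk the list and wrap around, while `single(...)` always returns the requested node.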

I want to work on it, and was trying to figure out whether it could be done 
using only a processor, or whether larger changes would definitely be needed.

--Peter 

-----Original Message-----
From: Joe Witt [mailto:joe.w...@gmail.com] 
Sent: Thursday, June 7, 2018 3:34 PM
To: dev@nifi.apache.org
Subject: Re: [EXT] Re: Primary Only Content Migration

Peter,

It isn't a pattern that is well supported now in a cluster context.

What is needed are automatically load-balanced connections with partitioning.  
This would mean a user could select a given relationship and indicate that data 
should automatically be distributed, and they should be able to specify, 
optionally, a correlation attribute that is used to ensure that data which 
belongs together stays together or is brought back together.  We could use this 
to automatically have a connection distribute data across the cluster for load 
balancing purposes, and also to ensure that data is brought back to a single 
node whenever necessary.  That is the case in certain scenarios like 
fork/distribute/process/join/send, and things like distributed receipt then 
join for merging (like defragmenting data which has been split).  To join them 
together we need affinity/correlation, and this could work based on some sort 
of hashing mechanism where there are as many buckets as there are nodes in the 
cluster at a given time.  It needs a lot of thought/design/testing/etc.
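
The hashing idea above could be sketched roughly as follows, assuming the correlation attribute value is a string and that every node sees the same ordered list of nodes. `CorrelationPartitioner` is a hypothetical name used only for illustration; nothing here is existing NiFi code, and plain modulo hashing is just one possible choice of mechanism.

```java
import java.util.List;

// Hypothetical helper, not part of NiFi: maps a correlation value to one node.
final class CorrelationPartitioner {
    private CorrelationPartitioner() {}

    // One bucket per node currently in the cluster. floorMod keeps the
    // bucket index non-negative even when hashCode() is negative.
    static int bucketFor(String correlationValue, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        return Math.floorMod(correlationValue.hashCode(), nodeCount);
    }

    // Equal correlation values always resolve to the same node,
    // as long as the node list itself does not change.
    static String nodeFor(String correlationValue, List<String> nodes) {
        return nodes.get(bucketFor(correlationValue, nodes.size()));
    }
}
```

Because the bucket count follows the node count, a value may land on a different node after the cluster grows or shrinks, which is one of the points that would need the thought and design mentioned above.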

I was just having a conversation about this yesterday.  It is definitely a 
thing and will be a major effort.  Will make a JIRA for this soon.

Thanks

On Thu, Jun 7, 2018 at 5:21 PM, Peter Wicks (pwicks) <pwi...@micron.com> wrote:
> Bryan,
>
> We see this with large files that we have split up into smaller files and 
> distributed across the cluster using site-to-site. We then want to merge them 
> back together, so we send them to the primary node before continuing 
> processing.
>
> --Peter
>
> -----Original Message-----
> From: Bryan Bende [mailto:bbe...@gmail.com]
> Sent: Thursday, June 7, 2018 12:47 PM
> To: dev@nifi.apache.org
> Subject: [EXT] Re: Primary Only Content Migration
>
> Peter,
>
> There really shouldn't be any non-source processors scheduled for primary 
> node only. We may even want to consider preventing that option when the 
> processor has an incoming connection to avoid creating any confusion.
>
> As long as you set source processors to primary node only then everything 
> should be ok... if primary node changes, the source processor starts 
> executing on the new primary node, and any flow files it already produced on 
> the old primary node will continue to be worked off by the downstream 
> processors on the old node until they are all processed.
>
> -Bryan
>
>
>
> On Thu, Jun 7, 2018 at 1:55 PM, Peter Wicks (pwicks) <pwi...@micron.com> 
> wrote:
>> I'm sure many of you have the same situation: a flow that runs on a cluster 
>> and at some point merges back down to a primary-only processor; your files 
>> sit there in the queue with nowhere to go... We've used the workaround of 
>> having a remote process group that loops the data back to the primary node 
>> for a while, but would really like a clean/simple solution. The workaround 
>> requires that users be able to put an input port on the root flow, and then 
>> route the file back down, which is a nuisance.
>>
>> I have been thinking of adding either a processor that moves data between 
>> specific nodes in a cluster, or a queue (?) option that will let users 
>> migrate the content of a flowfile back to the primary node. This would allow 
>> you to move data back to the primary node very easily without needing RPGs 
>> and input ports at the root level.
>>
>> All of my development work with NiFi has been focused on processors, so I'm 
>> not really sure where I would start with this.  Thoughts?
>>
>> Thanks,
>>   Peter
