+1. This seems like something we should provide options for (as we do), but doesn't really need to be made / kept accessible for extension.
Brandon

On Fri, May 6, 2016 at 11:45 AM Mark Payne <marka...@hotmail.com> wrote:

> I'm definitely a +1. In my experience, the way that most people think
> about prioritizing data is to either assign an absolute priority to a
> FlowFile and use the PriorityAttributePrioritizer or to use the
> FirstInFirstOut Prioritizer. Any number of processors can be used to
> extract the 'priority' attribute and prioritize the data that way. I
> think this makes the extensibility less valuable, since the flow itself
> can be used to determine a 'priority' attribute based on FlowFile
> content, attributes, etc.
>
> > On May 6, 2016, at 11:16 AM, Joe Witt <joew...@apache.org> wrote:
> >
> > Team,
> >
> > I'd like to propose we remove the FlowFilePrioritizer [1] from the set
> > of first-class extension points we support.
> >
> > The background:
> >
> > FlowFilePrioritizer implementations are used to compare flow files as
> > they are enqueued on a given connection in the flow. This in turn
> > means when flow files are pulled from the queue they are pulled in a
> > manner that allows the most important data to be operated on first.
> > This is a valuable feature and is heavily utilized. Out of the box
> > NiFi provides several obvious prioritizer implementations such as
> > first in and out based on age of the flow file, first in based on
> > entry order, and honoring a numeric representation of priority set as
> > a specific attribute [2]. They are rarely changed and have so far not
> > grown in numbers nor have there been any discussions of doing so. If
> > I think back to their usage over the past decade I actually think
> > there have been only a few ever made.
> >
> > The concept and ability to sort queues is important and powerful and
> > needs to be kept. But I am now questioning the value of making them a
> > first-class extension point. The reason is that, as defined, the
> > interface is intuitive for the developer but much harder for the
> > framework side.
> > That combined with their lack of ever being extended opens the
> > debate.
> >
> > When the prioritizers were first envisioned we didn't support the
> > concept of swapping out flowfiles to disk when the queues were huge.
> > We now do. But we cannot sort (at this time) the swapped out items.
> > By getting rid of this extension point as it is now we can instead
> > support these types of prioritizers in a different and more optimized
> > manner albeit in a less extension friendly way (more coupled to the
> > framework). Rather than simply using comparators we can do absolute
> > priority assignment and when swapping out flow files we can track the
> > largest/smallest priority and thus enable prioritized swap-in. This
> > would also be helpful for doing things like auto-cluster load
> > balancing or cluster-wide prioritized site-to-site.
> >
> > So, in short, the interface would go from being a comparator to
> > instead providing a method which returns an absolute priority. For
> > example, it would have a method called 'getPriority' which takes in a
> > flow file and returns a long.
> >
> > This approach would also still allow chaining prioritizers as we do
> > today.
> >
> > We still can support this as something which can be extended for those
> > who wish to do so just in a less friendly and more framework coupled
> > manner. Basically, this would just be more like we support content
> > repository or provenance repository extension where the developer
> > needs to both understand the implementation they want but also the
> > mechanics of getting that into the build and the deeper implications.
> >
> > Would like to hear if others are supportive of this or if they see any
> > major problems posed by this. Given we're working towards the 1.x
> > release this is a good time to pull this cord. If we do this we can
> > document the steps and thinking needed to build/contribute new
> > prioritizer schemes.
> >
> > Thanks
> > Joe
> >
> > [1] https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFilePrioritizer.java;h=684f454f57094a0e1f669333d63be06cd5a8a043;hb=refs/heads/0.x
> > [2] https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=tree;f=nifi-nar-bundles/nifi-standard-bundle/nifi-standard-prioritizers/src/main/java/org/apache/nifi/prioritizer;h=6d5db994f9fd9624bf7f548ebd69548b6917ccd1;hb=refs/heads/0.x
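
For readers following the thread, the change proposed above (moving from a comparator to a method that returns an absolute priority) might look roughly like the sketch below. This is only an illustration under stated assumptions, not NiFi code: `Item`, `AbsolutePrioritizer`, `chain`, and `priorityRange` are hypothetical names, `Item` is a stand-in for a flow file rather than the real FlowFile interface, and "lower value means more important" is an assumption of the sketch. Only the method name `getPriority` and its `long` return type come from the proposal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PrioritySketch {

    /** Hypothetical stand-in for a flow file; NOT the NiFi FlowFile interface. */
    static final class Item {
        final String name;
        final long entryOrder;   // lower = entered the queue earlier
        final long priorityAttr; // lower = more important (assumption of this sketch)

        Item(String name, long entryOrder, long priorityAttr) {
            this.name = name;
            this.entryOrder = entryOrder;
            this.priorityAttr = priorityAttr;
        }
    }

    /** Hypothetical shape of the proposed interface: one absolute priority per item. */
    interface AbsolutePrioritizer {
        long getPriority(Item item);
    }

    /** Chains prioritizers: each later one only breaks ties left by the earlier ones. */
    static Comparator<Item> chain(List<AbsolutePrioritizer> prioritizers) {
        Comparator<Item> result = (a, b) -> 0;
        for (AbsolutePrioritizer p : prioritizers) {
            result = result.thenComparingLong(p::getPriority);
        }
        return result;
    }

    /**
     * Smallest and largest priority in a swapped-out batch. Keeping this pair
     * per batch is what would let a framework rank batches for swap-in
     * without reading their contents back.
     */
    static long[] priorityRange(List<Item> batch, AbsolutePrioritizer p) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (Item item : batch) {
            long value = p.getPriority(item);
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
        return new long[] {min, max};
    }

    /** Sorts a small sample by priority attribute, then by entry order. */
    static List<String> demo() {
        AbsolutePrioritizer byAttribute = item -> item.priorityAttr;
        AbsolutePrioritizer byEntryOrder = item -> item.entryOrder;

        List<Item> queue = new ArrayList<>(Arrays.asList(
                new Item("a", 1, 5),
                new Item("b", 2, 1),
                new Item("c", 3, 1)));
        queue.sort(chain(Arrays.asList(byAttribute, byEntryOrder)));

        List<String> names = new ArrayList<>();
        for (Item item : queue) {
            names.add(item.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because every prioritizer yields a plain `long`, chaining reduces to comparing those values in order, and a swapped-out batch can be summarized by its smallest and largest value, which is the property the proposal relies on for prioritized swap-in.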