I do think there is (a little) merit to a separate sort by file size, but only because adding UpdateAttribute feels like a workaround. Having it as a native prioritizer would give people the impression that we thought about it and consider it a valid use case.
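Mike's native size sort and Aldrin's suggestion (quoted below) produce the same ordering. Aldrin proposes that UpdateAttribute copy fileSize into "priority". The minimal Java sketch below shows both approaches side by side. `FlowFileStub`, `SMALLEST_FIRST`, `LARGEST_FIRST`, `BY_PRIORITY_ATTRIBUTE`, `withSizePriority` and `dequeueOrder` are hypothetical names, not NiFi API. The sketch also assumes that lower "priority" values are pulled first. That is a convention chosen for illustration, not a statement of how PriorityAttributePrioritizer behaves.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SizePrioritizerSketch {

    // Hypothetical stand-in for a NiFi FlowFile; only what this sketch needs.
    record FlowFileStub(String name, long size, Map<String, String> attributes) {}

    // "Native" style: order directly on file size, in the spirit of the
    // SmallestFileFirstPrioritizer / LargestFileFirstPrioritizer from the thread.
    static final Comparator<FlowFileStub> SMALLEST_FIRST =
            Comparator.comparingLong(FlowFileStub::size);
    static final Comparator<FlowFileStub> LARGEST_FIRST = SMALLEST_FIRST.reversed();

    // Workaround style: order on a numeric "priority" attribute set upstream.
    // Assumption for this sketch: lower values are pulled from the queue first.
    static final Comparator<FlowFileStub> BY_PRIORITY_ATTRIBUTE =
            Comparator.comparingLong(f -> Long.parseLong(f.attributes().get("priority")));

    // Simulates an upstream step copying fileSize into "priority", optionally negated.
    static FlowFileStub withSizePriority(FlowFileStub f, boolean negate) {
        Map<String, String> attrs = new HashMap<>(f.attributes());
        attrs.put("priority", Long.toString(negate ? -f.size() : f.size()));
        return new FlowFileStub(f.name(), f.size(), attrs);
    }

    static List<FlowFileStub> allWithSizePriority(List<FlowFileStub> files, boolean negate) {
        List<FlowFileStub> out = new ArrayList<>();
        for (FlowFileStub f : files) {
            out.add(withSizePriority(f, negate));
        }
        return out;
    }

    // Sorts a copy of the queue and returns the names in dequeue order.
    static List<String> dequeueOrder(List<FlowFileStub> files, Comparator<FlowFileStub> order) {
        List<FlowFileStub> copy = new ArrayList<>(files);
        copy.sort(order);
        List<String> names = new ArrayList<>();
        for (FlowFileStub f : copy) {
            names.add(f.name());
        }
        return names;
    }

    // One excessively large file arriving alongside several small ones.
    static List<FlowFileStub> demoBatch() {
        return List.of(
                new FlowFileStub("huge.bin", 5_000_000_000L, Map.of()),
                new FlowFileStub("a.txt", 1_024L, Map.of()),
                new FlowFileStub("b.txt", 2_048L, Map.of()));
    }

    public static void main(String[] args) {
        List<FlowFileStub> batch = demoBatch();
        System.out.println(dequeueOrder(batch, SMALLEST_FIRST));
        System.out.println(dequeueOrder(allWithSizePriority(batch, false), BY_PRIORITY_ATTRIBUTE));
        System.out.println(dequeueOrder(batch, LARGEST_FIRST));
        System.out.println(dequeueOrder(allWithSizePriority(batch, true), BY_PRIORITY_ATTRIBUTE));
    }
}
```

With the sample batch, both the size comparator and the copied, non-negated priority dequeue `a.txt` and `b.txt` ahead of `huge.bin`. Negating the copied value reproduces largest-first. Since the orderings match, the question is mostly whether the size sort deserves to be offered as a first-class option.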
-- Mike

On Mon, May 9, 2016 at 1:46 PM, Aldrin Piri <aldrinp...@gmail.com> wrote:

> +1 for this. It also occurred to me that these are currently not shown in the generated docs, only in the user guide.
>
> Mike,
>
> As far as sorting by size, do you think there is merit beyond the PriorityAttributePrioritizer for this case? An UpdateAttribute "copying" fileSize to "priority" should accomplish the same effect (optionally negated, depending on the desired sort order).
>
> On Mon, May 9, 2016 at 1:37 PM, Michael Moser <moser...@gmail.com> wrote:
>
> > +1 as long as the existing 4 prioritizers remain as options. I have seen people use all of them. I have also seen someone hack together what was effectively a SmallestFileFirstPrioritizer and a LargestFileFirstPrioritizer by using RouteOnAttribute on different ${fileSize} values. The use case was "I receive a batch of files and I don't want the 1 excessively large file to delay the multitude of other small files from moving on first". Perhaps we can support that use case too.
> >
> > -- Mike
> >
> > On Fri, May 6, 2016 at 1:11 PM, Andy LoPresto <alopre...@apache.org> wrote:
> >
> > > +1. I think the benefits of this move far outweigh the potential but unrealized value of extensible prioritizers.
> > >
> > > Andy LoPresto
> > > alopre...@apache.org
> > > alopresto.apa...@gmail.com
> > > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
> > >
> > > On May 6, 2016, at 9:49 AM, Brandon DeVries <b...@jhu.edu> wrote:
> > >
> > > +1. This seems like something we should provide options for (as we do), but it doesn't really need to be made / kept accessible for extension.
> > >
> > > Brandon
> > >
> > > On Fri, May 6, 2016 at 11:45 AM Mark Payne <marka...@hotmail.com> wrote:
> > >
> > > I'm definitely a +1.
> > > In my experience, most people think about prioritizing data in one of two ways: either they assign an absolute priority to a FlowFile and use the PriorityAttributePrioritizer, or they use the FirstInFirstOut Prioritizer. Any number of processors can be used to extract the 'priority' attribute and prioritize the data that way. I think this makes the extensibility less valuable, since the flow itself can be used to determine a 'priority' attribute based on FlowFile content, attributes, etc.
> > >
> > > On May 6, 2016, at 11:16 AM, Joe Witt <joew...@apache.org> wrote:
> > >
> > > Team,
> > >
> > > I'd like to propose we remove the FlowFilePrioritizer [1] from the set of first-class extension points we support.
> > >
> > > The background:
> > >
> > > FlowFilePrioritizer implementations are used to compare flow files as they are enqueued on a given connection in the flow. This in turn means that when flow files are pulled from the queue, the most important data is operated on first. This is a valuable feature and is heavily utilized. Out of the box, NiFi provides several obvious prioritizer implementations, such as first in and out based on age of the flow file, first in based on entry order, and honoring a numeric representation of priority set as a specific attribute [2]. They are rarely changed, they have not grown in number, and there have been no discussions of doing so. Thinking back over their usage during the past decade, I believe only a few have ever been made.
> > >
> > > The concept and ability to sort queues is important and powerful and needs to be kept. But I am now questioning the value of making them a first-class extension point. The reason is that, as defined, the interface is intuitive for the developer but much harder for the framework side.
> > > That, combined with their never having been extended, opens the debate.
> > >
> > > When the prioritizers were first envisioned, we didn't support the concept of swapping flow files out to disk when the queues were huge. We now do, but we cannot (at this time) sort the swapped-out items. By getting rid of this extension point as it is now, we can instead support these types of prioritizers in a different, more optimized manner, albeit a less extension-friendly one (more coupled to the framework). Rather than simply using comparators, we can do absolute priority assignment. When swapping out flow files, we can then track the largest/smallest priority and thus enable prioritized swap-in. This would also be helpful for things like auto-cluster load balancing or cluster-wide prioritized site-to-site.
> > >
> > > So, in short, the interface would go from being a comparator to providing a method that returns an absolute priority. For example, it would have a method called 'getPriority' that takes in a flow file and returns a long.
> > >
> > > This approach would also still allow chaining prioritizers as we do today.
> > >
> > > We can still support this as something that can be extended by those who wish to do so, just in a less friendly and more framework-coupled manner. Basically, this would be more like how we support content repository or provenance repository extensions. There, the developer needs to understand both the implementation they want and the mechanics of getting it into the build, along with the deeper implications.
> > >
> > > I would like to hear whether others support this or see any major problems with it. Given that we're working towards the 1.x release, this is a good time to pull this cord.
> > > If we do this, we can document the steps and thinking needed to build/contribute new prioritizer schemes.
> > >
> > > Thanks
> > > Joe
> > >
> > > [1] https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFilePrioritizer.java;h=684f454f57094a0e1f669333d63be06cd5a8a043;hb=refs/heads/0.x
> > >
> > > [2] https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=tree;f=nifi-nar-bundles/nifi-standard-bundle/nifi-standard-prioritizers/src/main/java/org/apache/nifi/prioritizer;h=6d5db994f9fd9624bf7f548ebd69548b6917ccd1;hb=refs/heads/0.x
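Joe's proposal has three parts. It replaces the comparator with a 'getPriority' method that takes a flow file and returns a long. It keeps chaining. It tracks the largest/smallest priority of swapped-out flow files so that swap-in can be prioritized. The Java sketch below illustrates that idea under stated assumptions. `AbsolutePrioritizer`, `FlowFileStub`, `SwapSummary`, `chain`, `summarize` and `nextToSwapIn` are hypothetical names, not NiFi API. The sketch assumes that lower values are pulled first and that a FlowFile without a "priority" attribute sorts last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class AbsolutePrioritySketch {

    // Hypothetical stand-in for a NiFi FlowFile; only what this sketch needs.
    record FlowFileStub(String id, long size, Map<String, String> attributes) {}

    // Hypothetical shape of the proposed interface: return an absolute priority
    // instead of comparing two FlowFiles. Assumption: lower values go first.
    interface AbsolutePrioritizer {
        long getPriority(FlowFileStub flowFile);
    }

    // Example prioritizers. FlowFiles without a "priority" attribute sort last.
    static final AbsolutePrioritizer PRIORITY_ATTRIBUTE = f -> {
        String value = f.attributes().get("priority");
        return value == null ? Long.MAX_VALUE : Long.parseLong(value);
    };
    static final AbsolutePrioritizer SMALLEST_FIRST = FlowFileStub::size;

    // Chaining: the first prioritizer decides; later ones only break ties.
    static Comparator<FlowFileStub> chain(List<AbsolutePrioritizer> prioritizers) {
        Comparator<FlowFileStub> order = (x, y) -> 0;
        for (AbsolutePrioritizer p : prioritizers) {
            order = order.thenComparingLong(p::getPriority);
        }
        return order;
    }

    // Summary kept in memory for a batch swapped out to disk: the range of the
    // leading prioritizer's values, so swap-in order can be chosen without
    // reading the batch back. Assumes the batch is non-empty.
    record SwapSummary(String location, long minPriority, long maxPriority, int count) {}

    static SwapSummary summarize(String location, List<FlowFileStub> batch,
                                 AbsolutePrioritizer leading) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (FlowFileStub f : batch) {
            long p = leading.getPriority(f);
            min = Math.min(min, p);
            max = Math.max(max, p);
        }
        return new SwapSummary(location, min, max, batch.size());
    }

    // Prioritized swap-in: bring back the batch holding the most urgent FlowFile.
    static SwapSummary nextToSwapIn(List<SwapSummary> swapped) {
        return swapped.stream()
                .min(Comparator.comparingLong(SwapSummary::minPriority))
                .orElseThrow();
    }

    public static void main(String[] args) {
        FlowFileStub a = new FlowFileStub("a", 500, Map.of("priority", "1"));
        FlowFileStub b = new FlowFileStub("b", 100, Map.of("priority", "1"));
        FlowFileStub c = new FlowFileStub("c", 10, Map.of());

        List<FlowFileStub> queue = new ArrayList<>(List.of(a, b, c));
        queue.sort(chain(List.of(PRIORITY_ATTRIBUTE, SMALLEST_FIRST)));
        queue.forEach(f -> System.out.println(f.id()));

        SwapSummary s1 = summarize("swap-1", List.of(c), PRIORITY_ATTRIBUTE);
        SwapSummary s2 = summarize("swap-2", List.of(a, c), PRIORITY_ATTRIBUTE);
        System.out.println(nextToSwapIn(List.of(s1, s2)).location());
    }
}
```

In this example, `b` and `a` share priority 1 and are tie-broken by size, and `c` has no attribute, so it sorts last. Swap-in picks `swap-2` because that batch holds the FlowFile with the lowest value. Keeping only a min/max per swapped batch is what allows ordering across swap files without loading them. A pairwise comparator offers no equivalent per-batch summary.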