I do think there is (a little) merit to a separate sort by file size, but
only because adding UpdateAttribute feels like a workaround.  Having it as
a native prioritizer would give people the impression that we thought
about it and consider it a valid use case.

-- Mike


On Mon, May 9, 2016 at 1:46 PM, Aldrin Piri <aldrinp...@gmail.com> wrote:

> +1 for this.  It also occurred to me that these are currently not shown in
> the generated docs and only in the user guide.
>
> Mike,
>
> As far as sorting by size, do you think there is merit beyond the
> PriorityAttributePrioritizer for this case?  An UpdateAttribute processor
> "copying" fileSize to "priority" should accomplish the same effect
> (optionally negated depending on desired sort order).
>
> On Mon, May 9, 2016 at 1:37 PM, Michael Moser <moser...@gmail.com> wrote:
>
> > +1 as long as the existing 4 prioritizers remain as options.  I have seen
> > people use all of them.  I have also seen someone hack together what was
> > effectively a SmallestFileFirstPrioritizer and a
> > LargestFileFirstPrioritizer by using RouteOnAttribute on different
> > ${fileSize} values.  The use case was "I receive a batch of files and I
> > don't want the 1 excessively large file to delay the multitude of other
> > small files from moving on first".  Perhaps we can support that use case
> > too.
> >
> > -- Mike
> >
> >
> > On Fri, May 6, 2016 at 1:11 PM, Andy LoPresto <alopre...@apache.org>
> > wrote:
> >
> > > +1. I think the benefits of this move far outweigh the potential but
> > > unrealized value of extensible prioritizers.
> > >
> > > Andy LoPresto
> > > alopre...@apache.org
> > > *alopresto.apa...@gmail.com <alopresto.apa...@gmail.com>*
> > > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> > >
> > > On May 6, 2016, at 9:49 AM, Brandon DeVries <b...@jhu.edu> wrote:
> > >
> > > +1.  This seems like something we should provide options for (as we
> do),
> > > but doesn't really need to be made / kept accessible for extension.
> > >
> > > Brandon
> > >
> > > > On Fri, May 6, 2016 at 11:45 AM Mark Payne <marka...@hotmail.com> wrote:
> > >
> > > > I'm definitely a +1. In my experience, the way that most people think
> > > > about prioritizing data is to either assign an absolute priority to a
> > > > FlowFile and use the PriorityAttributePrioritizer or to use the
> > > > FirstInFirstOut Prioritizer. Any number of processors can be used to
> > > > extract the 'priority' attribute and prioritize the data that way. I
> > > > think this makes the extensibility less valuable, since the flow itself
> > > > can be used to determine a 'priority' attribute based on FlowFile
> > > > content, attributes, etc.
> > >
> > > On May 6, 2016, at 11:16 AM, Joe Witt <joew...@apache.org> wrote:
> > >
> > > Team,
> > >
> > > I'd like to propose we remove the FlowFilePrioritizer [1] from the set
> > > of first class extension points we support.
> > >
> > > The background:
> > >
> > > FlowFilePrioritizer implementations are used to compare flow files as
> > > > they are enqueued on a given connection in the flow.  This in turn
> > > > means that when flow files are pulled from the queue, they are pulled
> > > > in a manner that lets the most important data be operated on first.
> > > This is a valuable feature and is heavily utilized.  Out of the box
> > > NiFi provides several obvious prioritizer implementations such as
> > > first in and out based on age of the flow file, first in based on
> > > entry order, and honoring a numeric representation of priority set as
> > > a specific attribute [2].  They are rarely changed and have so far not
> > > grown in numbers nor have there been any discussions of doing so.  If
> > > I think back to their usage over the past decade I actually think
> > > there have been only a few ever made.
> > >
> > > The concept and ability to sort queues is important and powerful and
> > > > needs to be kept.  But I am now questioning the value of making them a
> > > > first-class extension point.  The reason is that, as defined, the
> > > > interface is intuitive for the developer but much harder on the
> > > > framework side.  That, combined with their rarely if ever being
> > > > extended, opens the debate.
> > >
> > > When the prioritizers were first envisioned we didn't support the
> > > concept of swapping out flowfiles to disk when the queues were huge.
> > > We now do.  But we cannot sort (at this time) the swapped out items.
> > > By getting rid of this extension point as it is now we can instead
> > > support these types of prioritizers in a different and more optimized
> > > manner albeit in a less extension friendly way (more coupled to the
> > > framework).  Rather than simply using comparators we can do absolute
> > > priority assignment and when swapping out flow files we can track the
> > > largest/smallest priority and thus enable prioritized swap-in.  This
> > > would also be helpful for doing things like auto-cluster load
> > > balancing or cluster-wide prioritized site-to-site.
> > >
> > > So, in short, the interface would go from being a comparator to
> > > instead providing a method which returns an absolute priority.  For
> > > example, it would have a method called 'getPriority' which takes in a
> > > flow file and returns a long.
> > >
> > > > This approach would also still allow chaining prioritizers as we do
> > > > today.
> > > >
> > > > We can still support this as something that can be extended by those
> > > > who wish to do so, just in a less friendly and more framework-coupled
> > > > manner.  Basically, this would be more like how we support content
> > > > repository or provenance repository extension, where the developer
> > > > needs to understand both the implementation they want and the
> > > > mechanics of getting that into the build and the deeper implications.
> > >
> > > Would like to hear if others are supportive of this or if they see any
> > > major problems posed by this.  Given we're working towards the 1.x
> > > release this is a good time to pull this cord.  If we do this we can
> > > document the steps and thinking needed to build/contribute new
> > > prioritizer schemes.
> > >
> > > Thanks
> > > Joe
> > >
> > > [1]
> > >
> > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFilePrioritizer.java;h=684f454f57094a0e1f669333d63be06cd5a8a043;hb=refs/heads/0.x
> > >
> > > [2]
> > >
> > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=tree;f=nifi-nar-bundles/nifi-standard-bundle/nifi-standard-prioritizers/src/main/java/org/apache/nifi/prioritizer;h=6d5db994f9fd9624bf7f548ebd69548b6917ccd1;hb=refs/heads/0.x
> > >
> > >
> > >
> > >
> >
>
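For readers following the proposal, the move from a comparator to an
absolute priority might look roughly like the sketch below.  Nothing here
is NiFi's actual API: SketchFlowFile, AbsolutePrioritizer, SmallestFileFirst,
PriorityAttribute and PrioritySketch are hypothetical names, and the
"smaller value is dequeued first" convention is an assumption.  Only the
getPriority method returning a long comes from the thread.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NiFi's FlowFile; it carries only what this sketch needs.
final class SketchFlowFile {
    final long id;
    final long size;
    final Map<String, String> attributes;

    SketchFlowFile(long id, long size, Map<String, String> attributes) {
        this.id = id;
        this.size = size;
        this.attributes = attributes;
    }
}

// Assumed shape of the proposed interface: one absolute value per flow file.
// Convention assumed here: a smaller value is dequeued first.
interface AbsolutePrioritizer {
    long getPriority(SketchFlowFile flowFile);
}

// Smallest-file-first, the use case raised in the thread.
final class SmallestFileFirst implements AbsolutePrioritizer {
    @Override
    public long getPriority(SketchFlowFile flowFile) {
        return flowFile.size;
    }
}

// Reads a numeric "priority" attribute; missing or non-numeric values sort last.
final class PriorityAttribute implements AbsolutePrioritizer {
    @Override
    public long getPriority(SketchFlowFile flowFile) {
        String raw = flowFile.attributes.get("priority");
        if (raw == null) {
            return Long.MAX_VALUE;
        }
        try {
            return Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            return Long.MAX_VALUE;
        }
    }
}

final class PrioritySketch {
    // Chaining: order by the first prioritizer, break ties with the next one.
    static Comparator<SketchFlowFile> chain(List<AbsolutePrioritizer> prioritizers) {
        Comparator<SketchFlowFile> combined = (a, b) -> 0;
        for (AbsolutePrioritizer p : prioritizers) {
            combined = combined.thenComparingLong(p::getPriority);
        }
        return combined;
    }

    // Smallest and largest priority in a swapped-out batch, returned as {min, max}.
    // Tracking this range per swap file is what would allow prioritized swap-in.
    static long[] priorityRange(List<SketchFlowFile> swappedOut, AbsolutePrioritizer p) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (SketchFlowFile f : swappedOut) {
            long priority = p.getPriority(f);
            min = Math.min(min, priority);
            max = Math.max(max, priority);
        }
        return new long[] {min, max};
    }
}
```

Because each prioritizer yields a plain long, a swapped-out batch can be
summarized by its range without being sorted, which a pairwise comparator
cannot offer.  Chaining still works by comparing the values in order.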
