+1 for this.  It also occurred to me that these are currently documented
only in the user guide and are not shown in the generated docs.

Mike,

As far as sorting by size goes, do you think there is merit in anything beyond
the PriorityAttributePrioritizer for this case?  An UpdateAttribute processor
"copying" fileSize to "priority" should accomplish the same effect (optionally
negated, depending on the desired sort order).

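A minimal Java sketch of the suggestion above. It assumes a simplified priority comparator in which the lower numeric "priority" value is pulled first; this is not NiFi's PriorityAttributePrioritizer, which may treat missing or non-numeric values differently. `SimpleFlowFile`, `copySizeToPriority`, and `order` are hypothetical names that only emulate an UpdateAttribute step feeding a prioritized connection.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SizeAsPriority {

    // Hypothetical stand-in for a FlowFile: an id, its size in bytes, and attributes.
    record SimpleFlowFile(String id, long size, Map<String, String> attributes) {}

    // Emulates an UpdateAttribute step that copies fileSize into "priority",
    // optionally negated so that larger files get smaller (earlier) values.
    static SimpleFlowFile copySizeToPriority(SimpleFlowFile ff, boolean negate) {
        Map<String, String> attrs = new HashMap<>(ff.attributes());
        long value = negate ? -ff.size() : ff.size();
        attrs.put("priority", Long.toString(value));
        return new SimpleFlowFile(ff.id(), ff.size(), attrs);
    }

    // Simplified priority comparator (assumption): lower numeric "priority" is pulled first.
    static final Comparator<SimpleFlowFile> BY_PRIORITY =
            Comparator.comparingLong(ff -> Long.parseLong(ff.attributes().get("priority")));

    // Returns the ids in the order a prioritized queue would hand them out.
    static List<String> order(List<SimpleFlowFile> batch, boolean negate) {
        List<SimpleFlowFile> queue = new ArrayList<>();
        for (SimpleFlowFile ff : batch) {
            queue.add(copySizeToPriority(ff, negate));
        }
        queue.sort(BY_PRIORITY);
        List<String> ids = new ArrayList<>();
        for (SimpleFlowFile ff : queue) {
            ids.add(ff.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<SimpleFlowFile> batch = List.of(
                new SimpleFlowFile("huge", 5_000_000_000L, Map.of()),
                new SimpleFlowFile("small", 1_024L, Map.of()),
                new SimpleFlowFile("medium", 2_000_000L, Map.of()));
        System.out.println(order(batch, false)); // [small, medium, huge]
        System.out.println(order(batch, true));  // [huge, medium, small]
    }
}
```

In this sketch, negating the copied value flips the order, so both size orderings come from the same priority comparator without a dedicated prioritizer.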
On Mon, May 9, 2016 at 1:37 PM, Michael Moser <moser...@gmail.com> wrote:

> +1 as long as the existing 4 prioritizers remain as options.  I have seen
> people use all of them.  I have also seen someone hack together what was
> effectively a SmallestFileFirstPrioritizer and a
> LargestFileFirstPrioritizer by using RouteOnAttribute on different
> ${fileSize} values.  The use case was "I receive a batch of files and I
> don't want the 1 excessively large file to keep the multitude of other
> small files from moving on first".  Perhaps we can support that use case
> too.
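For comparison, a dedicated size-based prioritizer like the one described could be little more than a comparator. This is a hypothetical Java sketch: `SMALLEST_FILE_FIRST` and `LARGEST_FILE_FIRST` are not existing NiFi prioritizers, and `SimpleFlowFile` stands in for a real FlowFile.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SizePrioritizers {

    // Hypothetical stand-in for a FlowFile; only the size (in bytes) matters here.
    record SimpleFlowFile(String id, long size) {}

    // Smallest file first: a single very large file cannot hold up the small ones.
    static final Comparator<SimpleFlowFile> SMALLEST_FILE_FIRST =
            Comparator.comparingLong(SimpleFlowFile::size);

    // Largest file first: the same ordering, reversed.
    static final Comparator<SimpleFlowFile> LARGEST_FILE_FIRST =
            SMALLEST_FILE_FIRST.reversed();

    // Returns the ids in the order a queue using the given comparator would release them.
    static List<String> drainOrder(List<SimpleFlowFile> batch, Comparator<SimpleFlowFile> prioritizer) {
        List<SimpleFlowFile> sorted = new ArrayList<>(batch);
        sorted.sort(prioritizer);
        List<String> ids = new ArrayList<>();
        for (SimpleFlowFile ff : sorted) {
            ids.add(ff.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<SimpleFlowFile> batch = List.of(
                new SimpleFlowFile("one-huge-file", 10_000_000_000L),
                new SimpleFlowFile("small-a", 512L),
                new SimpleFlowFile("small-b", 2_048L));
        System.out.println(drainOrder(batch, SMALLEST_FILE_FIRST)); // [small-a, small-b, one-huge-file]
        System.out.println(drainOrder(batch, LARGEST_FILE_FIRST));  // [one-huge-file, small-b, small-a]
    }
}
```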
>
> -- Mike
>
>
> On Fri, May 6, 2016 at 1:11 PM, Andy LoPresto <alopre...@apache.org>
> wrote:
>
> > +1. I think the benefits of this move far outweigh the potential but
> > unrealized value of extensible prioritizers.
> >
> > Andy LoPresto
> > alopre...@apache.org
> > alopresto.apa...@gmail.com
> > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> >
> > On May 6, 2016, at 9:49 AM, Brandon DeVries <b...@jhu.edu> wrote:
> >
> > +1.  This seems like something we should provide options for (as we do),
> > but doesn't really need to be made / kept accessible for extension.
> >
> > Brandon
> >
> > On Fri, May 6, 2016 at 11:45 AM Mark Payne <marka...@hotmail.com> wrote:
> >
> > I'm definitely a +1. In my experience, the way that most people think
> > about prioritizing data is to either assign an absolute priority to a
> > FlowFile and use the PriorityAttributePrioritizer or to use the
> > FirstInFirstOutPrioritizer. Any number of processors can be used to
> > extract the 'priority' attribute and prioritize the data that way. I
> > think this makes the extensibility less valuable, since the flow itself
> > can be used to determine a 'priority' attribute based on FlowFile
> > content, attributes, etc.
> >
> > On May 6, 2016, at 11:16 AM, Joe Witt <joew...@apache.org> wrote:
> >
> > Team,
> >
> > I'd like to propose we remove the FlowFilePrioritizer [1] from the set
> > of first class extension points we support.
> >
> > The background:
> >
> > FlowFilePrioritizer implementations are used to compare flow files as
> > they are enqueued on a given connection in the flow.  This in turn
> > means that when flow files are pulled from the queue, the most
> > important data is operated on first.  This is a valuable feature and
> > is heavily utilized.  Out of the box, NiFi provides several obvious
> > prioritizer implementations, such as ordering by the age of the flow
> > file, first in first out based on entry order, and honoring a numeric
> > priority value set in a specific attribute [2].  They are rarely
> > changed, have not grown in number, and there have been no discussions
> > of adding more.  Thinking back over their usage in the past decade, I
> > believe only a few have ever been written.
> >
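To illustrate the comparator-based model described above, here is a minimal Java sketch (not NiFi's actual queue implementation) of a connection queue that orders flow files with a comparator as they are enqueued, falling back to entry order on ties. `SimpleFlowFile`, `PrioritizedQueue`, and the lower-is-more-important convention are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PrioritizedQueue {

    // Hypothetical FlowFile stand-in with a numeric priority (lower = more important).
    record SimpleFlowFile(String id, long priority) {}

    // Each queued item remembers its entry order so that ties are served FIFO.
    private record Entry(SimpleFlowFile flowFile, long sequence) {}

    private final PriorityQueue<Entry> heap;
    private long nextSequence = 0;

    PrioritizedQueue(Comparator<SimpleFlowFile> prioritizer) {
        Comparator<Entry> byPrioritizer = Comparator.comparing(Entry::flowFile, prioritizer);
        this.heap = new PriorityQueue<>(byPrioritizer.thenComparingLong(Entry::sequence));
    }

    // The comparator is consulted as items are enqueued (heap insertion).
    void enqueue(SimpleFlowFile ff) {
        heap.add(new Entry(ff, nextSequence++));
    }

    // Pulls the most important item, or null if the queue is empty.
    SimpleFlowFile poll() {
        Entry next = heap.poll();
        return next == null ? null : next.flowFile();
    }

    // Empties the queue and returns the ids in pull order.
    List<String> drain() {
        List<String> ids = new ArrayList<>();
        for (SimpleFlowFile ff = poll(); ff != null; ff = poll()) {
            ids.add(ff.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        PrioritizedQueue queue =
                new PrioritizedQueue(Comparator.comparingLong(SimpleFlowFile::priority));
        queue.enqueue(new SimpleFlowFile("a", 5));
        queue.enqueue(new SimpleFlowFile("b", 1));
        queue.enqueue(new SimpleFlowFile("c", 5));
        queue.enqueue(new SimpleFlowFile("d", 1));
        System.out.println(queue.drain()); // [b, d, a, c]
    }
}
```

Note that the comparator needs both flow files in memory to rank them, which is the limitation the proposal goes on to discuss for swapped-out data.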
> > The concept and ability to sort queues is important and powerful and
> > needs to be kept.  But I am now questioning the value of making
> > prioritizers a first-class extension point.  The reason is that, as
> > defined, the interface is intuitive for the developer but much harder
> > to handle on the framework side.  That, combined with how rarely it
> > has ever been extended, opens the debate.
> >
> > When the prioritizers were first envisioned, we didn't support the
> > concept of swapping flow files out to disk when the queues were huge.
> > We now do, but we cannot (at this time) sort the swapped-out items.
> > By getting rid of this extension point as it is now, we can instead
> > support these types of prioritizers in a different and more optimized
> > manner, albeit in a less extension-friendly way (more coupled to the
> > framework).  Rather than simply using comparators, we can assign
> > absolute priorities, and when swapping out flow files we can track the
> > largest/smallest priority and thus enable prioritized swap-in.  This
> > would also be helpful for things like automatic cluster load balancing
> > or cluster-wide prioritized site-to-site.
> >
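One way to picture prioritized swap-in is the following hypothetical Java sketch, which assumes absolute `long` priorities where lower means more important. When a batch is swapped out, only its min/max priority is kept in memory, and the swap file holding the most important data is brought back first. `SwapSummary`, `swapOut`, and `nextToSwapIn` are illustrative names, not NiFi's swap manager API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Optional;

public class PrioritizedSwapIn {

    // Hypothetical FlowFile stand-in with an absolute priority (lower = more important).
    record SimpleFlowFile(String id, long priority) {}

    // What stays in memory for a swapped-out batch: where it went and its priority range.
    record SwapSummary(String location, int count, long minPriority, long maxPriority) {}

    // Called when a batch is written to disk; its priorities are summarized first.
    static SwapSummary swapOut(String location, List<SimpleFlowFile> batch) {
        LongSummaryStatistics stats =
                batch.stream().mapToLong(SimpleFlowFile::priority).summaryStatistics();
        return new SwapSummary(location, batch.size(), stats.getMin(), stats.getMax());
    }

    // Swap in the file whose best (smallest) priority is the most important one.
    static Optional<SwapSummary> nextToSwapIn(List<SwapSummary> swapped) {
        return swapped.stream().min(Comparator.comparingLong(SwapSummary::minPriority));
    }

    public static void main(String[] args) {
        SwapSummary first = swapOut("swap-1", List.of(
                new SimpleFlowFile("a", 40), new SimpleFlowFile("b", 70)));
        SwapSummary second = swapOut("swap-2", List.of(
                new SimpleFlowFile("c", 5), new SimpleFlowFile("d", 90)));
        nextToSwapIn(List.of(first, second))
                .ifPresent(s -> System.out.println("swap in " + s.location())); // swap in swap-2
    }
}
```

Because only the summary is consulted, this sketch can pick a swap file without reading its contents back, which a pairwise comparator cannot do on its own.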
> > So, in short, the interface would go from being a comparator to
> > instead providing a method which returns an absolute priority.  For
> > example, it would have a method called 'getPriority' which takes in a
> > flow file and returns a long.
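A minimal sketch of what such an interface might look like. The interface name `AbsolutePrioritizer` and the `SimpleFlowFile` stand-in are hypothetical; only the shape of a `getPriority` method that takes a flow file and returns a `long` comes from the proposal. Lower values are assumed to be pulled first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AbsolutePriorityExample {

    // Hypothetical FlowFile stand-in: size in bytes and queue entry time in millis.
    record SimpleFlowFile(String id, long size, long entryMillis) {}

    // Hypothetical interface: an absolute priority per flow file instead of a
    // pairwise comparison (assumption: lower value = pulled first).
    @FunctionalInterface
    interface AbsolutePrioritizer {
        long getPriority(SimpleFlowFile flowFile);
    }

    static final AbsolutePrioritizer SMALLEST_FIRST = SimpleFlowFile::size;
    static final AbsolutePrioritizer OLDEST_FIRST = SimpleFlowFile::entryMillis;

    // The framework can still derive an ordering from the absolute values.
    static List<String> order(List<SimpleFlowFile> queue, AbsolutePrioritizer prioritizer) {
        Comparator<SimpleFlowFile> byPriority = Comparator.comparingLong(prioritizer::getPriority);
        List<SimpleFlowFile> sorted = new ArrayList<>(queue);
        sorted.sort(byPriority);
        List<String> ids = new ArrayList<>();
        for (SimpleFlowFile ff : sorted) {
            ids.add(ff.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<SimpleFlowFile> queue = List.of(
                new SimpleFlowFile("a", 300, 1_000),
                new SimpleFlowFile("b", 100, 3_000),
                new SimpleFlowFile("c", 200, 2_000));
        System.out.println(order(queue, SMALLEST_FIRST)); // [b, c, a]
        System.out.println(order(queue, OLDEST_FIRST));   // [a, c, b]
    }
}
```

Since each flow file yields a plain number, the framework could also store that number alongside the flow file (for example, when swapping it out) instead of re-running a comparison.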
> >
> > This approach would also still allow chaining prioritizers as we do
> > today.
> >
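Chaining can be kept with absolute priorities by consulting each prioritizer in turn and only moving to the next one on a tie. This is a hypothetical Java sketch; the names and the lower-is-first convention are assumptions, not the proposed API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ChainedPrioritizers {

    // Hypothetical FlowFile stand-in: a numeric "priority" attribute and a size in bytes.
    record SimpleFlowFile(String id, long priorityAttribute, long size) {}

    // Hypothetical absolute-priority interface (lower value = pulled first).
    @FunctionalInterface
    interface AbsolutePrioritizer {
        long getPriority(SimpleFlowFile flowFile);
    }

    // Builds one ordering from a list of prioritizers: the first decides,
    // later ones only break ties left by the earlier ones.
    static Comparator<SimpleFlowFile> chain(List<AbsolutePrioritizer> prioritizers) {
        Comparator<SimpleFlowFile> result = (x, y) -> 0;
        for (AbsolutePrioritizer p : prioritizers) {
            result = result.thenComparingLong(p::getPriority);
        }
        return result;
    }

    static List<String> order(List<SimpleFlowFile> queue, List<AbsolutePrioritizer> prioritizers) {
        List<SimpleFlowFile> sorted = new ArrayList<>(queue);
        sorted.sort(chain(prioritizers));
        List<String> ids = new ArrayList<>();
        for (SimpleFlowFile ff : sorted) {
            ids.add(ff.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        AbsolutePrioritizer byAttribute = SimpleFlowFile::priorityAttribute;
        AbsolutePrioritizer bySize = SimpleFlowFile::size;
        List<SimpleFlowFile> queue = List.of(
                new SimpleFlowFile("a", 1, 500),
                new SimpleFlowFile("b", 2, 10),
                new SimpleFlowFile("c", 1, 100));
        System.out.println(order(queue, List.of(byAttribute, bySize))); // [c, a, b]
    }
}
```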
> > We can still support this as something that can be extended by those
> > who wish to do so, just in a less friendly and more framework-coupled
> > manner.  Basically, this would be supported more like content
> > repository or provenance repository extensions, where the developer
> > needs to understand both the implementation they want and the
> > mechanics of getting it into the build, along with the deeper
> > implications.
> >
> > Would like to hear if others are supportive of this or if they see any
> > major problems posed by this.  Given we're working towards the 1.x
> > release this is a good time to pull this cord.  If we do this we can
> > document the steps and thinking needed to build/contribute new
> > prioritizer schemes.
> >
> > Thanks
> > Joe
> >
> > [1]
> > https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=blob;f=nifi-api/src/main/java/org/apache/nifi/flowfile/FlowFilePrioritizer.java;h=684f454f57094a0e1f669333d63be06cd5a8a043;hb=refs/heads/0.x
> >
> > [2]
> > https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=tree;f=nifi-nar-bundles/nifi-standard-bundle/nifi-standard-prioritizers/src/main/java/org/apache/nifi/prioritizer;h=6d5db994f9fd9624bf7f548ebd69548b6917ccd1;hb=refs/heads/0.x
>
