Regarding throwing an exception... I believe that if you are extending
AbstractProcessor and an exception is thrown out of onTrigger(), then the
session is rolled back and any flow files that were accessed are penalized.
This leaves them in the incoming connection to the processor, and they are
not retried until the penalty duration passes. This seems similar to what
Michael described, although it does not stop the processor from processing
other incoming flow files.
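Below is a minimal model of the rollback-and-penalize behaviour described above, for illustration only. `FlowFileRecord`, `TriggerFn` and `RollbackOnExceptionModel` are hypothetical stand-ins, not NiFi's `ProcessSession` or `FlowFile` API, and time is passed in explicitly as milliseconds. The model puts the failing flow file back into the incoming queue with a penalty, and the flow file behind it is still processed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Simplified, hypothetical model; these are not NiFi's ProcessSession/FlowFile classes. */
class RollbackOnExceptionModel {

    /** A queued flow file plus the time (ms) before which it may not be pulled. */
    static final class FlowFileRecord {
        final String id;
        long penaltyExpiration = 0L;

        FlowFileRecord(String id) { this.id = id; }

        boolean isPenalized(long now) { return penaltyExpiration > now; }
    }

    /** The processing logic; may throw, like the body of onTrigger(). */
    interface TriggerFn {
        void process(FlowFileRecord flowFile) throws Exception;
    }

    final Deque<FlowFileRecord> incoming = new ArrayDeque<>();
    final List<FlowFileRecord> success = new ArrayList<>();
    private final long penaltyMillis;

    RollbackOnExceptionModel(long penaltyMillis) { this.penaltyMillis = penaltyMillis; }

    /** Removes and returns the first flow file that is not penalized, or null if none. */
    private FlowFileRecord pollUnpenalized(long now) {
        Iterator<FlowFileRecord> it = incoming.iterator();
        while (it.hasNext()) {
            FlowFileRecord ff = it.next();
            if (!ff.isPenalized(now)) {
                it.remove();
                return ff;
            }
        }
        return null;
    }

    /** One trigger: "commit" on success; on an exception, roll back and penalize. */
    void trigger(TriggerFn fn, long now) {
        FlowFileRecord ff = pollUnpenalized(now);
        if (ff == null) {
            return;                                     // nothing eligible right now
        }
        try {
            fn.process(ff);
            success.add(ff);                            // transfer downstream
        } catch (Exception e) {
            ff.penaltyExpiration = now + penaltyMillis; // rollback with penalization
            incoming.addFirst(ff);                      // back into the incoming connection
        }
    }
}
```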

Ricky's retry idea sounds interesting... I think a lot of people handle
this today by creating a retry loop using UpdateAttribute and
RouteOnAttribute [1].
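The following is a rough simulation of such a retry loop. It assumes a counter attribute named `retry.count` and the route names `retry` and `retries_exceeded`, all hypothetical; the linked template may use different names. In a real flow the UpdateAttribute step would be an Expression Language property, something like `${retry.count:replaceNull(0):plus(1)}`, and the RouteOnAttribute rule something like `${retry.count:gt(3)}`. The Java below only mirrors that logic.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical simulation of a retry loop built from UpdateAttribute and RouteOnAttribute.
 * The attribute name and route names are assumptions, not taken from the linked template.
 */
class RetryCountLoop {
    static final String RETRY_ATTR = "retry.count";

    /** UpdateAttribute step: treat a missing counter as 0, then add 1. */
    static void incrementRetryCount(Map<String, String> attributes) {
        int current = Integer.parseInt(attributes.getOrDefault(RETRY_ATTR, "0"));
        attributes.put(RETRY_ATTR, Integer.toString(current + 1));
    }

    /** RouteOnAttribute step: loop back while within the limit, otherwise give up. */
    static String route(Map<String, String> attributes, int maxRetries) {
        int count = Integer.parseInt(attributes.getOrDefault(RETRY_ATTR, "0"));
        return count > maxRetries ? "retries_exceeded" : "retry";
    }

    /** Sends one flow file through a processor that fails its first `failures` attempts. */
    static String simulate(int failures, int maxRetries) {
        Map<String, String> attributes = new HashMap<>();
        for (int attempt = 1; ; attempt++) {
            if (attempt > failures) {
                return "success";
            }
            // The processor's failure relationship feeds UpdateAttribute, then RouteOnAttribute.
            incrementRetryCount(attributes);
            if (route(attributes, maxRetries).equals("retries_exceeded")) {
                return "retries_exceeded";
            }
        }
    }
}
```

With `maxRetries` = 3, the processor is attempted at most four times: the original attempt plus three retries.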

[1]
https://cwiki.apache.org/confluence/download/attachments/57904847/Retry_Count_Loop.xml?version=1&modificationDate=1433271239000&api=v2


On Thu, Jan 28, 2016 at 1:24 PM, Ricky Saltzer <ri...@cloudera.com> wrote:

> Is there currently a way to know how many times a FlowFile has been
> penalized? Do we have use cases where we want to penalize a FlowFile *n*
> number of times before sending it down an alternate relationship? I could
> imagine an API like penalizeOrTransfer(FlowFile flowFile, int numberOfTries,
> Relationship relationship). For example, someone might want to process a
> FlowFile three times before giving up on it.
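Here is a sketch of how the proposed `penalizeOrTransfer(FlowFile flowFile, int numberOfTries, Relationship relationship)` might behave. It is not an existing NiFi method. In this sketch the relationship is a plain `String`, the attempt count is kept in a hypothetical `penalize.attempts` attribute, and the current time and penalty length are explicit parameters; all of these are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the proposed penalizeOrTransfer(); not an existing NiFi API. */
class PenalizeOrTransferSketch {
    static final String ATTEMPTS_ATTR = "penalize.attempts";   // hypothetical attribute name

    /** Minimal stand-in for a FlowFile. */
    static final class FlowFile {
        final Map<String, String> attributes = new HashMap<>();
        long penaltyExpiration = 0L;
        String transferredTo = null;   // relationship name once we give up on it
    }

    enum Outcome { PENALIZED, TRANSFERRED }

    /**
     * Records one more failed attempt. Until numberOfTries attempts have been made, the
     * FlowFile is penalized (to be retried later); after that it goes to `relationship`.
     */
    static Outcome penalizeOrTransfer(FlowFile flowFile, int numberOfTries, String relationship,
                                      long now, long penaltyMillis) {
        int attempts = Integer.parseInt(flowFile.attributes.getOrDefault(ATTEMPTS_ATTR, "0")) + 1;
        flowFile.attributes.put(ATTEMPTS_ATTR, Integer.toString(attempts));
        if (attempts < numberOfTries) {
            flowFile.penaltyExpiration = now + penaltyMillis;
            return Outcome.PENALIZED;
        }
        flowFile.transferredTo = relationship;
        return Outcome.TRANSFERRED;
    }
}
```

With `numberOfTries` = 3, the first two failures penalize the FlowFile and the third sends it to the given relationship, matching the "three times before giving up" example.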
>
> On Thu, Jan 28, 2016 at 12:47 PM, Michael de Courci <mdecou...@googlemail.com> wrote:
>
> > Matt, thanks for your reply.
> >
> > I guess what I am saying in that case is this: if there is an error in a
> > FlowFile, then the processor that detects it cannot proceed, so instead of
> > calling an action to penalize the FlowFile, it raises an exception such as
> > OutOfServiceException or ProcessorException. You could have an exception
> > cause, PenalizedFlowFileException, for this case.
> >
> > But within the processor, other error causes may also give rise to an
> > OutOfServiceException.
> >
> > The point is that if the processor threw this exception, there could be a
> > configurable duration: a time limit for keeping this processor, the
> > connection to it, and possibly any processors leading up to it out of
> > service. Naturally this will need to be indicated to the DFM. This will
> > free resources and make the flow well behaved.
> >
> > Environmental failures will simply be a different category/cause of error
> > that can also be wrapped/captured by a more general one.
> >
> > With Kind Regards
> > Michael de Courci
> > mdecou...@gmail.com
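One way the proposed exceptions could fit together, sketched under assumptions: `OutOfServiceException` is the general signal, and its cause (`PenalizedFlowFileException`, or a hypothetical `EnvironmentalFailureException`) only categorizes it. None of these classes exist in NiFi; the names follow the proposal with spelling normalized, and `handle()` is a hypothetical framework hook that takes the processor out of service for a configured duration.

```java
/**
 * Hypothetical sketch of the proposed exceptions; none of these classes exist in NiFi.
 * OutOfServiceException is the general signal, and its cause says why.
 */
class OutOfServiceSketch {
    /** Thrown by a processor that cannot proceed; the framework takes it out of service. */
    static class OutOfServiceException extends RuntimeException {
        OutOfServiceException(String message, Throwable cause) { super(message, cause); }
    }

    /** Cause: a specific FlowFile is in error (what penalize() is used for today). */
    static class PenalizedFlowFileException extends Exception {
        PenalizedFlowFileException(String message) { super(message); }
    }

    /** Cause: an environmental failure, such as an unreachable external system. */
    static class EnvironmentalFailureException extends Exception {
        EnvironmentalFailureException(String message) { super(message); }
    }

    /** Out-of-service state for one processor and the connection feeding it. */
    static final class ProcessorState {
        long outOfServiceUntil = 0L;

        boolean inService(long now) { return now >= outOfServiceUntil; }
    }

    /**
     * Framework-side handling: any OutOfServiceException takes the processor out of
     * service for the configured duration; the cause is only used to categorize it.
     */
    static String handle(OutOfServiceException e, ProcessorState state, long now, long durationMillis) {
        state.outOfServiceUntil = now + durationMillis;
        Throwable cause = e.getCause();
        if (cause instanceof PenalizedFlowFileException) {
            return "flowfile-error";
        }
        if (cause instanceof EnvironmentalFailureException) {
            return "environmental";
        }
        return "other";
    }
}
```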
> >
> >
> >
> >
> > > On 28 Jan 2016, at 17:16, Matt Gilman <matt.c.gil...@gmail.com> wrote:
> > >
> > > Just to recap/level set...
> > >
> > > The distinction between yielding and penalization is important.
> > > Penalization is an action taken on a FlowFile because the FlowFile cannot
> > > be processed right now (like a naming conflict, for instance). The
> > > Processor is indicating that it cannot process that specific FlowFile at
> > > the moment but may be able to process the next. Yielding is an indication
> > > that the Processor is unable to work at all at the moment, likely due to
> > > an environmental issue (like the out-of-service comment).
> > >
> > > If the concept of penalization were moved to a connection, would it
> > > automatically penalize all FlowFiles transferred to it? We would lose
> > > some granularity if a Processor wanted to penalize some FlowFiles routed
> > > to a given Relationship but not others. I'm not sure if this is done in
> > > practice or not; I just wanted to mention it.
> > >
> > > Outside of this minor concern, I like the idea. I especially like that it
> > > would help with the consistency of Processor behavior and transparency
> > > about what the data flow is actually doing.
> > >
> > > Matt
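The following is a simplified model of the distinction Matt describes. It uses hypothetical types (`PenalizeVsYield`, `Handler`, `Result`) rather than NiFi's `ProcessSession.penalize()` and `ProcessContext.yield()`. Penalizing sets an expiration on one FlowFile, so the next FlowFile can still be processed. Yielding sets an expiration on the Processor, so nothing is processed until it passes. For simplicity, a penalized FlowFile stays in the same queue.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

/** Simplified, hypothetical model of penalization (per FlowFile) vs yielding (per Processor). */
class PenalizeVsYield {
    enum Result { SUCCESS, PENALIZE, YIELD }

    interface Handler { Result handle(FlowFile flowFile); }

    static final class FlowFile {
        final String name;
        long penaltyExpiration = 0L;

        FlowFile(String name) { this.name = name; }
    }

    final LinkedList<FlowFile> queue = new LinkedList<>();
    final List<String> processed = new ArrayList<>();
    long yieldExpiration = 0L;
    private final long penaltyMillis;
    private final long yieldMillis;

    PenalizeVsYield(long penaltyMillis, long yieldMillis) {
        this.penaltyMillis = penaltyMillis;
        this.yieldMillis = yieldMillis;
    }

    /** One scheduling opportunity for the Processor at time `now` (ms). */
    void trigger(long now, Handler handler) {
        if (now < yieldExpiration) {
            return;                                   // yielded: no work at all, for any FlowFile
        }
        Iterator<FlowFile> it = queue.iterator();
        while (it.hasNext()) {
            FlowFile ff = it.next();
            if (ff.penaltyExpiration > now) {
                continue;                             // penalized: skip only this FlowFile
            }
            switch (handler.handle(ff)) {
                case SUCCESS:
                    it.remove();
                    processed.add(ff.name);
                    return;
                case PENALIZE:                        // e.g. a naming conflict for this FlowFile
                    ff.penaltyExpiration = now + penaltyMillis;
                    return;
                case YIELD:                           // e.g. an environmental issue
                    yieldExpiration = now + yieldMillis;
                    return;
            }
        }
    }
}
```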
> > >
> > >
> > > On Thu, Jan 28, 2016 at 12:00 PM, Michael de Courci <mdecou...@googlemail.com> wrote:
> > >
> > >> Hi,
> > >> I think it would be better/simpler to have one “out of service” concept
> > >> to replace penalizing and yielding: when a plugin throws an exception,
> > >> the plugin is deemed out of service for a duration, and the connection
> > >> to that plugin is disabled for the out-of-service duration.
> > >>
> > >> When a plugin is out of service and its connection is disabled, the
> > >> resources that it uses will be freed (yielded).
> > >>
> > >> The question then is what the behaviour of the plugin before the
> > >> disabled connection should be. My thought is to tend towards stability
> > >> and make sure resources are freed, so there may need to be a “domino
> > >> effect”/cascade where all plugins before it are gradually put out of
> > >> service.
> > >>
> > >>
> > >> With Kind Regards
> > >> Michael de Courci
> > >> mdecou...@gmail.com
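A rough model of the cascading "out of service" idea on a linear flow. Everything here is hypothetical. Processors are identified by index, and each processor's incoming connection is assumed to be disabled for the same window. The "gradual" part is modelled as an assumed fixed delay (`stepMillis`) per upstream hop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the proposed "out of service" state cascading upstream. */
class OutOfServiceCascade {
    /** One processor (and, implicitly, the connection feeding it). */
    static final class Node {
        final String name;
        long outFrom = Long.MAX_VALUE;    // start of the out-of-service window (inclusive)
        long outUntil = Long.MIN_VALUE;   // end of the window (exclusive)

        Node(String name) { this.name = name; }

        boolean inService(long now) { return now < outFrom || now >= outUntil; }
    }

    /** A linear flow; index 0 is the most upstream processor. */
    final List<Node> chain = new ArrayList<>();

    OutOfServiceCascade(String... names) {
        for (String name : names) {
            chain.add(new Node(name));
        }
    }

    /**
     * Takes the processor at `index` out of service for durationMillis, then cascades
     * upstream: the processor k hops before it follows k * stepMillis later, so the
     * flow backs off gradually. All windows end at the same time.
     */
    void putOutOfService(int index, long now, long durationMillis, long stepMillis) {
        long until = now + durationMillis;
        for (int k = 0; k <= index; k++) {
            long from = now + k * stepMillis;
            if (from >= until) {
                break;                    // further-upstream turns would start after recovery
            }
            Node node = chain.get(index - k);
            node.outFrom = from;
            node.outUntil = until;
        }
    }
}
```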
> > >>
> > >>
> > >>
> > >>
> > >>> On 28 Jan 2016, at 16:34, Mark Payne <marka...@hotmail.com> wrote:
> > >>>
> > >>> All,
> > >>>
> > >>> I've been thinking about how we handle the concept of penalizing
> > >>> FlowFiles. We've had a lot of questions lately about how penalization
> > >>> works & the concept in general. It seems the following problems exist:
> > >>>
> > >>> - Confusion about the difference between penalization & yielding
> > >>> - DFM sees the option to configure a penalization period on all
> > >>>   processors, even if they don't penalize FlowFiles.
> > >>> - DFM cannot set the penalty duration in one case and a different value
> > >>>   for a different case (a different relationship, for example).
> > >>> - Developers often forget to call penalize().
> > >>> - Developer has to determine whether or not to penalize when building a
> > >>>   processor. It is based on what the developer thinks may make sense,
> > >>>   but in reality DFMs sometimes want to penalize things when the
> > >>>   processor doesn't behave that way.
> > >>>
> > >>> I'm wondering if it doesn't make sense to remove the concept of
> > >>> penalization altogether from Processors and instead move the Penalty
> > >>> Duration so that it's a setting on the Connection. I think this would
> > >>> clear up the confusion and give the DFM more control over when/how long
> > >>> to penalize. We could set the default to 30 seconds for self-looping
> > >>> connections and no penalization for other connections.
> > >>>
> > >>> Any thoughts?
> > >>>
> > >>> Thanks
> > >>> -Mark
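Here is a sketch of the proposed connection-level Penalty Duration. It assumes a connection is identified by its source and destination processor names, which is a simplification. The two-argument constructor applies the suggested defaults: 30 seconds for a self-loop and no penalty otherwise. The three-argument constructor represents a value configured by the DFM. Note that `transfer()` penalizes every FlowFile routed to the connection, which is the loss of per-FlowFile granularity Matt raised.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a Penalty Duration set on the Connection rather than the Processor. */
class ConnectionPenaltySketch {
    static final long SELF_LOOP_DEFAULT_MILLIS = 30_000L;   // default suggested in the proposal

    static final class FlowFile {
        final String id;
        long penaltyExpiration = 0L;

        FlowFile(String id) { this.id = id; }
    }

    static final class Connection {
        final String source;
        final String destination;
        final long penaltyMillis;
        final List<FlowFile> queue = new ArrayList<>();

        /** Proposed default: 30 s for self-looping connections, no penalty otherwise. */
        Connection(String source, String destination) {
            this(source, destination,
                 source.equals(destination) ? SELF_LOOP_DEFAULT_MILLIS : 0L);
        }

        /** DFM-configured penalty for this connection (and so for this relationship). */
        Connection(String source, String destination, long penaltyMillis) {
            this.source = source;
            this.destination = destination;
            this.penaltyMillis = penaltyMillis;
        }

        /** Every FlowFile transferred here gets the same penalty; the Processor has no say. */
        void transfer(FlowFile flowFile, long now) {
            flowFile.penaltyExpiration = now + penaltyMillis;
            queue.add(flowFile);
        }
    }
}
```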
> > >>
> > >>
> >
> >
>
>
> --
> Ricky Saltzer
> http://www.cloudera.com
>
