I think the penalization being on the connection makes sense, but I'm not
sure about taking penalization away from the processor altogether.

If a processor can't get far enough to transfer a FlowFile to a
relationship, it can roll back to return the FlowFile to the queue and
optionally penalize the FlowFile so it won't be immediately reprocessed.
If I understand correctly, if the failure is transient the processor should
roll back without a penalty, but if the problem is likely to recur when the
FlowFile is immediately reprocessed, then penalization can delay the
FlowFile for a period.  I think the transient vs. recurring decision
makes sense in the processor, but the severity of the penalty when problems
are likely to recur makes sense on the connection, for greater user control.
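
To make that split concrete, here is a minimal sketch in plain Java. It
does not use the NiFi API: FailureKind, Connection and delayAfterRollback
are hypothetical names made up for illustration, and the idea of a penalty
duration living on the connection is the proposal under discussion, not
current behavior. The processor only classifies the failure; the connection
supplies how long the delay is.

```java
import java.time.Duration;

public class PenaltySketch {

    /** Hypothetical: how the processor classifies a failure. */
    enum FailureKind { TRANSIENT, LIKELY_TO_RECUR }

    /** Hypothetical stand-in for a connection that carries its own penalty duration. */
    static final class Connection {
        final Duration penaltyDuration;

        Connection(Duration penaltyDuration) {
            this.penaltyDuration = penaltyDuration;
        }
    }

    /**
     * Returns how long a rolled-back FlowFile should wait before it is
     * eligible for reprocessing. The processor decides the kind of failure;
     * the connection (configured by the user) decides the length of the delay.
     */
    static Duration delayAfterRollback(FailureKind kind, Connection source) {
        return kind == FailureKind.TRANSIENT ? Duration.ZERO : source.penaltyDuration;
    }

    public static void main(String[] args) {
        // Assumed example value: a self-looping connection with a 30 second penalty.
        Connection selfLoop = new Connection(Duration.ofSeconds(30));

        System.out.println(delayAfterRollback(FailureKind.TRANSIENT, selfLoop));
        System.out.println(delayAfterRollback(FailureKind.LIKELY_TO_RECUR, selfLoop));
    }
}
```

With this shape the processor never sees the duration, so a user could give
two connections different penalties without any change to the processor.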

On Thu, Jan 28, 2016 at 1:24 PM, Ricky Saltzer <ri...@cloudera.com> wrote:

> Is there currently a way to know how many times a FlowFile has been
> penalized? Do we have use cases where we want to penalize a FlowFile *n*
> number of times before sending it down an alternate relationship? I could
> imagine an API like penalizeOrTransfer(FlowFile flowFile, int numberOfTries,
> Relationship relationship). For example, someone might want to process a
> FlowFile three times before giving up on it.
>
> On Thu, Jan 28, 2016 at 12:47 PM, Michael de Courci <
> mdecou...@googlemail.com> wrote:
>
> > Matt thanks for your reply
> >
> > I guess what I am saying in that case is this: if there is an error in a
> > FlowFile, then the processor that detects it cannot proceed, so instead of
> > calling an action to penalize the FlowFile it raises an exception,
> > OutOFServiceException or ProcessorException.
> > You could have an exception cause, PenalisedFlowFileException, for this
> > case.
> >
> > But within the processor other error causes may arise for an
> > OutOFServiceException
> >
> > The point is that if the processor threw this exception then there can be
> > a duration configuration - a time limit to keep this processor out of
> > service, along with the connection to it and possibly any processors
> > leading up to it. Naturally this will need to be indicated on the DFM.
> > This will free resources and make the flow well behaved.
> >
> > Environmental failures will simply be a different category/cause of error
> > that can be wrapped/captured also with a more general one
> >
> > With Kind Regards
> > Michael de Courci
> > mdecou...@gmail.com
> >
> >
> >
> >
> > > On 28 Jan 2016, at 17:16, Matt Gilman <matt.c.gil...@gmail.com> wrote:
> > >
> > > Just to recap/level set...
> > >
> > > The distinction between yielding and penalization is important.
> > > Penalization is an action taken on a FlowFile because the FlowFile
> > > cannot be processed right now (like a naming conflict, for instance).
> > > The Processor is indicating that it cannot process that specific
> > > FlowFile at the moment but may be able to process the next. Yielding is
> > > an indication that the Processor is unable to work at all at the moment,
> > > likely due to an environmental issue (like the out of service comment).
> > >
> > > If the concept of penalization were moved to a connection, does it
> > > automatically penalize all FlowFiles transferred to it? We would lose
> > > some granularity if a Processor wanted to penalize some FlowFiles routed
> > > to a given Relationship but not others. I'm not sure if this is done in
> > > practice or not, just wanted to mention it.
> > >
> > > Outside of this minor concern, I like the idea. I especially like that
> > > it would help with the consistency of Processor behavior and
> > > transparency about what the data flow is actually doing.
> > >
> > > Matt
> > >
> > >
> > > On Thu, Jan 28, 2016 at 12:00 PM, Michael de Courci <
> > > mdecou...@googlemail.com> wrote:
> > >
> > >> Hi
> > >> I think it would be better/simpler to have one “out of service” concept
> > >> to replace penalizing and yielding. When a plugin throws an exception,
> > >> the plugin is deemed out of service for a duration, and so the
> > >> connection to that plugin is disabled for the out of service duration.
> > >>
> > >> When a plugin is out of service and the connection disabled, the
> > >> resources that it uses will be freed (yielded).
> > >>
> > >> The question then is what the behaviour of the plugin before the
> > >> disabled connection should be. My thought is to tend towards stability
> > >> and make sure resources are freed, so there may need to be a “domino
> > >> effect”/cascade effect where all plugins before it are gradually put out
> > >> of service.
> > >>
> > >>
> > >> With Kind Regards
> > >> Michael de Courci
> > >> mdecou...@gmail.com
> > >>
> > >>
> > >>
> > >>
> > >>> On 28 Jan 2016, at 16:34, Mark Payne <marka...@hotmail.com> wrote:
> > >>>
> > >>> All,
> > >>>
> > >>> I've been thinking about how we handle the concept of penalizing
> > >>> FlowFiles. We've had a lot of questions lately about how penalization
> > >>> works & the concept in general. It seems the following problems exist:
> > >>>
> > >>> - Confusion about the difference between penalization & yielding
> > >>> - DFM sees option to configure penalization period on all processors,
> > >>>   even if they don't penalize FlowFiles.
> > >>> - DFM cannot set penalty duration in 1 case and set a different value
> > >>>   for a different case (different relationship, for example).
> > >>> - Developers often forget to call penalize()
> > >>> - Developer has to determine whether or not to penalize when building
> > >>>   a processor. It is based on what the developer thinks may make
> > >>>   sense, but in reality DFMs sometimes want to penalize things when
> > >>>   the processor doesn't behave that way.
> > >>>
> > >>> I'm wondering if it doesn't make sense to remove the concept of
> > >>> penalization altogether from Processors and instead move the Penalty
> > >>> Duration so that it's a setting on the Connection. I think this would
> > >>> clear up the confusion and give the DFM more control over when/how
> > >>> long to penalize. Could set the default to 30 seconds for self-looping
> > >>> connections and no penalization for other connections.
> > >>>
> > >>> Any thoughts?
> > >>>
> > >>> Thanks
> > >>> -Mark
> > >>
> > >>
> >
> >
>
>
> --
> Ricky Saltzer
> http://www.cloudera.com
>
