Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Bryan Bende
I really like the idea of being able to have different penalty durations
for each connection, and I think this would make things a bit clearer than
relying on the developer to determine when to penalize things.

On Thu, Jan 28, 2016 at 11:34 AM, Mark Payne  wrote:

> All,
>
> I've been thinking about how we handle the concept of penalizing
> FlowFiles. We've had a lot of questions
> lately about how penalization works & the concept in general. Seems the
> following problems exist:
>
> - Confusion about difference between penalization & yielding
> - DFM sees option to configure penalization period on all processors, even
> if they don't penalize FlowFiles.
> - DFM cannot set penalty duration in 1 case and set a different value for
> a different case (different relationship, for example).
> - Developers often forget to call penalize()
> - Developer has to determine whether or not to penalize when building a
> processor. It is based on what the developer
> thinks may make sense, but in reality DFMs sometimes want to penalize
> things when the processor doesn't behave that way.
>
> I'm wondering if it doesn't make sense to remove the concept of
> penalization altogether from Processors and instead
> move the Penalty Duration so that it's a setting on the Connection. I
> think this would clear up the confusion and give the DFM
> more control over when/how long to penalize. Could set the default to
> 30 seconds for self-looping connections and no penalization
> for other connections.
>
> Any thoughts?
>
> Thanks
> -Mark
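
A minimal sketch of what a per-connection penalty setting could look like, assuming the defaults Mark suggests (30 seconds for self-looping connections, none otherwise). The class and method names (`ConnectionPenaltySketch`, `Connection`, `effectivePenalty`) are hypothetical and are not part of the NiFi API.

```java
import java.time.Duration;

// Hypothetical model of the proposal; none of these names come from the NiFi API.
final class ConnectionPenaltySketch {

    // Default suggested in the message above for self-looping connections.
    static final Duration SELF_LOOP_DEFAULT = Duration.ofSeconds(30);

    static final class Connection {
        final String sourceId;
        final String destinationId;
        final Duration configuredPenalty; // null means "not set by the DFM"

        Connection(String sourceId, String destinationId, Duration configuredPenalty) {
            this.sourceId = sourceId;
            this.destinationId = destinationId;
            this.configuredPenalty = configuredPenalty;
        }

        boolean isSelfLoop() {
            return sourceId.equals(destinationId);
        }

        // An explicit DFM setting wins; otherwise fall back to the proposed defaults.
        Duration effectivePenalty() {
            if (configuredPenalty != null) {
                return configuredPenalty;
            }
            return isSelfLoop() ? SELF_LOOP_DEFAULT : Duration.ZERO;
        }
    }

    public static void main(String[] args) {
        Connection retryLoop = new Connection("PutFile-1", "PutFile-1", null);
        Connection downstream = new Connection("PutFile-1", "LogAttribute-1", null);
        System.out.println(retryLoop.effectivePenalty());  // PT30S
        System.out.println(downstream.effectivePenalty()); // PT0S
    }
}
```

With this shape the DFM could still give two relationships of the same processor different durations, since each relationship feeds its own connection.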


Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread dan bress
Mark,
   I agree with all the points you mention about penalization being
confusing, and I think the ability to apply a penalty to FlowFiles outside
of a processor is a clearer way to express what is happening.

   I worry that having the penalty be a property of the connection would
also be confusing.  To me, penalizing a FlowFile is an action you do to a
FlowFile.  In my head, connections don't do actions on FlowFiles, they just
sort them and move them along.  I might find it confusing that the
connection is "doing things" to the flow files, unless there was some kind
of visual cue as to what was going on.  Kind of like how people have
brought up that the "expire" concept is a little confusing, because of the
lack of visual cue.

So when I started typing this email I was thinking we should have a new
concept of a "penalizer" that's kind of like a processor but just puts a
penalty on a flow file.  After typing it, that might be a new construct
that isn't really needed, and I'm OK with this being put on a connection,
but I would like there to be a visual cue on the connection indicating that
it is penalizing flow files.



Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Michael de Courci
Hi
I think it would be better/simpler to have one “out of service” concept to
replace penalizing and yielding: when a plugin throws an exception, the
plugin is deemed out of service for a duration, and the connection to that
plugin is disabled for the out-of-service duration.

When a plugin is out of service and the connection disabled, the resources
that it uses will be freed (yielded).

The question then is what the behaviour of the plugins before the disabled
connection should be. My thought is to tend towards stability and make sure
resources are freed, so there may need to be a “domino effect”/cascade effect
where all plugins before it are gradually put out of service.


With Kind Regards
Michael de Courci
mdecou...@gmail.com







Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Mark Payne
Dan,

Certainly a valid concern. I like the idea of an indicator on the connection
that it is penalizing. I know there has been some thought already going into
some UI redesigns, so that's a good thing to keep in mind there.

I can also understand the concern about a Connection performing an action
on the FlowFile, but this concerns me less. I say this because the job of the
Connection is to sort/order/prioritize FlowFiles and provide the appropriate
FlowFiles to the 'destination component'. Penalization can be thought of as
simply determining whether or not it is appropriate to provide a given FlowFile.
I.e., it doesn't really change the FlowFile itself so much as it makes a
decision about when/how to distribute that FlowFile.
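
To illustrate the "decision about when to distribute" view, here is a small sketch of a queue that leaves FlowFiles untouched and simply declines to hand one over until its penalty has elapsed. Everything here (`PenaltyAwareQueueSketch`, `enqueue`, `poll`) is a hypothetical stand-in rather than NiFi's actual queue implementation, and time is passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.Optional;

// Hypothetical connection queue: penalization is only a hand-over decision.
final class PenaltyAwareQueueSketch {

    private static final class Entry {
        final String flowFileId;
        final long eligibleAtMillis;

        Entry(String flowFileId, long eligibleAtMillis) {
            this.flowFileId = flowFileId;
            this.eligibleAtMillis = eligibleAtMillis;
        }
    }

    private final long penaltyMillis;
    private final Deque<Entry> queue = new ArrayDeque<>();

    PenaltyAwareQueueSketch(long penaltyMillis) {
        this.penaltyMillis = penaltyMillis;
    }

    // Every FlowFile entering this connection becomes eligible after the penalty.
    void enqueue(String flowFileId, long nowMillis) {
        queue.addLast(new Entry(flowFileId, nowMillis + penaltyMillis));
    }

    // Returns the first FlowFile whose penalty has elapsed, if any.
    Optional<String> poll(long nowMillis) {
        Iterator<Entry> it = queue.iterator();
        while (it.hasNext()) {
            Entry candidate = it.next();
            if (candidate.eligibleAtMillis <= nowMillis) {
                it.remove();
                return Optional.of(candidate.flowFileId);
            }
        }
        return Optional.empty();
    }
}
```

The FlowFile itself carries no penalty state in this sketch; the connection alone knows when it may be offered to the destination.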





Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Matt Gilman
Just to recap/level set...

The distinction between yielding and penalization is important. Penalization
is an action taken on a FlowFile because the FlowFile cannot be processed
right now (like a naming conflict for instance). The Processor is
indicating that it cannot process that specific FlowFile at the moment but
may be able to process the next. Yielding is an indication that the
Processor is unable to work at all at the moment, likely due to an
environmental issue (like the out of service comment).
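
The difference in scope can be sketched as follows: a penalty delays one FlowFile while the next one can still be picked up, whereas a yield stops the processor from receiving anything. This is a simplified model with hypothetical names (`PenalizeVersusYieldSketch`, `yieldProcessor`, `nextFlowFile`), not the framework's scheduler.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model: per-FlowFile penalty versus processor-wide yield.
final class PenalizeVersusYieldSketch {

    // FlowFile id -> earliest time it may be processed, in arrival order.
    private final Map<String, Long> eligibleAt = new LinkedHashMap<>();
    private long yieldedUntil = 0L;

    void enqueue(String flowFileId) {
        eligibleAt.put(flowFileId, 0L);
    }

    // Penalize: only this FlowFile is delayed.
    void penalize(String flowFileId, long now, long durationMillis) {
        eligibleAt.put(flowFileId, now + durationMillis);
    }

    // Yield: the processor as a whole does no work for a while.
    void yieldProcessor(long now, long durationMillis) {
        yieldedUntil = now + durationMillis;
    }

    Optional<String> nextFlowFile(long now) {
        if (now < yieldedUntil) {
            return Optional.empty(); // yielded: nothing is handed over
        }
        for (Map.Entry<String, Long> entry : eligibleAt.entrySet()) {
            if (entry.getValue() <= now) {
                return Optional.of(entry.getKey()); // penalized entries are skipped
            }
        }
        return Optional.empty();
    }
}
```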

If the concept of penalization were moved to a connection, does it
automatically penalize all FlowFiles transferred to it? We would lose some
granularity if a Processor wanted to penalize some FlowFiles routed to a
given Relationship but not others. I'm not sure if this is done in practice
or not, just wanted to mention it.

Outside of this minor concern, I like the idea. I especially like that it
would help with the consistency of Processor behavior and transparency
about what the data flow is actually doing.

Matt




Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Michael de Courci
Sorry, any thoughts on this?

“
I think it would be better/simpler to have one “out of service” concept to
replace penalizing and yielding: when a plugin throws an exception, the
plugin is deemed out of service for a duration, and the connection to that
plugin is disabled for the out-of-service duration.

When a plugin is out of service and the connection disabled, the resources
that it uses will be freed (yielded).

The question then is what the behaviour of the plugins before the disabled
connection should be. My thought is to tend towards stability and make sure
resources are freed, so there may need to be a “domino effect”/cascade effect
where all plugins before it are gradually put out of service.”

Excuse my lack of understanding, but does penalizing a processor need to be an
action? Will it not always be derived from an error condition?

With Kind Regards
Michael de Courci
mdecou...@gmail.com







Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Michael de Courci
Matt thanks for your reply

I guess what I am saying in that case is: if there is an error in a FlowFile,
then the processor that detects it cannot proceed, so instead of calling an
action to penalize the FlowFile it raises an exception, OutOfServiceException or
ProcessorException.
You could have an exception cause, PenalisedFlowFileException, for this case.

But within the processor other error causes may arise for an
OutOfServiceException.

The point is that if the processor threw this exception then there can be a
duration configuration: a time limit to keep out of service this processor,
the connection to it, and possibly any processors leading up to it. Naturally
this will need to be indicated to the DFM. This will free resources and make
the flow well behaved.

Environmental failures will simply be a different category/cause of error that
can also be wrapped/captured with a more general one.
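
One way to read this proposal is as a small exception hierarchy where the cause tells the framework how wide the "out of service" effect should be. The sketch below is only an interpretation: the exception names follow the ones floated above (with spelling normalized), while `Scope` and `scopeOf` are hypothetical helpers, and none of this exists in NiFi.

```java
// Hypothetical exception hierarchy for the "out of service" idea.
final class OutOfServiceSketch {

    static class ProcessorException extends RuntimeException {
        ProcessorException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static class OutOfServiceException extends ProcessorException {
        OutOfServiceException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Cause used when a single FlowFile is the problem.
    static class PenalisedFlowFileException extends Exception {
        PenalisedFlowFileException(String message) {
            super(message);
        }
    }

    enum Scope { FLOW_FILE, PROCESSOR }

    // The cause decides whether one FlowFile or the whole processor is paused.
    static Scope scopeOf(ProcessorException e) {
        return e.getCause() instanceof PenalisedFlowFileException
                ? Scope.FLOW_FILE
                : Scope.PROCESSOR;
    }
}
```

Note that this still needs the FlowFile-versus-processor distinction somewhere, which is the same distinction penalize and yield express today.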

With Kind Regards
Michael de Courci
mdecou...@gmail.com







Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Ricky Saltzer
Is there currently a way to know how many times a FlowFile has been
penalized? Do we have use cases where we want to penalize a FlowFile *n* number
of times before sending it down an alternate relationship? I could imagine
an API like penalizeOrTransfer(FlowFile flowFile, int numberOfTries,
Relationship relationship). For example, someone might want to process a
FlowFile three times before giving up on it.
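
A rough sketch of how such a call might behave, assuming that each call represents one failed attempt and that the FlowFile is routed away on the attempt that reaches `numberOfTries`. FlowFiles and relationships are reduced to plain strings here, and the in-memory counter is purely illustrative; no such method exists in the NiFi API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper modelling the imagined penalizeOrTransfer call.
final class PenalizeOrTransferSketch {

    // Failed attempts seen so far, keyed by FlowFile id.
    private final Map<String, Integer> attempts = new HashMap<>();

    // Empty result: penalized and left for another try.
    // Present result: the relationship the FlowFile was transferred to.
    Optional<String> penalizeOrTransfer(String flowFileId, int numberOfTries, String relationship) {
        int used = attempts.merge(flowFileId, 1, Integer::sum);
        if (used < numberOfTries) {
            return Optional.empty();
        }
        attempts.remove(flowFileId); // give up and forget the counter
        return Optional.of(relationship);
    }
}
```

For example, with `numberOfTries` set to 3, the first two failures penalize the FlowFile and the third sends it to the given relationship.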


Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Bryan Bende
Regarding throwing an exception... I believe if you are extending
AbstractProcessor and an exception is thrown out of onTrigger() then the
session is rolled back and any flow files that were accessed are penalized,
which results in leaving them in the incoming connection to the processor
and not being retried until the penalty duration passes. This seems similar
to what Michael described, although it is not stopping the processor from
processing other incoming flow files.
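
The pattern described here can be sketched roughly as below: run the processor's work, commit on success, and roll back with a penalty if anything is thrown. The `Session` and `Work` interfaces are simplified stand-ins invented for this example, not the real NiFi classes, and a real framework would presumably also log or rethrow the exception.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the commit / rollback-with-penalty pattern.
final class RollbackOnExceptionSketch {

    interface Session {
        void commit();
        void rollback(boolean penalize);
    }

    interface Work {
        void run(Session session) throws Exception;
    }

    // Records what happened so the behaviour can be inspected.
    static final class RecordingSession implements Session {
        final List<String> events = new ArrayList<>();

        @Override
        public void commit() {
            events.add("commit");
        }

        @Override
        public void rollback(boolean penalize) {
            events.add(penalize ? "rollback+penalize" : "rollback");
        }
    }

    static void trigger(Session session, Work work) {
        try {
            work.run(session);
            session.commit();
        } catch (Exception e) {
            // FlowFiles go back to the incoming queue and are penalized.
            session.rollback(true);
        }
    }
}
```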

Ricky's retry idea sounds interesting... I think a lot of people handle
this today by creating a retry loop using UpdateAttribute and
RouteOnAttribute [1].

[1]
https://cwiki.apache.org/confluence/download/attachments/57904847/Retry_Count_Loop.xml?version=1&modificationDate=1433271239000&api=v2



Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Joe Skora
I think the penalization being on the connection makes sense, but I'm not
sure about taking penalization away from the processor altogether.

If a processor can't get far enough to transfer a flowfile to a
relationship, it can roll back to return the flowfile to the queue and
optionally penalize the flowfile so it won't be immediately reprocessed.
If I understand correctly, if the failure is transient the processor should
roll back without a penalty, but if the problem is likely to recur when the
flowfile is immediately reprocessed, then penalization can delay the
flowfile for a period.  I think the transient vs. recurring decision
makes sense in the processor, but the severity of the penalty if problems
are likely to recur makes sense on the connection for greater user control.
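
To make the rollback-with-or-without-penalty distinction concrete, here is a
minimal sketch in plain Java. It is only a toy model of a queue, not the NiFi
API: the class name PenaltyQueueSketch, the methods requeue() and poll(), and
the millisecond timestamps are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model of a connection queue; this is NOT the NiFi API.
final class PenaltyQueueSketch {

    // One queued item plus the earliest time (in millis) it may be handed out again.
    private static final class Entry {
        final String id;
        final long availableAtMillis;

        Entry(String id, long availableAtMillis) {
            this.id = id;
            this.availableAtMillis = availableAtMillis;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Put an item back on the queue. A penalty of 0 models a plain rollback
    // (transient failure); a positive penalty delays the next attempt.
    void requeue(String id, long nowMillis, long penaltyMillis) {
        entries.add(new Entry(id, nowMillis + penaltyMillis));
    }

    // Hand out the first item whose penalty, if any, has expired.
    Optional<String> poll(long nowMillis) {
        for (int i = 0; i < entries.size(); i++) {
            if (entries.get(i).availableAtMillis <= nowMillis) {
                return Optional.of(entries.remove(i).id);
            }
        }
        return Optional.empty();
    }
}
```

In this model the processor would only decide whether the penalty is zero or
not, while the actual duration could come from a setting on the connection,
which is the split suggested above.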

On Thu, Jan 28, 2016 at 1:24 PM, Ricky Saltzer  wrote:

> Is there currently a way to know how many times a FlowFile has been
> penalized? Do we have use cases where we want to penalize a FlowFile *n
> *number
> of times before sending it down an alternate relationship? I could imagine
> an API like penalizeOrTransfer(FlowFile flowFile, int numberOfTries,
> Relationship relationship). For example, someone might want to process a
> FlowFile three times before giving up on it.
>
> On Thu, Jan 28, 2016 at 12:47 PM, Michael de Courci <
> mdecou...@googlemail.com> wrote:
>
> > Matt thanks for your reply
> >
> > I guess what I am saying in that case - if there is an error in a
> > FlowFile, then the processor that detects this cannot proceed so instead
> of
> > calling an action to penalize the FlowFile it raises an exception
> > OutOFServiceException or ProcessorException.
> > You could have an exception cause PeanilisedFlowFileException for this
> > case.
> >
> > But within the processor other error causes may arise for an
> > OutOFServiceException
> >
> > The point is that if the processor threw this exception then there can be
> > a duration configuration - a time limit to keep this processor out of
> > service and the connection to it and possibly any processors leading upto
> > it - Naturally this will need to be indicated on the DFM - this will free
> > resources and make the flow well behaved.
> >
> > Environmental failures will simply be a different category/cause of error
> > that can be wrapped/captured also with a more general one
> >
> > With Kind Regards
> > Michael de Courci
> > mdecou...@gmail.com
> >
> >
> >
> >
> > > On 28 Jan 2016, at 17:16, Matt Gilman  wrote:
> > >
> > > Just to recap/level set...
> > >
> > > The distinct between yielding and penalization is important.
> Penalization
> > > is an action taken on a FlowFile because the FlowFile cannot be
> processed
> > > right now (like a naming conflict for instance). The Processor is
> > > indicating that it cannot process that specific FlowFile at the moment
> > but
> > > may be able to process the next. Yielding is an indication that the
> > > Processor is unable to work at all at the moment likely due to an
> > > environmental issue (like the out of service comment).
> > >
> > > If the concept of penalization were moved to a connection, does it
> > > automatically penalize all FlowFile transferred to it? We would lose
> some
> > > granularity if a Processor wanted to penalize some FlowFile routed to a
> > > given Relationship but not others. I'm not sure if this is done in
> > practice
> > > or not, just wanted to mention it.
> > >
> > > Outside of this minor concern, I like the idea. I especially like that
> it
> > > would help with the consistency of Processor behavior and transparency
> > > about what the data flow is actually doing.
> > >
> > > Matt
> > >
> > >
> > > On Thu, Jan 28, 2016 at 12:00 PM, Michael de Courci <
> > > mdecou...@googlemail.com> wrote:
> > >
> > >> Hi
> > >> I think it would be better/simpler to have one “out of service”
> concept
> > >> to replace penalizing and yielding and when a plugin throws an
> exception
> > >> then the plugin is deemed out of service, for a duration and so the
> > >> connection to that plugin is disabled for the out of service duration.
> > >>
> > >> When a plugin is out of service and the connection disabled - then
> > >> resources that it uses will be freed(yielded).
> > >>
> > >> The question then is what the behaviour of the plugin before the
> > disabled
> > >> connection - should be.  My thought is to tend towards stability and
> > make
> > >> sure resources are freed, so there may need to be a “domino
> > effect”/cascade
> > >> affect where all plugins before are gradually put out of service.
> > >>
> > >>
> > >> With Kind Regards
> > >> Michael de Courci
> > >> mdecou...@gmail.com
> > >>
> > >>
> > >>
> > >>
> > >>> On 28 Jan 2016, at 16:34, Mark Payne  wrote:
> > >>>
> > >>> All,
> > >>>
> > >>> I've been thinking about how we handle the concept of penalizing
> > >> FlowFiles. We've had a lot of questions
> > >>> lately about how penalization works & the concept in general. Seems
> the
> > >> following pro

Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Mark Payne

The retry idea concerns me a bit. If we were to have a method like:

penalizeOrTransfer(FlowFile flowFile, int numberOfTries, Relationship 
relationship)

I think that leaves out some info: even when a FlowFile is penalized, it
still has to be transferred somewhere. So there would have to be one
relationship to send it to when penalizing and another to send it to when
not penalizing.
I also think this puts more onus on the developer to understand how it would
be used. I believe the user, not the developer, should be making decisions
about how many times to penalize.
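
As a rough illustration of the missing information, a sketch of what such a
method would have to decide is below. Everything here is hypothetical:
penalizeOrTransfer is only a proposal in this thread, and the Outcome class,
the decide() method, and the relationship names are invented for the example.

```java
// Hypothetical sketch; penalizeOrTransfer is only a proposed API in this thread.
final class PenalizeOrTransferSketch {

    // Outcome of one call: where the FlowFile goes and whether it is penalized.
    static final class Outcome {
        final String relationship;
        final boolean penalized;

        Outcome(String relationship, boolean penalized) {
            this.relationship = relationship;
            this.penalized = penalized;
        }
    }

    // Both targets are required: one for "penalize and try again" and one for
    // "give up". A single Relationship parameter cannot express both.
    static Outcome decide(int attemptsSoFar, int numberOfTries,
                          String retryRelationship, String giveUpRelationship) {
        if (attemptsSoFar < numberOfTries) {
            return new Outcome(retryRelationship, true);
        }
        return new Outcome(giveUpRelationship, false);
    }
}
```

Note that numberOfTries is still fixed by whoever calls the method, which is
the second concern: that number would be chosen in code rather than by the
user.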

> On Jan 28, 2016, at 2:03 PM, Bryan Bende  wrote:
> 
> Regarding throwing an exception... I believe if you are extending
> AbstractProcessor and an exception is thrown out of onTrigger() then the
> session is rolled back and any flow files that were accessed are penalized,
> which results in leaving them in the incoming connection to the processor
> and not being retried until the penalty duration passes. This seems similar
> to what Michael described, although it is not stopping the processor from
> processing other incoming  flow files.
> 
> Ricky's retry idea sounds interesting... I think a lot of people handle
> this today by creating a retry loop using UpdateAttribute and
> RouteOnAttribute [1].
> 
> [1]
> https://cwiki.apache.org/confluence/download/attachments/57904847/Retry_Count_Loop.xml?version=1&modificationDate=1433271239000&api=v2
> 
> 
> On Thu, Jan 28, 2016 at 1:24 PM, Ricky Saltzer  wrote:
> 
>> Is there currently a way to know how many times a FlowFile has been
>> penalized? Do we have use cases where we want to penalize a FlowFile *n
>> *number
>> of times before sending it down an alternate relationship? I could imagine
>> an API like penalizeOrTransfer(FlowFile flowFile, int numberOfTries,
>> Relationship relationship). For example, someone might want to process a
>> FlowFile three times before giving up on it.
>> 
>> On Thu, Jan 28, 2016 at 12:47 PM, Michael de Courci <
>> mdecou...@googlemail.com> wrote:
>> 
>>> Matt thanks for your reply
>>> 
>>> I guess what I am saying in that case - if there is an error in a
>>> FlowFile, then the processor that detects this cannot proceed so instead
>> of
>>> calling an action to penalize the FlowFile it raises an exception
>>> OutOFServiceException or ProcessorException.
>>> You could have an exception cause PeanilisedFlowFileException for this
>>> case.
>>> 
>>> But within the processor other error causes may arise for an
>>> OutOFServiceException
>>> 
>>> The point is that if the processor threw this exception then there can be
>>> a duration configuration - a time limit to keep this processor out of
>>> service and the connection to it and possibly any processors leading upto
>>> it - Naturally this will need to be indicated on the DFM - this will free
>>> resources and make the flow well behaved.
>>> 
>>> Environmental failures will simply be a different category/cause of error
>>> that can be wrapped/captured also with a more general one
>>> 
>>> With Kind Regards
>>> Michael de Courci
>>> mdecou...@gmail.com
>>> 
>>> 
>>> 
>>> 

Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Mark Payne
Joe,

You bring up a great point. I realized after sending the initial e-mail that
Processors still would need the ability to penalize a FlowFile in case of
rollback. But I think this should be the only way that a Processor is able to
penalize a FlowFile - to indicate that it will not process the FlowFile for a
while. But the processor would no longer indicate 'the next processor cannot
process the FlowFile for some time'.



> On Jan 28, 2016, at 2:08 PM, Joe Skora  wrote:
> 
> I think the penalization being on the connection makes sense, but I'm not
> sure about taking penalization away from the processor altogether.
> 
> If a processor can't get far enough to transfer a flowfile to a
> relationship, it can rollback to return the flowfile to the queue and
> optionally penalize the flowfile so it won't be immediately reprocessed.
> If I understand correctly, if the failure is transient the processor should
> rollback without a penalty, but if the problem is likely to re-occur if
> flowfile is immediately reprocessed then penalization can delay the
> flowfile for a period.  I think the transient vs. re-occurring decision
> makes sense in the processor, but the severity of the penalty if problems
> are likely to re-occur makes sense on connection for greater user control.
> 
> On Thu, Jan 28, 2016 at 1:24 PM, Ricky Saltzer  wrote:
> 
>> Is there currently a way to know how many times a FlowFile has been
>> penalized? Do we have use cases where we want to penalize a FlowFile *n
>> *number
>> of times before sending it down an alternate relationship? I could imagine
>> an API like penalizeOrTransfer(FlowFile flowFile, int numberOfTries,
>> Relationship relationship). For example, someone might want to process a
>> FlowFile three times before giving up on it.
>> 
>> On Thu, Jan 28, 2016 at 12:47 PM, Michael de Courci <
>> mdecou...@googlemail.com> wrote:
>> 
>>> Matt thanks for your reply
>>> 
>>> I guess what I am saying in that case - if there is an error in a
>>> FlowFile, then the processor that detects this cannot proceed so instead
>> of
>>> calling an action to penalize the FlowFile it raises an exception
>>> OutOFServiceException or ProcessorException.
>>> You could have an exception cause PeanilisedFlowFileException for this
>>> case.
>>> 
>>> But within the processor other error causes may arise for an
>>> OutOFServiceException
>>> 
>>> The point is that if the processor threw this exception then there can be
>>> a duration configuration - a time limit to keep this processor out of
>>> service and the connection to it and possibly any processors leading upto
>>> it - Naturally this will need to be indicated on the DFM - this will free
>>> resources and make the flow well behaved.
>>> 
>>> Environmental failures will simply be a different category/cause of error
>>> that can be wrapped/captured also with a more general one
>>> 
>>> With Kind Regards
>>> Michael de Courci
>>> mdecou...@gmail.com
>>> 
>>> 
>>> 
>>> 

Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Ricky Saltzer
That's a good point, Mark. I also agree that it's better to give the user
control whenever possible. I imagine the RouteOnAttribute pattern to
eventually "give up" on a FlowFile will be a common pattern, and so we
should account for that, rather than forcing the user into knowing this
pattern.

On Thu, Jan 28, 2016 at 2:11 PM, Mark Payne  wrote:

>
> The retry idea concerns me a bit. If we were to have a method like:
>
> penalizeOrTransfer(FlowFile flowFile, int numberOfTries, Relationship
> relationship)
>
> I think that leaves out some info - even if a FlowFile is
> penalized, it must be penalized and sent somewhere. So there would have to
> be
> a relationship to send it to if penalized and another to send it to if not
> penalizing.
> This also I think puts more onus on the developer to understand how it
> would be
> used - I believe the user should be making decisions about how many times
> to
> penalize, not the developer.
>
> > On Jan 28, 2016, at 2:03 PM, Bryan Bende  wrote:
> >
> > Regarding throwing an exception... I believe if you are extending
> > AbstractProcessor and an exception is thrown out of onTrigger() then the
> > session is rolled back and any flow files that were accessed are
> penalized,
> > which results in leaving them in the incoming connection to the processor
> > and not being retried until the penalty duration passes. This seems
> similar
> > to what Michael described, although it is not stopping the processor from
> > processing other incoming  flow files.
> >
> > Ricky's retry idea sounds interesting... I think a lot of people handle
> > this today by creating a retry loop using UpdateAttribute and
> > RouteOnAttribute [1].
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/download/attachments/57904847/Retry_Count_Loop.xml?version=1&modificationDate=1433271239000&api=v2
> >
> >
> > On Thu, Jan 28, 2016 at 1:24 PM, Ricky Saltzer 
> wrote:
> >
> >> Is there currently a way to know how many times a FlowFile has been
> >> penalized? Do we have use cases where we want to penalize a FlowFile *n
> >> *number
> >> of times before sending it down an alternate relationship? I could
> imagine
> >> an API like penalizeOrTransfer(FlowFile flowFile, int numberOfTries,
> >> Relationship relationship). For example, someone might want to process a
> >> FlowFile three times before giving up on it.
> >>
> >> On Thu, Jan 28, 2016 at 12:47 PM, Michael de Courci <
> >> mdecou...@googlemail.com> wrote:
> >>
> >>> Matt thanks for your reply
> >>>
> >>> I guess what I am saying in that case - if there is an error in a
> >>> FlowFile, then the processor that detects this cannot proceed so
> instead
> >> of
> >>> calling an action to penalize the FlowFile it raises an exception
> >>> OutOFServiceException or ProcessorException.
> >>> You could have an exception cause PeanilisedFlowFileException for this
> >>> case.
> >>>
> >>> But within the processor other error causes may arise for an
> >>> OutOFServiceException
> >>>
> >>> The point is that if the processor threw this exception then there can
> be
> >>> a duration configuration - a time limit to keep this processor out of
> >>> service and the connection to it and possibly any processors leading
> upto
> >>> it - Naturally this will need to be indicated on the DFM - this will
> free
> >>> resources and make the flow well behaved.
> >>>
> >>> Environmental failures will simply be a different category/cause of
> error
> >>> that can be wrapped/captured also with a more general one
> >>>
> >>> With Kind Regards
> >>> Michael de Courci
> >>> mdecou...@gmail.com
> >>>
> >>>
> >>>
> >>>

Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Mark Payne
I think for the particular pattern, I would like to see a LoopFlowFile
processor (or something with a better name perhaps :) ) that would allow the
user to just set a threshold for how many times to try or how long to keep
trying or both and then send to either a 'threshold exceeded' or 'below
threshold' relationship. I.e., set a threshold of 3 times or 10 minutes and
then route to one or the other. It would make that pattern a lot easier by
just using a single easy-to-understand Processor.
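
A minimal sketch of the routing decision such a processor might make, in plain
Java. None of this exists in NiFi as written here: the class
LoopThresholdSketch, the route() method, and its parameters are assumptions,
and it also assumes the try count and first-seen time are tracked somewhere
(for example as FlowFile attributes).

```java
// Hypothetical sketch of the routing decision; names are illustrative only.
final class LoopThresholdSketch {

    static final String BELOW_THRESHOLD = "below threshold";
    static final String THRESHOLD_EXCEEDED = "threshold exceeded";

    // A limit of 0 or less disables that check, so the user can configure
    // a try count, a duration, or both.
    static String route(int triesSoFar, long firstSeenMillis, long nowMillis,
                        int maxTries, long maxAgeMillis) {
        boolean tooManyTries = maxTries > 0 && triesSoFar >= maxTries;
        boolean tooOld = maxAgeMillis > 0
                && (nowMillis - firstSeenMillis) >= maxAgeMillis;
        return (tooManyTries || tooOld) ? THRESHOLD_EXCEEDED : BELOW_THRESHOLD;
    }
}
```

With a threshold of 3 tries or 10 minutes (600000 ms), a FlowFile on its
second try after one minute would go to 'below threshold' and loop back, while
one that has been tried 3 times, or has been looping for 10 minutes, would go
to 'threshold exceeded'.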



> On Jan 28, 2016, at 2:31 PM, Ricky Saltzer  wrote:
> 
> That's a good point, Mark. I also agree that it's better to give the user
> control whenever possible. I imagine the RouteOnAttribute pattern to
> eventually "give up" on a FlowFile will be a common pattern, and so so we
> should account for that, rather than forcing the user into knowing this
> pattern.
> 
> On Thu, Jan 28, 2016 at 2:11 PM, Mark Payne  wrote:
> 
>> 
>> The retry idea concerns me a bit. If we were to have a method like:
>> 
>> penalizeOrTransfer(FlowFile flowFile, int numberOfTries, Relationship
>> relationship)
>> 
>> I think that leaves out some info - even if a FlowFile is
>> penalized, it must be penalized and sent somewhere. So there would have to
>> be
>> a relationship to send it to if penalized and another to send it to if not
>> penalizing.
>> This also I think puts more onus on the developer to understand how it
>> would be
>> used - I believe the user should be making decisions about how many times
>> to
>> penalize, not the developer.
>> 
>>> On Jan 28, 2016, at 2:03 PM, Bryan Bende  wrote:
>>> 
>>> Regarding throwing an exception... I believe if you are extending
>>> AbstractProcessor and an exception is thrown out of onTrigger() then the
>>> session is rolled back and any flow files that were accessed are
>> penalized,
>>> which results in leaving them in the incoming connection to the processor
>>> and not being retried until the penalty duration passes. This seems
>> similar
>>> to what Michael described, although it is not stopping the processor from
>>> processing other incoming  flow files.
>>> 
>>> Ricky's retry idea sounds interesting... I think a lot of people handle
>>> this today by creating a retry loop using UpdateAttribute and
>>> RouteOnAttribute [1].
>>> 
>>> [1]
>>> 
>> https://cwiki.apache.org/confluence/download/attachments/57904847/Retry_Count_Loop.xml?version=1&modificationDate=1433271239000&api=v2
>>> 
>>> 

Re: Are we thinking about Penalization all wrong?

2016-01-28 Thread Adam Taft
If we're willing to have a LoopFlowFile processor, why not consider a
PenalizeFlowFile processor too?  Just throwing it out for discussion's sake,
but penalization could ultimately be realized in multiple ways:

a) by both the processor developer (and DFM via penalty duration), as it is
done today;
b) by the DFM as part of the Connection Settings, per Mark's proposal;
c) by the DFM as part of an (alternative) standard processor, with various
(and future) penalization options configured as Processor Properties;
d) all of the above.

The line is blurry between what functionality can/should go into the
connection or queue vs. which functionality can/should go into a
processor.  If we're willing to say that LoopFlowFile should be defined as
a processor, I don't see much difference between "loop for 10 minutes" vs.
"penalize for 10 minutes" (beyond the obvious).

As a general statement, I think it's good to minimize the various ways in
which processors, queues and relationships are managed.  Today we have
configuration options for:

 - queues (expiration, prioritizers, back pressure)
 - settings tab of every processor (scheduling strategy, penalty duration,
   run schedule)
 - specific settings to the processor itself (processor properties).

Flowfile expiration is handled on the queue, while penalization is
configured on the processor.  At the end of the day, whatever reduces the
number of places a DFM has to touch is a good thing.

Perhaps as a radical proposal, why don't we add some @Experimental
processors which do things like PenalizeFlowFile, LoopFlowFile,
DelayFlowFile, PrioritizeFlow, and see what the experience is like using
these vs. using the existing functions.  If the community thinks there's
too much overlap, we can remove these from the 1.0 release.  But at least
we'll get some A/B testing by having these queue management services
realized as processors vs. built into other object types.  Maybe put these
into a nifi-queue-management.nar extension for people to play with?

Just food for thought.

Adam



On Thu, Jan 28, 2016 at 2:36 PM, Mark Payne  wrote:

> I think for the particular pattern, I would like to see a LoopFlowFile
> processor (or something with a better name perhaps :) )
> that would allow the user to just set a threshold for how many times to
> try or how long to keep trying or both and then
> send to either a 'threshold exceeded' or 'below threshold' relationship.
> I.e., set a threshold of 3 times or 10 minutes and
> then route to one or the other. It would make that pattern a lot easier by
> just using a single easy-to-understand Processor.
>
>
>
> > On Jan 28, 2016, at 2:31 PM, Ricky Saltzer  wrote:
> >
> > That's a good point, Mark. I also agree that it's better to give the user
> > control whenever possible. I imagine the RouteOnAttribute pattern to
> > eventually "give up" on a FlowFile will be a common pattern, and so so we
> > should account for that, rather than forcing the user into knowing this
> > pattern.
> >
> > On Thu, Jan 28, 2016 at 2:11 PM, Mark Payne 
> wrote:
> >
> >>
> >> The retry idea concerns me a bit. If we were to have a method like:
> >>
> >> penalizeOrTransfer(FlowFile flowFile, int numberOfTries, Relationship
> >> relationship)
> >>
> >> I think that leaves out some info - even if a FlowFile is
> >> penalized, it must be penalized and sent somewhere. So there would have
> to
> >> be
> >> a relationship to send it to if penalized and another to send it to if
> not
> >> penalizing.
> >> This also I think puts more onus on the developer to understand how it
> >> would be
> >> used - I believe the user should be making decisions about how many
> times
> >> to
> >> penalize, not the developer.
> >>
> >>> On Jan 28, 2016, at 2:03 PM, Bryan Bende  wrote:
> >>>
> >>> Regarding throwing an exception... I believe if you are extending
> >>> AbstractProcessor and an exception is thrown out of onTrigger() then
> the
> >>> session is rolled back and any flow files that were accessed are
> >> penalized,
> >>> which results in leaving them in the incoming connection to the
> processor
> >>> and not being retried until the penalty duration passes. This seems
> >> similar
> >>> to what Michael described, although it is not stopping the processor
> from
> >>> processing other incoming  flow files.
> >>>
> >>> Ricky's retry idea sounds interesting... I think a lot of people handle
> >>> this today by creating a retry loop using UpdateAttribute and
> >>> RouteOnAttribute [1].
> >>>
> >>> [1]
> >>>
> >>
> https://cwiki.apache.org/confluence/download/attachments/57904847/Retry_Count_Loop.xml?version=1&modificationDate=1433271239000&api=v2
> >>>
> >>>
> >>> On Thu, Jan 28, 2016 at 1:24 PM, Ricky Saltzer 
> >> wrote:
> >>>
> >>>> Is there currently a way to know how many times a FlowFile has been
> >>>> penalized? Do we have use cases where we want to penalize a FlowFile
> *n
> >>>> *number
> >>>> of times before sending it down an alternate relationship? I could
> >> imagine
> >>>> an API