Joe,

You bring up a great point. I realized after sending the initial e-mail that
Processors still would need the ability to penalize a FlowFile in case of
rollback. But I think this should be the only way that a Processor is able to
penalize a FlowFile: to indicate that it will not process the FlowFile for a
while. The Processor would no longer indicate 'the next Processor cannot
process the FlowFile for some time'.
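A minimal Python sketch of the split described above, using a simplified in-memory model. The names `FlowFile`, `Connection`, `transfer`, `rollback` and `penalty_duration` are hypothetical and are not NiFi's API (in NiFi the related calls live on `ProcessSession`, e.g. `penalize(FlowFile)`). The assumption is that a penalty on a transfer comes only from the Connection's setting, while the Processor can still delay a FlowFile, but only by rolling it back onto its own input queue with a duration it chooses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FlowFile:
    name: str
    penalty_expiration: float = 0.0  # the FlowFile is not handed out before this time


@dataclass
class Connection:
    """Hypothetical queue whose penalty duration is configured by the DFM."""
    name: str
    penalty_duration: float = 0.0  # seconds; 0 means "never penalize"
    queue: List[FlowFile] = field(default_factory=list)

    def offer(self, flow_file: FlowFile, now: float, penalty: float) -> None:
        flow_file.penalty_expiration = now + penalty if penalty > 0 else 0.0
        self.queue.append(flow_file)

    def poll(self, now: float) -> Optional[FlowFile]:
        """Return the first FlowFile whose penalty has expired, or None."""
        for i, ff in enumerate(self.queue):
            if ff.penalty_expiration <= now:
                return self.queue.pop(i)
        return None


def transfer(flow_file: FlowFile, target: Connection, now: float) -> None:
    # The Processor only picks the destination; the penalty belongs to the Connection.
    target.offer(flow_file, now, target.penalty_duration)


def rollback(flow_file: FlowFile, source: Connection, now: float, penalty: float = 0.0) -> None:
    # The only Processor-driven penalty: "I will not process this FlowFile for a while."
    source.offer(flow_file, now, penalty)
```

Under this model a DFM would give a retry connection a non-zero `penalty_duration` and leave other connections at 0, without the Processor needing to know which is which.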



> On Jan 28, 2016, at 2:08 PM, Joe Skora <jsk...@gmail.com> wrote:
> 
> I think the penalization being on the connection makes sense, but I'm not
> sure about taking penalization away from the processor altogether.
> 
> If a processor can't get far enough to transfer a flowfile to a
> relationship, it can rollback to return the flowfile to the queue and
> optionally penalize the flowfile so it won't be immediately reprocessed.
> If I understand correctly, if the failure is transient the processor should
> rollback without a penalty, but if the problem is likely to re-occur if the
> flowfile is immediately reprocessed, then penalization can delay the
> flowfile for a period.  I think the transient vs. re-occurring decision
> makes sense in the processor, but the severity of the penalty if problems
> are likely to re-occur makes sense on the connection for greater user control.
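The division of responsibility in this paragraph can be sketched in a few lines of Python: the processor decides *whether* to penalize, and a user-configured connection setting decides *how long*. `Failure` and `rollback_penalty` are hypothetical names for illustration only, not part of NiFi.

```python
from enum import Enum


class Failure(Enum):
    TRANSIENT = "transient"   # a momentary hiccup; retrying right away is reasonable
    RECURRING = "recurring"   # likely to fail again if the FlowFile is retried immediately


def rollback_penalty(failure: Failure, connection_penalty_seconds: float) -> float:
    """Return how long a rolled-back FlowFile should wait before being retried.

    The processor classifies the failure; the duration comes from the
    (hypothetical) penalty setting on the connection.
    """
    if failure is Failure.TRANSIENT:
        return 0.0
    return connection_penalty_seconds
```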
> 
> On Thu, Jan 28, 2016 at 1:24 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
> 
>> Is there currently a way to know how many times a FlowFile has been
>> penalized? Do we have use cases where we want to penalize a FlowFile *n*
>> number of times before sending it down an alternate relationship? I could
>> imagine an API like penalizeOrTransfer(FlowFile flowFile, int numberOfTries,
>> Relationship relationship). For example, someone might want to process a
>> FlowFile three times before giving up on it.
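One way the suggested penalizeOrTransfer could behave, sketched in Python under the assumption that the attempt count is kept in a FlowFile attribute (stored as a string, since NiFi FlowFile attributes are strings). The attribute name `penalty.count` and the function itself are hypothetical; this does not describe an existing NiFi feature.

```python
from dataclasses import dataclass, field
from typing import Dict

RETRY_COUNT_ATTRIBUTE = "penalty.count"  # hypothetical attribute name


@dataclass
class FlowFile:
    attributes: Dict[str, str] = field(default_factory=dict)


def penalize_or_transfer(flow_file: FlowFile, number_of_tries: int, relationship: str) -> str:
    """Penalize until the FlowFile has failed `number_of_tries` times, then route it away.

    Returns "penalize" or the name of `relationship` so the caller can act on it.
    """
    attempts = int(flow_file.attributes.get(RETRY_COUNT_ATTRIBUTE, "0")) + 1
    if attempts >= number_of_tries:
        # Give up: clear the counter so downstream processors start fresh.
        flow_file.attributes.pop(RETRY_COUNT_ATTRIBUTE, None)
        return relationship
    flow_file.attributes[RETRY_COUNT_ATTRIBUTE] = str(attempts)
    return "penalize"
```

With `number_of_tries=3`, the first two failures penalize the FlowFile and the third routes it to the alternate relationship, matching the "process a FlowFile three times" example.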
>> 
>> On Thu, Jan 28, 2016 at 12:47 PM, Michael de Courci <
>> mdecou...@googlemail.com> wrote:
>> 
>>> Matt thanks for your reply
>>> 
>>> I guess what I am saying in that case is this: if there is an error in a
>>> FlowFile, then the processor that detects it cannot proceed, so instead of
>>> calling an action to penalize the FlowFile it raises an exception such as
>>> OutOfServiceException or ProcessorException.
>>> You could have an exception cause, PenalisedFlowFileException, for this
>>> case.
>>>
>>> But within the processor, other error causes may also give rise to an
>>> OutOfServiceException.
>>>
>>> The point is that if the processor threw this exception, then there could
>>> be a configurable duration: a time limit to keep this processor out of
>>> service, along with the connection to it and possibly any processors
>>> leading up to it. Naturally this would need to be indicated to the DFM.
>>> This would free resources and make the flow well behaved.
>>>
>>> Environmental failures would simply be a different category/cause of error
>>> that can also be wrapped/captured by a more general one.
>>> 
>>> With Kind Regards
>>> Michael de Courci
>>> mdecou...@gmail.com
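A rough Python sketch of the exception-driven "out of service" idea above. The exception names follow the message; modelling them as subclasses (so that a handler for the general type also captures the specific causes) is one possible reading, since the thread leaves open whether they would be subclasses or wrapped causes. `ProcessorNode`, `trigger` and `out_of_service_seconds` are hypothetical framework-side names.

```python
import time


class ProcessorException(Exception):
    """General processor failure (hypothetical, per the proposal)."""


class OutOfServiceException(ProcessorException):
    """The processor cannot work for now and should be taken out of service."""


class PenalisedFlowFileException(OutOfServiceException):
    """A specific FlowFile could not be processed."""


class EnvironmentalFailureException(OutOfServiceException):
    """Something outside the FlowFile (network, disk, remote service) failed."""


class ProcessorNode:
    def __init__(self, name, on_trigger, out_of_service_seconds=30.0, clock=time.monotonic):
        self.name = name
        self.on_trigger = on_trigger
        self.out_of_service_seconds = out_of_service_seconds
        self.clock = clock
        self.out_of_service_until = 0.0

    def in_service(self) -> bool:
        return self.clock() >= self.out_of_service_until

    def trigger(self) -> bool:
        """Run the processor unless it is out of service; return True if it was attempted."""
        if not self.in_service():
            return False
        try:
            self.on_trigger()
        except OutOfServiceException:
            # Framework-side handling: disable this processor (and, in the
            # proposal, its incoming connection) for the configured duration.
            self.out_of_service_until = self.clock() + self.out_of_service_seconds
        return True
```

Other `ProcessorException` subclasses are deliberately not caught here, so they would surface as ordinary errors; how the framework should treat them is not settled in the thread.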
>>> 
>>> 
>>> 
>>> 
>>>> On 28 Jan 2016, at 17:16, Matt Gilman <matt.c.gil...@gmail.com> wrote:
>>>> 
>>>> Just to recap/level set...
>>>> 
>>>> The distinction between yielding and penalization is important.
>>>> Penalization is an action taken on a FlowFile because the FlowFile cannot
>>>> be processed right now (like a naming conflict, for instance). The
>>>> Processor is indicating that it cannot process that specific FlowFile at
>>>> the moment but may be able to process the next. Yielding is an indication
>>>> that the Processor is unable to work at all at the moment, likely due to an
>>>> environmental issue (like the out-of-service comment).
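The difference can be illustrated with a small Python simulation: a penalty is recorded on one FlowFile, so the scheduler skips only that FlowFile, while a yield is recorded on the Processor, so nothing is handed to it until the yield expires. This is a simplified model, not NiFi's scheduler; in NiFi the corresponding calls are `ProcessSession.penalize(FlowFile)` and `ProcessContext.yield()`, and `yield_` below is spelled with an underscore only because `yield` is a Python keyword.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FlowFile:
    name: str
    penalized_until: float = 0.0


@dataclass
class Processor:
    queue: List[FlowFile] = field(default_factory=list)
    yielded_until: float = 0.0

    def penalize(self, flow_file: FlowFile, now: float, seconds: float) -> None:
        # Per-FlowFile: only this FlowFile is skipped; others can still be processed.
        flow_file.penalized_until = now + seconds

    def yield_(self, now: float, seconds: float) -> None:
        # Per-Processor: nothing is processed until the yield expires.
        self.yielded_until = now + seconds

    def next_flow_file(self, now: float) -> Optional[FlowFile]:
        """Return the FlowFile the scheduler would hand to the Processor, or None."""
        if now < self.yielded_until:
            return None
        for ff in self.queue:
            if ff.penalized_until <= now:
                return ff
        return None
```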
>>>> 
>>>> If the concept of penalization were moved to a connection, would it
>>>> automatically penalize all FlowFiles transferred to it? We would lose some
>>>> granularity if a Processor wanted to penalize some FlowFiles routed to a
>>>> given Relationship but not others. I'm not sure if this is done in
>>>> practice or not; I just wanted to mention it.
>>>>
>>>> Outside of this minor concern, I like the idea. I especially like that it
>>>> would help with the consistency of Processor behavior and transparency
>>>> about what the data flow is actually doing.
>>>> 
>>>> Matt
>>>> 
>>>> 
>>>> On Thu, Jan 28, 2016 at 12:00 PM, Michael de Courci <
>>>> mdecou...@googlemail.com> wrote:
>>>> 
>>>>> Hi
>>>>> I think it would be better/simpler to have one “out of service” concept
>>>>> to replace penalizing and yielding: when a plugin throws an exception,
>>>>> the plugin is deemed out of service for a duration, and the connection
>>>>> to that plugin is disabled for the out-of-service duration.
>>>>>
>>>>> When a plugin is out of service and its connection disabled, the
>>>>> resources that it uses will be freed (yielded).
>>>>>
>>>>> The question then is what the behaviour of the plugin before the
>>>>> disabled connection should be. My thought is to tend towards stability
>>>>> and make sure resources are freed, so there may need to be a “domino
>>>>> effect”/cascade effect where all plugins before it are gradually put out
>>>>> of service.
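The "domino effect" could be driven by a breadth-first walk upstream from the failed plugin, recording how many hops away each one is so that a framework could take hop-1 plugins out of service first, then hop 2, and so on. This is a hypothetical Python sketch; the graph representation and `cascade_out_of_service` are assumptions, not anything in NiFi.

```python
from collections import deque
from typing import Dict, List


def cascade_out_of_service(upstream: Dict[str, List[str]], failed: str) -> Dict[str, int]:
    """Return {plugin: hops upstream of `failed`}, computed breadth-first.

    `upstream` maps each plugin to the plugins whose connections feed into it.
    """
    hops = {failed: 0}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for source in upstream.get(current, []):
            if source not in hops:  # cycles such as self-loops are visited only once
                hops[source] = hops[current] + 1
                queue.append(source)
    return hops
```

Ordering by hop count is one way to make the cascade "gradual"; a real design would also need a rule for when to bring upstream plugins back into service.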
>>>>> 
>>>>> 
>>>>> With Kind Regards
>>>>> Michael de Courci
>>>>> mdecou...@gmail.com
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 28 Jan 2016, at 16:34, Mark Payne <marka...@hotmail.com> wrote:
>>>>>> 
>>>>>> All,
>>>>>> 
>>>>>> I've been thinking about how we handle the concept of penalizing
>>>>>> FlowFiles. We've had a lot of questions lately about how penalization
>>>>>> works & the concept in general. It seems the following problems exist:
>>>>>>
>>>>>> - Confusion about the difference between penalization & yielding
>>>>>> - DFM sees the option to configure a penalization period on all
>>>>>>   processors, even if they don't penalize FlowFiles.
>>>>>> - DFM cannot set the penalty duration in 1 case and set a different
>>>>>>   value for a different case (different relationship, for example).
>>>>>> - Developers often forget to call penalize()
>>>>>> - Developer has to determine whether or not to penalize when building a
>>>>>>   processor. It is based on what the developer thinks may make sense,
>>>>>>   but in reality DFMs sometimes want to penalize things when the
>>>>>>   processor doesn't behave that way.
>>>>>>
>>>>>> I'm wondering if it doesn't make sense to remove the concept of
>>>>>> penalization altogether from Processors and instead move the Penalty
>>>>>> Duration so that it's a setting on the Connection. I think this would
>>>>>> clear up the confusion and give the DFM more control over when/how long
>>>>>> to penalize. We could set the default to 30 seconds for self-looping
>>>>>> connections and no penalization for other connections.
>>>>>> 
>>>>>> Any thoughts?
>>>>>> 
>>>>>> Thanks
>>>>>> -Mark
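The proposed defaults can be sketched as a per-connection setting in Python: an explicit DFM value wins; otherwise a self-looping connection gets 30 seconds and every other connection gets none. Because the setting lives on each connection, two relationships from the same processor can carry different penalties. `ConnectionConfig`, `penalty_seconds` and `effective_penalty` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional

SELF_LOOP_DEFAULT_SECONDS = 30.0  # default suggested in the proposal


@dataclass
class ConnectionConfig:
    source: str
    destination: str
    relationship: str
    penalty_seconds: Optional[float] = None  # None means "use the proposed default"

    def effective_penalty(self) -> float:
        if self.penalty_seconds is not None:
            return self.penalty_seconds  # an explicit DFM setting always wins
        # Proposed default: penalize self-looping connections, not others.
        return SELF_LOOP_DEFAULT_SECONDS if self.source == self.destination else 0.0
```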
>>>>> 
>>>>> 
>>> 
>>> 
>> 
>> 
>> --
>> Ricky Saltzer
>> http://www.cloudera.com
>> 
