The retry idea concerns me a bit. If we were to have a method like:

penalizeOrTransfer(FlowFile flowFile, int numberOfTries, Relationship relationship)

I think that leaves out some info. Even if a FlowFile is penalized, it still has to be sent somewhere, so there would have to be one relationship to send it to if penalized and another to send it to if not. I also think this puts more onus on the developer to understand how it would be used. I believe the user should be making decisions about how many times to penalize, not the developer.

> On Jan 28, 2016, at 2:03 PM, Bryan Bende <bbe...@gmail.com> wrote:
>
> Regarding throwing an exception... I believe if you are extending
> AbstractProcessor and an exception is thrown out of onTrigger(), then the
> session is rolled back and any flow files that were accessed are penalized,
> which leaves them in the incoming connection to the processor, where they
> are not retried until the penalty duration passes. This seems similar
> to what Michael described, although it does not stop the processor from
> processing other incoming flow files.
>
> Ricky's retry idea sounds interesting... I think a lot of people handle
> this today by creating a retry loop using UpdateAttribute and
> RouteOnAttribute [1].
>
> [1]
> https://cwiki.apache.org/confluence/download/attachments/57904847/Retry_Count_Loop.xml?version=1&modificationDate=1433271239000&api=v2
>
> On Thu, Jan 28, 2016 at 1:24 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
>
>> Is there currently a way to know how many times a FlowFile has been
>> penalized? Do we have use cases where we want to penalize a FlowFile *n*
>> number of times before sending it down an alternate relationship? I could
>> imagine an API like penalizeOrTransfer(FlowFile flowFile, int numberOfTries,
>> Relationship relationship). For example, someone might want to process a
>> FlowFile three times before giving up on it.
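A minimal sketch of the retry-count loop Bryan describes (UpdateAttribute increments a counter, RouteOnAttribute compares it to a limit), written as plain Java so it runs without NiFi. It also illustrates Mark's point: every failure must go to one of two destinations, and the limit comes from configuration rather than processor code. The class name, the "retry.count" attribute, and the RETRY/GIVE_UP routes are hypothetical, not NiFi API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, NiFi-independent model of a retry-count loop. The "retry.count"
 * attribute and the RETRY/GIVE_UP routes are illustrative assumptions, not NiFi API.
 */
public class RetryLoopSketch {

    static final String RETRY_ATTR = "retry.count";

    // Two destinations are always needed: one for "try again", one for "give up".
    enum Route { RETRY, GIVE_UP }

    // Supplied by the user when configuring the flow, not hard-coded by the developer.
    private final int maxAttempts;

    RetryLoopSketch(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.maxAttempts = maxAttempts;
    }

    /**
     * Called after a failed attempt. Like the UpdateAttribute step, it increments the
     * counter (a missing attribute counts as 0); like the RouteOnAttribute step, it then
     * picks one of the two routes based on the new value.
     */
    Route onFailure(Map<String, String> attributes) {
        int attempts = Integer.parseInt(attributes.getOrDefault(RETRY_ATTR, "0")) + 1;
        attributes.put(RETRY_ATTR, Integer.toString(attempts));
        return attempts < maxAttempts ? Route.RETRY : Route.GIVE_UP;
    }

    public static void main(String[] args) {
        // "Process a FlowFile three times before giving up on it."
        RetryLoopSketch loop = new RetryLoopSketch(3);
        Map<String, String> attributes = new HashMap<>();
        for (int i = 0; i < 3; i++) {
            Route route = loop.onFailure(attributes);
            System.out.println("attempt " + attributes.get(RETRY_ATTR) + " failed -> " + route);
        }
    }
}
```

With a limit of 3, the first two failures route back for another try and the third routes to GIVE_UP. Keeping maxAttempts in configuration rather than in the processor is what lets the user, not the developer, decide how many times to retry.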
>>
>> On Thu, Jan 28, 2016 at 12:47 PM, Michael de Courci <
>> mdecou...@googlemail.com> wrote:
>>
>>> Matt, thanks for your reply.
>>>
>>> I guess what I am saying in that case is: if there is an error in a
>>> FlowFile, then the processor that detects it cannot proceed, so instead of
>>> calling an action to penalize the FlowFile it raises an exception,
>>> OutOfServiceException or ProcessorException.
>>> You could have an exception cause, PenalisedFlowFileException, for this
>>> case.
>>>
>>> But within the processor, other error causes may also lead to an
>>> OutOfServiceException.
>>>
>>> The point is that if the processor threw this exception, there could be
>>> a duration configuration: a time limit to keep this processor out of
>>> service, along with the connection to it and possibly any processors leading
>>> up to it. Naturally this will need to be indicated on the DFM. This will free
>>> resources and make the flow well behaved.
>>>
>>> Environmental failures will simply be a different category/cause of error
>>> that can also be wrapped/captured by a more general one.
>>>
>>> With Kind Regards
>>> Michael de Courci
>>> mdecou...@gmail.com
>>>
>>>> On 28 Jan 2016, at 17:16, Matt Gilman <matt.c.gil...@gmail.com> wrote:
>>>>
>>>> Just to recap/level set...
>>>>
>>>> The distinction between yielding and penalization is important. Penalization
>>>> is an action taken on a FlowFile because the FlowFile cannot be processed
>>>> right now (like a naming conflict, for instance). The Processor is
>>>> indicating that it cannot process that specific FlowFile at the moment but
>>>> may be able to process the next. Yielding is an indication that the
>>>> Processor is unable to work at all at the moment, likely due to an
>>>> environmental issue (like the out of service comment).
>>>>
>>>> If the concept of penalization were moved to a connection, would it
>>>> automatically penalize all FlowFiles transferred to it?
>>>> We would lose some
>>>> granularity if a Processor wanted to penalize some FlowFiles routed to a
>>>> given Relationship but not others. I'm not sure if this is done in practice
>>>> or not, just wanted to mention it.
>>>>
>>>> Outside of this minor concern, I like the idea. I especially like that it
>>>> would help with the consistency of Processor behavior and transparency
>>>> about what the data flow is actually doing.
>>>>
>>>> Matt
>>>>
>>>> On Thu, Jan 28, 2016 at 12:00 PM, Michael de Courci <
>>>> mdecou...@googlemail.com> wrote:
>>>>
>>>>> Hi,
>>>>> I think it would be better/simpler to have one “out of service” concept
>>>>> to replace penalizing and yielding: when a plugin throws an exception,
>>>>> the plugin is deemed out of service for a duration, and the connection
>>>>> to that plugin is disabled for the out of service duration.
>>>>>
>>>>> When a plugin is out of service and its connection disabled, the
>>>>> resources that it uses will be freed (yielded).
>>>>>
>>>>> The question then is what the behaviour of the plugin before the disabled
>>>>> connection should be. My thought is to tend towards stability and make
>>>>> sure resources are freed, so there may need to be a “domino effect”/cascade
>>>>> where all plugins before it are gradually put out of service.
>>>>>
>>>>> With Kind Regards
>>>>> Michael de Courci
>>>>> mdecou...@gmail.com
>>>>>
>>>>>> On 28 Jan 2016, at 16:34, Mark Payne <marka...@hotmail.com> wrote:
>>>>>>
>>>>>> All,
>>>>>>
>>>>>> I've been thinking about how we handle the concept of penalizing
>>>>>> FlowFiles. We've had a lot of questions lately about how penalization
>>>>>> works & the concept in general. It seems the following problems exist:
>>>>>>
>>>>>> - Confusion about the difference between penalization & yielding
>>>>>> - DFM sees the option to configure a penalization period on all processors,
>>>>>>   even if they don't penalize FlowFiles.
>>>>>> - DFM cannot set the penalty duration to one value in one case and to a
>>>>>>   different value in another case (a different relationship, for example).
>>>>>> - Developers often forget to call penalize().
>>>>>> - Developer has to determine whether or not to penalize when building a
>>>>>>   processor. It is based on what the developer thinks may make sense, but
>>>>>>   in reality DFMs sometimes want to penalize things when the processor
>>>>>>   doesn't behave that way.
>>>>>>
>>>>>> I'm wondering if it doesn't make sense to remove the concept of
>>>>>> penalization altogether from Processors and instead move the Penalty
>>>>>> Duration so that it's a setting on the Connection. I think this would
>>>>>> clear up the confusion and give the DFM more control over when/how long
>>>>>> to penalize. We could set the default to 30 seconds for self-looping
>>>>>> connections and no penalization for other connections.
>>>>>>
>>>>>> Any thoughts?
>>>>>>
>>>>>> Thanks
>>>>>> -Mark
>>
>> --
>> Ricky Saltzer
>> http://www.cloudera.com
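A minimal sketch of Mark's proposal under stated assumptions: the connection, not the processor, holds the penalty duration and stamps each incoming FlowFile with the time it becomes eligible, using the suggested defaults (30 seconds on a self-loop, none elsewhere). It is plain Java, and Connection, QueuedFlowFile, withDefaultPenalty and poll are hypothetical names, not NiFi API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical model of a connection-level penalty duration.
 * Names are illustrative assumptions, not NiFi API.
 */
public class ConnectionPenaltySketch {

    /** A queued FlowFile ID plus the instant at which its penalty expires. */
    static final class QueuedFlowFile {
        final String id;
        final Instant eligibleAt;

        QueuedFlowFile(String id, Instant eligibleAt) {
            this.id = id;
            this.eligibleAt = eligibleAt;
        }
    }

    static final class Connection {
        final String source;
        final String destination;
        final Duration penaltyDuration;
        private final List<QueuedFlowFile> queue = new ArrayList<>();

        Connection(String source, String destination, Duration penaltyDuration) {
            if (penaltyDuration.isNegative()) {
                throw new IllegalArgumentException("penalty duration must not be negative");
            }
            this.source = source;
            this.destination = destination;
            this.penaltyDuration = penaltyDuration;
        }

        /** Proposed defaults: 30 seconds on self-looping connections, no penalty elsewhere. */
        static Connection withDefaultPenalty(String source, String destination) {
            Duration penalty = source.equals(destination) ? Duration.ofSeconds(30) : Duration.ZERO;
            return new Connection(source, destination, penalty);
        }

        /** Every FlowFile transferred into this connection receives the same penalty. */
        void enqueue(String flowFileId, Instant now) {
            queue.add(new QueuedFlowFile(flowFileId, now.plus(penaltyDuration)));
        }

        /** Removes and returns the first FlowFile whose penalty has expired, if any. */
        Optional<String> poll(Instant now) {
            for (int i = 0; i < queue.size(); i++) {
                if (!queue.get(i).eligibleAt.isAfter(now)) {
                    return Optional.of(queue.remove(i).id);
                }
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2016-01-28T16:34:00Z");
        Connection selfLoop = Connection.withDefaultPenalty("ProcessorA", "ProcessorA");
        selfLoop.enqueue("ff-1", t0);
        System.out.println(selfLoop.poll(t0.plusSeconds(10))); // Optional.empty
        System.out.println(selfLoop.poll(t0.plusSeconds(30))); // Optional[ff-1]
    }
}
```

Because enqueue applies one duration to everything routed into the connection, the sketch also makes Matt's granularity concern concrete: a processor can no longer penalize some FlowFiles sent to a relationship and not others.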