I think for this particular pattern, I would like to see a LoopFlowFile processor (or something with a better name perhaps :) ) that would allow the user to set a threshold for how many times to try, how long to keep trying, or both, and then route to either a 'threshold exceeded' or a 'below threshold' relationship. For example, set a threshold of 3 tries or 10 minutes and the FlowFile is routed to one or the other. That would make the pattern a lot easier, since it would take just a single easy-to-understand Processor.
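
To make the idea concrete, here is a rough sketch of the routing decision such a processor might make. It is plain Java with no NiFi dependencies. The names LoopThreshold and Route are made up for illustration and do not exist in NiFi. It also assumes that a FlowFile which has already been tried the configured number of times, or has been looping for at least the configured duration, counts as having exceeded the threshold.

```java
import java.time.Duration;

/** Hypothetical helper, not an existing NiFi class: decides where a looping FlowFile goes. */
final class LoopThreshold {

    /** Stand-ins for the two relationships described above. */
    enum Route { BELOW_THRESHOLD, THRESHOLD_EXCEEDED }

    private final Integer maxTries;      // null means no limit on the number of tries
    private final Duration maxDuration;  // null means no limit on elapsed time

    LoopThreshold(Integer maxTries, Duration maxDuration) {
        if (maxTries == null && maxDuration == null) {
            throw new IllegalArgumentException("set a try limit, a time limit, or both");
        }
        if (maxTries != null && maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be at least 1");
        }
        this.maxTries = maxTries;
        this.maxDuration = maxDuration;
    }

    /**
     * @param triesSoFar how many times the FlowFile has already been tried
     * @param elapsed    how long it has been since the first try
     */
    Route route(int triesSoFar, Duration elapsed) {
        // Assumption: reaching either configured limit is enough to give up.
        boolean triesExceeded = maxTries != null && triesSoFar >= maxTries;
        boolean timeExceeded = maxDuration != null && elapsed.compareTo(maxDuration) >= 0;
        return (triesExceeded || timeExceeded) ? Route.THRESHOLD_EXCEEDED : Route.BELOW_THRESHOLD;
    }
}
```

With new LoopThreshold(3, Duration.ofMinutes(10)), a FlowFile on its second try after one minute would stay on 'below threshold', while one that has been tried three times, or has been looping for ten minutes, would go to 'threshold exceeded'. In a real processor the try count and first-seen time would presumably be carried as FlowFile attributes, much like the UpdateAttribute/RouteOnAttribute loop does today.
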
> On Jan 28, 2016, at 2:31 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
>
> That's a good point, Mark. I also agree that it's better to give the user
> control whenever possible. I imagine the RouteOnAttribute pattern to
> eventually "give up" on a FlowFile will be a common pattern, and so we
> should account for that, rather than forcing the user into knowing this
> pattern.
>
> On Thu, Jan 28, 2016 at 2:11 PM, Mark Payne <marka...@hotmail.com> wrote:
>
>> The retry idea concerns me a bit. If we were to have a method like:
>>
>> penalizeOrTransfer(FlowFile flowFile, int numberOfTries, Relationship relationship)
>>
>> I think that leaves out some info - even if a FlowFile is penalized, it
>> must be penalized and sent somewhere. So there would have to be a
>> relationship to send it to if penalized and another to send it to if not
>> penalizing. I think this also puts more onus on the developer to
>> understand how it would be used - I believe the user should be making
>> decisions about how many times to penalize, not the developer.
>>
>>> On Jan 28, 2016, at 2:03 PM, Bryan Bende <bbe...@gmail.com> wrote:
>>>
>>> Regarding throwing an exception... I believe if you are extending
>>> AbstractProcessor and an exception is thrown out of onTrigger() then the
>>> session is rolled back and any flow files that were accessed are
>>> penalized, which results in leaving them in the incoming connection to
>>> the processor and not being retried until the penalty duration passes.
>>> This seems similar to what Michael described, although it is not
>>> stopping the processor from processing other incoming flow files.
>>>
>>> Ricky's retry idea sounds interesting... I think a lot of people handle
>>> this today by creating a retry loop using UpdateAttribute and
>>> RouteOnAttribute [1].
>>>
>>> [1] https://cwiki.apache.org/confluence/download/attachments/57904847/Retry_Count_Loop.xml?version=1&modificationDate=1433271239000&api=v2
>>>
>>> On Thu, Jan 28, 2016 at 1:24 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
>>>
>>>> Is there currently a way to know how many times a FlowFile has been
>>>> penalized? Do we have use cases where we want to penalize a FlowFile
>>>> *n* number of times before sending it down an alternate relationship?
>>>> I could imagine an API like penalizeOrTransfer(FlowFile flowFile,
>>>> int numberOfTries, Relationship relationship). For example, someone
>>>> might want to process a FlowFile three times before giving up on it.
>>>>
>>>> On Thu, Jan 28, 2016 at 12:47 PM, Michael de Courci <mdecou...@googlemail.com> wrote:
>>>>
>>>>> Matt thanks for your reply
>>>>>
>>>>> I guess what I am saying in that case - if there is an error in a
>>>>> FlowFile, then the processor that detects this cannot proceed, so
>>>>> instead of calling an action to penalize the FlowFile it raises an
>>>>> exception OutOFServiceException or ProcessorException.
>>>>> You could have an exception cause PeanilisedFlowFileException for
>>>>> this case.
>>>>>
>>>>> But within the processor other error causes may arise for an
>>>>> OutOFServiceException
>>>>>
>>>>> The point is that if the processor threw this exception then there
>>>>> can be a duration configuration - a time limit to keep this processor
>>>>> out of service and the connection to it and possibly any processors
>>>>> leading up to it - Naturally this will need to be indicated on the
>>>>> DFM - this will free resources and make the flow well behaved.
>>>>>
>>>>> Environmental failures will simply be a different category/cause of
>>>>> error that can be wrapped/captured also with a more general one
>>>>>
>>>>> With Kind Regards
>>>>> Michael de Courci
>>>>> mdecou...@gmail.com
>>>>>
>>>>>> On 28 Jan 2016, at 17:16, Matt Gilman <matt.c.gil...@gmail.com> wrote:
>>>>>>
>>>>>> Just to recap/level set...
>>>>>>
>>>>>> The distinction between yielding and penalization is important.
>>>>>> Penalization is an action taken on a FlowFile because the FlowFile
>>>>>> cannot be processed right now (like a naming conflict for instance).
>>>>>> The Processor is indicating that it cannot process that specific
>>>>>> FlowFile at the moment but may be able to process the next. Yielding
>>>>>> is an indication that the Processor is unable to work at all at the
>>>>>> moment, likely due to an environmental issue (like the out of
>>>>>> service comment).
>>>>>>
>>>>>> If the concept of penalization were moved to a connection, does it
>>>>>> automatically penalize all FlowFiles transferred to it? We would
>>>>>> lose some granularity if a Processor wanted to penalize some
>>>>>> FlowFiles routed to a given Relationship but not others. I'm not
>>>>>> sure if this is done in practice or not, just wanted to mention it.
>>>>>>
>>>>>> Outside of this minor concern, I like the idea. I especially like
>>>>>> that it would help with the consistency of Processor behavior and
>>>>>> transparency about what the data flow is actually doing.
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>> On Thu, Jan 28, 2016 at 12:00 PM, Michael de Courci <mdecou...@googlemail.com> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>> I think it would be better/simpler to have one “out of service”
>>>>>>> concept to replace penalizing and yielding, and when a plugin
>>>>>>> throws an exception then the plugin is deemed out of service for a
>>>>>>> duration, and so the connection to that plugin is disabled for the
>>>>>>> out of service duration.
>>>>>>>
>>>>>>> When a plugin is out of service and the connection disabled - then
>>>>>>> resources that it uses will be freed (yielded).
>>>>>>>
>>>>>>> The question then is what the behaviour of the plugin before the
>>>>>>> disabled connection should be. My thought is to tend towards
>>>>>>> stability and make sure resources are freed, so there may need to
>>>>>>> be a “domino effect”/cascade effect where all plugins before are
>>>>>>> gradually put out of service.
>>>>>>>
>>>>>>> With Kind Regards
>>>>>>> Michael de Courci
>>>>>>> mdecou...@gmail.com
>>>>>>>
>>>>>>>> On 28 Jan 2016, at 16:34, Mark Payne <marka...@hotmail.com> wrote:
>>>>>>>>
>>>>>>>> All,
>>>>>>>>
>>>>>>>> I've been thinking about how we handle the concept of penalizing
>>>>>>>> FlowFiles. We've had a lot of questions lately about how
>>>>>>>> penalization works & the concept in general. Seems the following
>>>>>>>> problems exist:
>>>>>>>>
>>>>>>>> - Confusion about difference between penalization & yielding
>>>>>>>> - DFM sees option to configure penalization period on all
>>>>>>>>   processors, even if they don't penalize FlowFiles.
>>>>>>>> - DFM cannot set penalty duration in 1 case and set a different
>>>>>>>>   value for a different case (different relationship, for example).
>>>>>>>> - Developers often forget to call penalize()
>>>>>>>> - Developer has to determine whether or not to penalize when
>>>>>>>>   building a processor. It is based on what the developer will
>>>>>>>>   think may make sense, but in reality DFMs sometimes want to
>>>>>>>>   penalize things when the processor doesn't behave that way.
>>>>>>>>
>>>>>>>> I'm wondering if it doesn't make sense to remove the concept of
>>>>>>>> penalization altogether from Processors and instead move the
>>>>>>>> Penalty Duration so that it's a setting on the Connection. I
>>>>>>>> think this would clear up the confusion and give the DFM more
>>>>>>>> control over when/how long to penalize. Could set the default to
>>>>>>>> 30 seconds for self-looping connections and no penalization for
>>>>>>>> other connections.
>>>>>>>>
>>>>>>>> Any thoughts?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> -Mark
>>>>
>>>> --
>>>> Ricky Saltzer
>>>> http://www.cloudera.com
>
> --
> Ricky Saltzer
> http://www.cloudera.com