Devin,

The session follows the unit of work pattern that Martin Fowler describes [1]. The idea is that whether you access one flow file, many flow files, or anything in between, you are doing some session of work. You can commit everything you do in that session, or roll back everything you do in that session.
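The commit/rollback semantics Joe describes can be sketched as a toy model (plain Java, not NiFi code; the `Session` class and its methods are illustrative only): changes are registered against a session, then either all applied on commit or all discarded on rollback.

```java
// Toy illustration of the unit-of-work pattern: a session accumulates
// pending changes; commit() applies all of them, rollback() discards all
// of them. Nothing here is the NiFi API.
import java.util.HashMap;
import java.util.Map;

public class UnitOfWorkDemo {
    static class Session {
        private final Map<String, String> store;                 // committed state
        private final Map<String, String> pending = new HashMap<>();

        Session(Map<String, String> store) { this.store = store; }

        void put(String key, String value) { pending.put(key, value); }

        void commit()   { store.putAll(pending); pending.clear(); } // apply everything
        void rollback() { pending.clear(); }                        // discard everything
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();

        Session good = new Session(store);
        good.put("a", "1");
        good.put("b", "2");
        good.commit();                      // both changes land in the store

        Session bad = new Session(store);
        bad.put("c", "3");
        bad.rollback();                     // nothing from this session survives

        System.out.println(store.containsKey("a")); // true
        System.out.println(store.containsKey("c")); // false
        System.out.println(store.size());           // 2
    }
}
```

The point of the pattern is that the caller never sees a half-applied session: it is all-or-nothing, which is what lets the framework return everything to a safe state on an unexpected failure.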
This pattern provides a nice, safe environment in which the contract between the developer and the framework is well understood. Generally, rollback is only necessary when some unplanned or unexpected exception occurs, and it is largely there so that the framework can ensure everything is returned to a safe state. The framework can also penalize that processor/extension so that, if there is a programming error, its impact on the overall system is reduced.

With that said, there are two ways to think about your case. It appears you are doing your own batching, probably for higher throughput, and it also appears you'd really like to treat each flow file independently in terms of logic/handling. This is precisely why, in addition to this nice clean unit of work pattern, we also support automated session batching (this is what Andrew was referring to). In this mode you add an annotation to your processor called @SupportsBatching, which signals to the framework that it may attempt to automatically combine subsequent calls to commit and execute them as a single batch. In this way you can build your processor in a very simple, single-flow-file manner and call commit, and the framework will combine a series of commits within a very small time window to get higher throughput. In the UI, a user can signal their willingness to let the framework do this, acknowledging that they may be trading a small amount of latency for higher throughput.

There are some additional things to think about when using this. It is best used when the processor and its function are side-effect free, meaning there are no external system state changes or the like. In this sense you can think of the processor you're building as idempotent (as the REST folks like to say). If your processor fits that description, @SupportsBatching can have really powerful results.
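What automated session batching buys you can be sketched with a toy model (plain Java, not the NiFi framework; the `BatchingSession` class, its batch-size trigger, and all names are illustrative): the processor logic stays simple and calls commit once per item, while the "framework" layer coalesces several commits into one real flush.

```java
// Toy model of commit coalescing: the caller commits per item, but only
// every batchSize-th commit triggers a real flush, trading a little
// latency for fewer expensive flushes. Not the NiFi implementation.
import java.util.ArrayList;
import java.util.List;

public class BatchingDemo {
    static class BatchingSession {
        private final List<String> pending = new ArrayList<>();
        private final int batchSize;
        int flushCount = 0;                      // how many real flushes happened

        BatchingSession(int batchSize) { this.batchSize = batchSize; }

        void process(String item) { pending.add(item); }

        // Called once per item by the "processor"; a real flush only
        // happens when a full batch has accumulated.
        void commit() {
            if (pending.size() >= batchSize) flush();
        }

        void flush() {
            if (!pending.isEmpty()) {
                pending.clear();                 // pretend this is the expensive part
                flushCount++;
            }
        }
    }

    public static void main(String[] args) {
        BatchingSession session = new BatchingSession(25);
        for (int i = 0; i < 100; i++) {
            session.process("flowfile-" + i);
            session.commit();                    // processor stays simple: one item, one commit
        }
        session.flush();                         // "framework" flushes any remainder
        System.out.println(session.flushCount);  // 4 real flushes for 100 commits
    }
}
```

A real NiFi processor gets this behavior just by adding the @SupportsBatching annotation; the coalescing (driven by a time window rather than a fixed count) is handled entirely by the framework, which is why the side-effect-free/idempotent caveat matters: a batched commit may cover work from several onTrigger calls.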
Now, you also mention that some flow files you'd consider failures and others you'd consider something else, presumably success. This is perfect and very common, and it does not require a rollback. Keep rollback in mind for bad stuff that can happen that you don't plan for. For failures you can predict, such as invalid data or an invalid state of something, you actually want a failure relationship on that processor and simply route things there. From a developer's perspective this is not a rollback case. "Failure" then is as planned for and expected as "success". So you go ahead and route the flow file to failure and call commit. All good. It is the person assembling this loosely coupled, highly cohesive set of components into a flow who gets to decide what failure means in their context.

Lots of info here, probably not perfectly written and with some gaps. You're asking the right questions, so just keep asking. Lots of folks here want to help.

[1] http://martinfowler.com/eaaCatalog/unitOfWork.html
[2] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#session-rollback

Thanks
Joe

On Tue, Mar 15, 2016 at 7:25 PM, Devin Fisher <devin.fis...@perfectsearchcorp.com> wrote:
> Thanks for your reply. I'm sorry if my question seems confusing. I'm still
> learning how NiFi works. I don't have any understanding of how the
> framework works on the back end, and an incomplete understanding of the
> exposed interface. From my point of view (an external processor developer),
> asking to roll back the one flow file that failed (I don't want changes
> made to it incompletely) while letting the other n flow files move on seems
> reasonable. But I don't know what is happening in the session on the back
> end.
>
> I likely don't really understand what happens on a rollback. Reading the
> developer's guide, I got the impression that rollback disregards all
> changes made in the session, including transfers. It then returns the flow
> files to the queue.
> It would seem that a session is really finished and not usable after a
> rollback. So I then don't understand how I can do my use case. I want to
> rollback (undo changes to a single flow file that failed) and then transfer
> it to the Failed relationship unchanged (or add the discard.reason to the
> attributes).
>
> I assume you mean "Run duration" when you refer to the 'Scheduling' tab. I
> would love to understand better how that works. In the documentation I only
> see a note about it in the User Guide; the Developer's Guide is silent. I
> don't see how that slider is enforced in the processor code. It seems that
> once the framework has ceded control to the processor, it can run for as
> long as it wants. So more information about this would be great.
>
> Thanks again for the response. The information is always useful and
> enlightening.
> Devin
>
> On Tue, Mar 15, 2016 at 4:26 PM, Andrew Grande <agra...@hortonworks.com>
> wrote:
>
>> Devin,
>>
>> What you're asking for is a contradictory requirement. One trades
>> individual message transactional control (and the necessary overhead) for
>> higher throughput with micro-batching (but lesser control). In short, you
>> can't expect to roll back a message and not affect the whole batch.
>>
>> However, if you 'commit' this batch as received by your processor, you
>> could take on the responsibility of storing, tracking, and
>> committing/rolling back those yourself for the downstream connection....
>> But then, why?
>>
>> In general, one should leverage NiFi's 'Scheduling' tab and have the
>> micro-batching aspect controlled via the framework, unless you really,
>> really have a very good reason to do it yourself.
>>
>> Hope this helps,
>> Andrew
>>
>>
>> On 3/7/16, 5:00 PM, "Devin Fisher" <devin.fis...@perfectsearchcorp.com>
>> wrote:
>>
>> >Question about rollbacks. I have a processor that is grabbing a list of
>> >FlowFiles from session.get(100). It will then process each flow file one
>> >at a time.
>> >I want then to be able, if there is an error with a single FlowFile, to
>> >roll it back (and only this failed FlowFile) and transfer it to the
>> >FAILED relationship. But reading the javadoc for ProcessSession, I don't
>> >get the sense that I can do that.
>> >
>> >Is my workflow wrong? Should I only get one at a time from the session
>> >and commit after each one?
>> >
>> >Devin
>>
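Joe's advice applied to the scenario in Devin's original question above can be sketched as a toy model (plain Java, not the NiFi API; the `route` method, relationship lists, and all names are illustrative): handle each flow file independently, route a failure you planned for to a failure relationship with a discard.reason attribute, and reserve rollback for errors you did not plan for.

```java
// Toy sketch of "planned failure is not a rollback": each item is routed
// to a success or failure list; predictable failures are caught, tagged
// with a discard.reason, and routed, never rolled back. Not NiFi code.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RoutingDemo {
    static List<Map<String, String>> success = new ArrayList<>();
    static List<Map<String, String>> failure = new ArrayList<>();

    static void route(String content) {
        Map<String, String> flowFile = new HashMap<>();
        flowFile.put("content", content);
        try {
            if (content.isEmpty()) {                   // a failure we planned for
                throw new IllegalArgumentException("empty content");
            }
            success.add(flowFile);                     // routed to "success"
        } catch (IllegalArgumentException e) {
            flowFile.put("discard.reason", e.getMessage());
            failure.add(flowFile);                     // routed to "failure", then committed
        }
    }

    public static void main(String[] args) {
        for (String c : new String[]{"ok", "", "also ok"}) {
            route(c);
        }
        System.out.println(success.size());                       // 2
        System.out.println(failure.size());                       // 1
        System.out.println(failure.get(0).get("discard.reason")); // empty content
    }
}
```

In a real processor this per-item shape, combined with @SupportsBatching as Joe describes, answers the original question: get one flow file at a time, route it to success or failure, and let the framework batch the commits for throughput.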