Re: Penalizing one part of a flow over another

Mark Payne Thu, 22 Apr 2021 10:19:55 -0700

Russell,

You can’t really set a “priority” of one flow over the other. A couple of options 
that may make sense for you, though:


- You can set the Run Schedule to something other than “0 sec” for processors 
in the sub-flow. Perhaps set them to “100 millis” or something like that. This 
will lead to more latency in that flow but will schedule the processors less 
frequently, so they won’t interfere with your main flow as much. Here, though, 
if there’s a bunch of data coming in, it could result in backpressure all the 
way back to the main flow. So you’d want to consider whether FlowFile 
Expiration is appropriate on the first connection into the sub-flow. That way 
you’d say that if data sits in this first queue for more than 3 seconds, for 
instance, it expires, so that it doesn’t back up into the main flow. You could 
schedule just the first processor in the sub-flow to run at a slower pace, or 
all of them, depending on whether you’re just trying to slow down the 
ingestion into the flow or all of the processing.

- Similarly, rather than mess with the Run Schedule, you could use a 
ControlRate processor and say that you’re only going to allow a throughput of 
maybe 10 MB/sec into the sub-flow. Again, that could cause backpressure, so 
you’d want to consider FlowFile Expiration if you’d rather lose the FlowFiles 
than allow them to affect the main flow.
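
To make the trade-off in both options concrete, here is a rough toy simulation. 
It is plain Python, not NiFi code, and every name and number in it (simulate, 
the 100 ms tick, 50 arrivals and 10 drained FlowFiles per tick) is made up for 
illustration. The only NiFi-derived assumption is the 10,000-object back 
pressure threshold, which I believe is the default on a new connection. It 
models one queue feeding a throttled sub-flow, with and without a 3-second 
expiration:

```python
from collections import deque


def simulate(arrivals_per_tick, drained_per_tick, ticks, tick_ms=100,
             expiration_ms=None, backpressure_limit=10_000):
    """Toy model of one connection feeding a throttled sub-flow (not NiFi code)."""
    queue = deque()  # each entry is the time (ms) at which a FlowFile was enqueued
    processed = expired = blocked_ticks = 0

    for tick in range(ticks):
        now = tick * tick_ms

        # Drop FlowFiles that have waited longer than the expiration period.
        if expiration_ms is not None:
            while queue and now - queue[0] > expiration_ms:
                queue.popleft()
                expired += 1

        # The upstream (main flow) can only enqueue while the queue is below
        # the threshold; otherwise backpressure stalls it for this tick.
        if len(queue) >= backpressure_limit:
            blocked_ticks += 1
        else:
            queue.extend([now] * arrivals_per_tick)

        # The throttled sub-flow drains a fixed number of FlowFiles per tick,
        # oldest first.
        for _ in range(min(drained_per_tick, len(queue))):
            queue.popleft()
            processed += 1

    return {"processed": processed, "expired": expired,
            "blocked_ticks": blocked_ticks, "queued": len(queue)}


if __name__ == "__main__":
    # Compare one minute of simulated time without and with expiration.
    for expiration in (None, 3_000):
        print(expiration, simulate(50, 10, ticks=600, expiration_ms=expiration))
```

With these made-up numbers, the run without expiration eventually fills the 
queue and starts stalling the upstream, while the run with expiration keeps 
the queue bounded at the cost of dropping FlowFiles. Which of those you prefer 
is exactly the decision described above.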

Hope that’s helpful!

Thanks
-Mark

> On Apr 22, 2021, at 9:44 AM, Russell Bateman <[email protected]> wrote:
> 
> I have a flow performing ETL of HL7v4 (FHIR) documents on their way to 
> indexing and storage. Custom processors perform the important 
> transformations. Performance of this flow is at a premium for us. At some 
> point along the way I want to gate off copies of raw or of transformed FHIR 
> records (the flow writer's choice) to a new flow (a "subflow" of the total 
> flow) for the purpose of validating those FHIR records as an option.
> 
> The main ETL flow will thus not be interrupted. Also, its performance should 
> not be too hugely impacted by this new subflow. I have looked at priority 
> techniques discussed, but usually the discussion is geared more toward a 
> resulting order. I want to degrade the performance of this new subflow to 
> avoid handicapping the main flow, ideally anywhere from almost shutting down 
> the subflow to allowing it equal performance with the main ETL flow.
> 
> Are there recommendations for such a thing? As I author many custom 
> processors, is there something I could be doing in my code to aid this? I 
> want rather to put the amount of crippling into the hands of my flow writers  
> a) by natural, existing configuration that's a feature of most NiFi 
> processors and/or b) surfacing programming choices as configuration in my 
> custom processor's configuration. Etc.
> 
> Any comments on this are hoped for and very welcome.
> 
> (Because I wrote so many custom processors that are crucial to my flows, I 
> chose the NiFi developer- instead of the users list.)
>
