Re: Status of "event-driven" scheduling

Mark Payne Thu, 13 Sep 2018 06:51:00 -0700

Joe,

Mike is right in that it was intended to be a more efficient scheduling 
strategy. With Timer-Driven,
the processors used to constantly be checking if they had work to do and if not 
would switch contexts
and check again. And again. This was pretty expensive, so we added the 
Event-Driven strategy.

Generally, implementing the Event-Driven strategy should be fairly simple and 
straight-forward. When
a FlowFile lands in a queue, just call the onTrigger method of the queue's 
destination. However, it got
a lot more complicated when we needed to consider backpressure and limiting the 
number of concurrent tasks.
So much more complicated, in fact, that testing showed that the Event-Driven 
strategy was noticeably
slower than Timer-Driven. To that end, we added the "nifi.bored.yield.duration" 
property to nifi.properties
and updated the framework so that if there is no work for the Processor to do 
(due to its queues being empty
or backpressure being applied) we don't schedule that processor thread for the 
configured amount of time.
Implementing this showed a significant drop in CPU resources while still 
providing great throughput. So, truth
be told, we pretty much abandoned using Event-Driven.
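
To make the "bored yield" idea above concrete, here is a minimal sketch of how a
timer-driven loop might back off when a processor has nothing to do. This is not
NiFi's actual scheduling code: the class, the method names, the simulated clock,
and the 10 ms value are all assumptions made up for illustration. Only the
"nifi.bored.yield.duration" property name comes from the description above.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch of a bored-yield back-off; not NiFi's real scheduler. */
public class BoredYieldSketch {

    /** Stand-in for a processor with a single incoming queue. */
    public static class SketchProcessor {
        public final Queue<String> incoming = new ArrayDeque<>();
        public boolean backpressured = false;   // true if the downstream queue is full
        public long yieldExpiresAtMillis = 0;   // do not schedule before this time
        public int processed = 0;

        boolean hasWork() {
            return !incoming.isEmpty() && !backpressured;
        }

        void onTrigger() {
            incoming.poll();
            processed++;
        }
    }

    /**
     * One scheduling pass at simulated time nowMillis.
     * Returns true if the processor was actually triggered.
     */
    public static boolean runOnce(SketchProcessor p, long nowMillis, long boredYieldMillis) {
        if (nowMillis < p.yieldExpiresAtMillis) {
            return false; // still yielding: skip without checking for work
        }
        if (!p.hasWork()) {
            // Empty queue or backpressure: back off for the bored yield duration
            p.yieldExpiresAtMillis = nowMillis + boredYieldMillis;
            return false;
        }
        p.onTrigger();
        return true;
    }

    public static void main(String[] args) {
        SketchProcessor p = new SketchProcessor();
        long boredYieldMillis = 10; // illustrative value only
        int passesThatCheckedForWork = 0;
        for (long now = 0; now < 100; now++) {
            if (now == 50) {
                p.incoming.add("flowfile-1"); // a FlowFile lands in the queue
            }
            if (now >= p.yieldExpiresAtMillis) {
                passesThatCheckedForWork++;
            }
            runOnce(p, now, boredYieldMillis);
        }
        System.out.println("passes=100 checkedForWork=" + passesThatCheckedForWork
                + " processed=" + p.processed);
    }
}
```

The trade-off in this sketch is that a FlowFile can wait up to one yield
duration before it is picked up, in exchange for far fewer "do I have work?"
checks while the queue is empty.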

I do also remember several years back, running into an issue where under high 
load we would occasionally
see a Processor "freeze up" using Event-Driven scheduling. I think that was the 
main reason we marked it
experimental. It was unclear what the cause was, but given how well the 
Timer-Driven scheduling strategy has
worked for us, I've just never re-visited it.

That being said, I do believe that an Event-Driven approach is a good idea. But 
given how much more mature NiFi
is now than it was at the point that it was implemented, I would probably 
approach the idea entirely differently.

To answer your questions directly:

1. I would never recommend using event-driven over timer-driven processors.
2. Not sure who is using it in production, but I would recommend against it.
3. My vote would be to mark it as deprecated.
4. To be honest, I'm not sure that I fully understand this question, as it is 
somewhat vague. Are you referring specifically
to scheduling, obtaining the best performance, minimizing resource utilization, 
or did you intend for this to be vague and
are just asking for any general guidance in whatever form?

Thanks
-Mark

> On Sep 12, 2018, at 5:11 PM, Michael Moser <mose...@apache.org> wrote:
> 
> Hi Joe,
> 
> I'm guessing here, but I think the Event Driven scheduling was intended to
> be more efficient than Timer Driven scheduling, in the way that push
> notifications should be more efficient than polling.  In practice, I'm not
> sure anyone has measured the difference.
> 
> I have seen folks use Event Driven scheduling to get access to the separate
> thread pool from the Timer Driven pool.  For example, if you are running on
> an 8 core system but you want a Timer Driven pool with 50 threads to do
> lots of I/O bound tasks, you might create an Event Driven pool with 4
> threads and assign your CPU heavy processing to that pool.  This limit may
> avoid having way more than 8 CPU heavy threads (from the Timer Driven pool)
> bogging down your 8 core system.
> 
> Regards,
> -- Mike
> 
> 
> On Thu, Sep 6, 2018 at 3:11 PM Joe Percivall <jperciv...@apache.org> wrote:
> 
>> Hey everyone,
>> 
>> The dataflow I'm running has one main flow and a couple other disjoint
>> process groups. Within that main flow, there are sections which aren't used
>> very often. In trying to optimize things, I looked into the guidance we
>> have on the "event-driven" scheduling type. There doesn't appear to be much
>> concrete other than "it's experimental". Which has been the go-to,
>> basically since being open-sourced.
>> 
>> So with that, I'm curious about a couple things:
>> 1: With the recent improvements to the controller and timer-based
>> scheduling, what should be our guidance on when to use event-based over
>> timer-based?
>> 2: Is anyone actually using it in production?
>> 3: Given it's been 3+ years of "it's experimental", we should start
>> thinking about either declaring it good to go or deprecating it.
>> 4: Any lessons learned on optimizing disjoint/sparse flows.
>> 
>> Cheers,
>> Joe
>> --
>> *Joe Percivall*
>> linkedin.com/in/Percivall
>> e: jperciv...@apache.com
>>
