Hi all,

Just checking in on this thread.

I think FLIP-545 is important work. The new async dispatcher will help Flink
stability a lot when we use custom reporters - it helps unblock the
JobManager.

Kartikey, I have a quick question on the circuit breaker logic. Is the state
managed per-reporter (so each reporter is isolated), or could one faulty
reporter stop dispatches for all reporters? This is a key detail for our
setup.

Looking forward to seeing the [VOTE] start soon. Thank you for the FLIP.

Thanks,
Li

On Wed, Oct 1, 2025 at 12:32 PM Kartikey Pant <[email protected]> wrote:

> Hi all,
>
> Circling back on this thread.
>
> Thanks to the great feedback from the earlier discussion, the proposal has
> been updated to use a more flexible, interface-based design. The final FLIP
> is available on the Cwiki [1] (thanks, Piotr, for creating the page).
>
> My intention is to move this to a formal vote next week.
>
> Before I do, please raise any blocking concerns by this Friday, October
> 3rd. If there are no blocking issues, I will start the [VOTE] thread on
> Monday.
>
> Thanks,
> Kartikey
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-545%3A+Hardening+the+Event+Reporter+with+an+Asynchronous+Core
>
> On Tue, Sep 2, 2025 at 5:00 PM Piotr Nowojski <[email protected]> wrote:
>
> > Hi,
> >
> > Here you go:
> >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-545%3A+Hardening+the+Event+Reporter+with+an+Asynchronous+Core
> >
> > Best,
> > Piotrek
> >
> > On Mon, Sep 1, 2025 at 7:37 PM Kartikey Pant <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > Thanks, Aleksandr, for the great suggestion on using an
> > > interface-based strategy. It's a much cleaner approach that ensures
> > > backward compatibility while keeping the design extensible.
> > >
> > > Based on this feedback, I've updated the FLIP document.
> > > The design now uses an EventDispatcher interface, controlled by a single
> > > events.dispatcher.type config key, allowing users to opt in to the new
> > > asynchronous behavior.
> > >
> > > I believe the proposal has now stabilized. As I don't have Confluence
> > > write access, could a committer please help assign an official FLIP
> > > number to this:
> > >
> > > https://docs.google.com/document/d/1CCu7Js0ATOAgqRMS-kWj_0v0G_jt2r9IfMB2Oty7KJo/edit?tab=t.0
> > >
> > > Best,
> > > Kartikey Pant
> > >
> > > On Tue, Aug 26, 2025 at 11:13 PM Aleksandr Iushmanov <[email protected]> wrote:
> > >
> > > > Hi Kartikey,
> > > >
> > > > Thank you for looking into this.
> > > >
> > > > I might not be very familiar with the naming conventions in Flink,
> > > > so please bear with me if my suggestion doesn't make complete sense.
> > > > I suggest introducing a feature flag, something like:
> > > >
> > > > > events.reporter.<name>.dispatcher.type
> > > >
> > > > which would default to *sync* to make this change backwards compatible.
> > > >
> > > > Also, are there any reasons why we would not want to introduce an
> > > > interface with two implementations?
> > > > 1. sync: for the existing behaviour.
> > > > 2. memory-queue: for the proposed implementation with the queue.
> > > >
> > > > This way:
> > > >
> > > > - we don't break anything by default
> > > > - we can change the default in future releases once it has been proven
> > > >   to be stable
> > > > - we keep the door open for other implementations (e.g. file-based queue
> > > >   or spillover to logs).
> > > >
> > > > I look forward to hearing your thoughts on it.
> > > >
> > > > Kind regards,
> > > > Aleksandr Iushmanov
> > > >
> > > > On Fri, 22 Aug 2025 at 09:54, Kartikey Pant <[email protected]> wrote:
> > > >
> > > > > Hi Aleksandr,
> > > > >
> > > > > Thanks for the great feedback.
> > > > > Your points on guaranteed delivery and the
> > > > > *FileEventsReporter* are spot on, and I agree with your reasoning. I'll
> > > > > update the FLIP to incorporate them, as it will make the proposal much
> > > > > stronger.
> > > > >
> > > > > Regarding the delivery guarantee, I'll add a new configuration key,
> > > > > *events.reporter.<name>.delivery.guarantee*, to allow a choice between two
> > > > > modes. The default will be best-effort for the asynchronous, non-blocking
> > > > > dispatch. I'll also add a guaranteed mode for a synchronous, blocking
> > > > > dispatch that bypasses the queue, perfect for the critical autoscaling use
> > > > > case you mentioned.
> > > > >
> > > > > On your question about the *FileEventsReporter*, you're right that a local
> > > > > file append is cheap. The async core isn't really designed for the
> > > > > *FileEventsReporter* specifically, but for the general case where reporters
> > > > > write to network sinks (e.g., *OpenTelemetry*) where latency and
> > > > > backpressure are real concerns. The file reporter is just meant to be a
> > > > > simple, built-in option for users.
> > > > >
> > > > > I'll get these changes into the design doc shortly and will follow up on
> > > > > this thread once it's updated. Thanks again for helping improve the FLIP.
> > > > >
> > > > > Best,
> > > > > Kartikey
> > > > >
> > > > > On Thu, Aug 21, 2025 at 11:19 PM Aleksandr Iushmanov <[email protected]> wrote:
> > > > >
> > > > > > Hi Kartikey,
> > > > > >
> > > > > > I like the idea and I agree with general direction, thank you for
> > > > > > putting it together!
> > > > > >
> > > > > > I have one concern about making this modification "forced", imho there
> > > > > > should be a room for "guaranteed important events delivery" from the
> > > > > > operations point of view.
> > > > > > If Flink job is struggling/backpressured it
> > > > > > may make sense to emit some events at priority that would be used for
> > > > > > external triggers like "autoscaling" or external dynamic configuration
> > > > > > tuning.
> > > > > >
> > > > > > Imho, interfaces should either allow to choose "sync" vs "non guaranteed
> > > > > > async" delivery for different events (or event reporters). With proposal
> > > > > > "as is" it won't be possible to "ensure" that important messages have
> > > > > > been delivered and can be actioned by external monitoring system. Could
> > > > > > we make "queue / async" behaviour opt-in?
> > > > > >
> > > > > > Second question I had was around FileEventReporter implementation, at a
> > > > > > glance, "append to file" is a fairly cheap operation, do you have a
> > > > > > concern that amount of events is large enough to have significant
> > > > > > bottleneck on disk IO and requires memory queue?
> > > > > >
> > > > > > Kind regards,
> > > > > >
> > > > > > Aleksandr Iushmanov
> > > > > >
> > > > > > On 2025/08/19 06:56:36 Kartikey Pant wrote:
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > I'd like to propose a new FLIP that builds directly on the excellent
> > > > > > > foundation laid by FLIP-481 (Introduce Event Reporting). For anyone
> > > > > > > needing context, the original proposal is available here:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-481%3A+Introduce+Event+Reporting
> > > > > > >
> > > > > > > Now that the community has this powerful API, the logical next step is
> > > > > > > to ensure it's fully robust for large-scale production environments
> > > > > > > where users will be writing their own diverse, custom reporters.
> > > > > > >
> > > > > > > This proposal focuses on one key enhancement: introducing a resilient,
> > > > > > > asynchronous dispatch core. The goal is to decouple event generation
> > > > > > > from the reporter's execution, ensuring that a slow or experimental
> > > > > > > sink can never impact Flink's core stability.
> > > > > > >
> > > > > > > I've drafted a detailed design document that I hope can form the basis
> > > > > > > of this new FLIP:
> > > > > > >
> > > > > > > https://docs.google.com/document/d/1CCu7Js0ATOAgqRMS-kWj_0v0G_jt2r9IfMB2Oty7KJo/edit?usp=sharing
> > > > > > >
> > > > > > > I'm keen to get the community's initial feedback on this direction
> > > > > > > before moving forward with the formal process.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Kartikey Pant
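
For anyone skimming the thread, the interface-based design discussed above
(an EventDispatcher with a "sync" and a "memory-queue" implementation, selected
through events.dispatcher.type) could look roughly like the sketch below. It is
only an illustration under assumptions: the value names come from Aleksandr's
suggestion, while the String event type, the Consumer-based reporter, the class
names, the method signatures, the queue capacity, the drop-when-full policy and
the create() factory are all hypothetical and are not taken from the FLIP or
from Flink's code base.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class DispatcherSketch {

    /** Hypothetical contract; the name follows the thread, the signature is assumed. */
    interface EventDispatcher extends AutoCloseable {
        /** Returns true if the event was accepted, false if it was dropped. */
        boolean dispatch(String event);

        @Override
        void close();
    }

    /** "sync": invokes the reporter on the caller's thread (the existing behaviour). */
    static final class SyncEventDispatcher implements EventDispatcher {
        private final Consumer<String> reporter;

        SyncEventDispatcher(Consumer<String> reporter) {
            this.reporter = reporter;
        }

        @Override
        public boolean dispatch(String event) {
            reporter.accept(event);
            return true;
        }

        @Override
        public void close() {}
    }

    /** "memory-queue": bounded queue drained by one worker thread; drops events when full. */
    static final class MemoryQueueEventDispatcher implements EventDispatcher {
        private final BlockingQueue<String> queue;
        private final Thread worker;
        private volatile boolean running = true;

        MemoryQueueEventDispatcher(Consumer<String> reporter, int capacity) {
            this.queue = new ArrayBlockingQueue<>(capacity);
            this.worker = new Thread(() -> {
                // Keep draining until close() was called and the queue is empty.
                while (running || !queue.isEmpty()) {
                    try {
                        String event = queue.poll(50, TimeUnit.MILLISECONDS);
                        if (event != null) {
                            try {
                                reporter.accept(event);
                            } catch (RuntimeException e) {
                                // A failing reporter must not kill the worker thread.
                            }
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }, "event-dispatcher-sketch");
            worker.setDaemon(true);
            worker.start();
        }

        @Override
        public boolean dispatch(String event) {
            // offer() never blocks the caller; it returns false when the queue is full.
            return queue.offer(event);
        }

        @Override
        public void close() {
            running = false;
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Hypothetical factory keyed by the value of events.dispatcher.type. */
    static EventDispatcher create(String type, Consumer<String> reporter) {
        switch (type) {
            case "sync":
                return new SyncEventDispatcher(reporter);
            case "memory-queue":
                return new MemoryQueueEventDispatcher(reporter, 1024);
            default:
                throw new IllegalArgumentException("Unknown dispatcher type: " + type);
        }
    }
}
```

With something like this, create("sync", reporter) keeps today's blocking
behaviour, while create("memory-queue", reporter) hands events to a background
thread so that a slow reporter only delays or drops its own events instead of
stalling the caller. Whether state such as a circuit breaker is kept per
reporter is exactly the open question above and is not modelled here.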
