Hi all,

Just want to check on this thread.

I think FLIP-545 is important work. The new async dispatcher will help
Flink stability a lot when we use custom reporters - it helps unblock the
JobManager.

Kartikey, I have a quick question on the circuit breaker logic. Is the
state managed per-reporter (so each reporter is isolated), or will one
faulty reporter potentially stop dispatches for all reporters? This is a
key detail for our setup.

Ready to see the [VOTE] start soon. Thank you for the FLIP.

Thanks,
Li

On Wed, Oct 1, 2025 at 12:32 PM Kartikey Pant <[email protected]>
wrote:

> Hi all,
>
> Circling back on this thread.
>
> Thanks to the great feedback from the earlier discussion, the proposal has
> been updated to use a more flexible, interface-based design. The final FLIP
> is available on the Cwiki [1] (thanks, Piotr, for creating the page).
>
> My intention is to move this to a formal vote next week.
>
> Before I do, please raise any blocking concerns by this Friday, October
> 3rd. If there are no blocking issues, I will start the [VOTE] thread on
> Monday.
>
> Thanks,
> Kartikey
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-545%3A+Hardening+the+Event+Reporter+with+an+Asynchronous+Core
>
>
> On Tue, Sep 2, 2025 at 5:00 PM Piotr Nowojski <[email protected]>
> wrote:
>
> > Hi,
> >
> > Here you go:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-545%3A+Hardening+the+Event+Reporter+with+an+Asynchronous+Core
> >
> > Best,
> > Piotrek
> >
> > pon., 1 wrz 2025 o 19:37 Kartikey Pant <[email protected]>
> > napisał(a):
> >
> > > Hi all,
> > >
> > > Thanks, Aleksandr, for the great suggestion on using an
> > > interface-based strategy. It's a much cleaner approach that ensures
> > > backward compatibility while keeping the design extensible.
> > >
> > > Based on this feedback, I've updated the FLIP document. The design now
> > > uses an EventDispatcher interface, controlled by a single
> > > events.dispatcher.type config key, allowing users to opt-in to the new
> > > asynchronous behavior.
> > >
> > > I believe the proposal has now stabilized. As I don't have Confluence
> > > write access, could a committer please help assign an official FLIP
> > > number this:
> > >
> >
> https://docs.google.com/document/d/1CCu7Js0ATOAgqRMS-kWj_0v0G_jt2r9IfMB2Oty7KJo/edit?tab=t.0
> > >
> > > Best,
> > > Kartikey Pant
> > >
> > >
> > > On Tue, Aug 26, 2025 at 11:13 PM Aleksandr Iushmanov
> > > <[email protected]> wrote:
> > > >
> > > > Hi Kartikey,
> > > >
> > > > Thank you for looking into this.
> > > >
> > > > I might not be very familiar with the naming conventions in Flink,
> > > > so please bear with me if my suggestion doesn't make complete sense.
> > > > I suggest introducing a feature flag, something like:
> > > >
> > > > > events.reporter.<name>.dispatcher.type
> > > >
> > > > which would default to *sync* to make this change backwards
> compatible.
> > > >
> > > > Also, are there any reasons why we would not want to introduce an
> > > > interface with two implementations?
> > > > 1. sync: for the existing behaviour.
> > > > 2. memory-queue: for the proposed implementation with the queue.
> > > >
> > > > This way:
> > > >
> > > >    - we don't break anything by default
> > > >    - we can change the default in future releases once it has been
> > proven
> > > >    to be stable
> > > >    - we keep the door open for other implementations (e.g. file-based
> > > queue
> > > >    or spillover to logs).
> > > >
> > > >
> > > > I look forward to hearing your thoughts on it.
> > > >
> > > > Kind regards,
> > > > Aleksandr Iushmanov
> > > >
> > > >
> > > > On Fri, 22 Aug 2025 at 09:54, Kartikey Pant <
> > [email protected]>
> > > > wrote:
> > > >
> > > > > Hi Aleksandr,
> > > > >
> > > > > Thanks for the great feedback. Your points on guaranteed delivery
> and
> > > the
> > > > > *FileEventsReporter* are spot on, and I agree with your reasoning.
> > I'll
> > > > > update the FLIP to incorporate them, as it will make the proposal
> > much
> > > > > stronger.
> > > > >
> > > > > Regarding the delivery guarantee, I'll add a new configuration key,
> > > > > *events.reporter.<name>.delivery.guarantee*, to allow a choice
> > between
> > > two
> > > > > modes. The default will be best-effort for the asynchronous,
> > > non-blocking
> > > > > dispatch. I'll also add a guaranteed mode for a synchronous,
> blocking
> > > > > dispatch that bypasses the queue, perfect for the critical
> > autoscaling
> > > use
> > > > > case you mentioned.
> > > > >
> > > > > On your question about the *FileEventsReporter*, you're right that
> a
> > > local
> > > > > file append is cheap. The async core isn't really designed for the
> > > > > *FileEventsReporter* specifically, but for the general case where
> > > reporters
> > > > > write to network sinks (e.g., *OpenTelemetry*) where latency and
> > > > > backpressure are real concerns. The file reporter is just meant to
> > be a
> > > > > simple, built-in option for users.
> > > > >
> > > > > I'll get these changes into the design doc shortly and will follow
> up
> > > on
> > > > > this thread once it's updated. Thanks again for helping improve the
> > > FLIP.
> > > > >
> > > > > Best,
> > > > > Kartikey
> > > > >
> > > > > On Thu, Aug 21, 2025 at 11:19 PM Aleksandr Iushmanov <
> > > [email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi Kartikey,
> > > > > >
> > > > > > I like the idea and I agree with general direction, thank you for
> > > > > > putting it together!
> > > > > >
> > > > > > I have one concern about making this modification "forced", imho
> > > there
> > > > > > should be a room for "guaranteed important events delivery" from
> > the
> > > > > > operations point of view. If Flink job is
> struggling/backpressured
> > it
> > > > > > may make sense to emit some events at priority that would be used
> > for
> > > > > > external triggers like "autoscaling" or external dynamic
> > > configuration
> > > > > > tuning.
> > > > > >
> > > > > > Imho, interfaces should either allow to choose "sync" vs "non
> > > guaranteed
> > > > > > async" delivery for different events (or event reporters). With
> > > proposal
> > > > > > "as is" it won't be possible to "ensure" that important messages
> > have
> > > > > > been delivered and can be actioned by external monitoring system.
> > > Could
> > > > > > we make "queue / async" behaviour opt-in?
> > > > > > Second question I had was around FileEventReporter
> implementation,
> > > at a
> > > > > > glance, "append to file" is a fairly cheap operation, do you
> have a
> > > > > > concern that amount of events is large enough to have significant
> > > > > > bottleneck on disk IO and requires memory queue?
> > > > > >
> > > > > > Kind regards,
> > > > > >
> > > > > > Aleksandr Iushmanov
> > > > > >
> > > > > >
> > > > > > On 2025/08/19 06:56:36 Kartikey Pant wrote:
> > > > > >  > Hi everyone,
> > > > > >  >
> > > > > >  > I'd like to propose a new FLIP that builds directly on the
> > > excellent
> > > > > >  > foundation laid by FLIP-481 (Introduce Event Reporting). For
> > > anyone
> > > > > >  > needing context, the original proposal is available here:
> > > > > >  >
> > > > > >
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-481%3A+Introduce+Event+Reporting
> > > > > >  >
> > > > > >  > Now that the community has this powerful API, the logical next
> > > step is
> > > > > >  > to ensure it's fully robust for large-scale production
> > > environments
> > > > > >  > where users will be writing their own diverse, custom
> reporters.
> > > > > >  >
> > > > > >  > This proposal focuses on one key enhancement: introducing a
> > > resilient,
> > > > > >  > asynchronous dispatch core. The goal is to decouple event
> > > generation
> > > > > >  > from the reporter's execution, ensuring that a slow or
> > > experimental
> > > > > >  > sink can never impact Flink's core stability.
> > > > > >  >
> > > > > >  > I've drafted a detailed design document that I hope can form
> the
> > > basis
> > > > > >  > of this new FLIP:
> > > > > >  >
> > > > > >
> > > > > >
> > > > >
> > >
> >
> https://docs.google.com/document/d/1CCu7Js0ATOAgqRMS-kWj_0v0G_jt2r9IfMB2Oty7KJo/edit?usp=sharing
> > > > > >  >
> > > > > >  > I'm keen to get the community's initial feedback on this
> > direction
> > > > > >  > before moving forward with the formal process.
> > > > > >  >
> > > > > >  > Thanks,
> > > > > >  > Kartikey Pant
> > > > > >  >
> > > > > >
> > > > >
> > >
> >
>

Reply via email to