Hi Tawfik,

> In response to this issue, we have authored a research paper outlining a
> novel strategy named "keyed watermarks" to address data loss and
> substantially enhance data processing accuracy, achieving at least 99%
> accuracy in most scenarios.
>

Sounds like a significant improvement! Looking forward to the details of
your research.

Best,
Jane

On Thu, Sep 7, 2023 at 9:50 AM liu ron <ron9....@gmail.com> wrote:

> Hi Tawfik,
>
> Mixing fast and slow streams in distributed scenarios causes the watermark
> to advance too quickly, which leads to data loss and is a headache in Flink.
> Can't wait to read your research paper!
>
> Best,
> Ron
>
> On Wed, Sep 6, 2023 at 2:46 PM Yun Tang <myas...@live.com> wrote:
>
> > Hi Tawfik,
> >
> > Thanks for offering such a proposal, looking forward to your research
> > paper!
> >
> > You could also ask for edit permission for Flink Improvement Proposals [1]
> > to create a new proposal if you want to contribute this to the community
> > yourself.
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> >
> > Best
> > Yun Tang
> > ________________________________
> > From: yuxia <luoyu...@alumni.sjtu.edu.cn>
> > Sent: Wednesday, September 6, 2023 12:31
> > To: dev <dev@flink.apache.org>
> > Subject: Re: Proposal for Implementing Keyed Watermarks in Apache Flink
> >
> > Hi, Tawfik Yasser.
> > Thanks for the proposal.
> > It sounds exciting. I can't wait to read the research paper for more details.
> >
> > Best regards,
> > Yuxia
> >
> > ----- Original Message -----
> > From: "David Morávek" <d...@apache.org>
> > To: "dev" <dev@flink.apache.org>
> > Sent: Tuesday, September 5, 2023 4:36:51 PM
> > Subject: Re: Proposal for Implementing Keyed Watermarks in Apache Flink
> >
> > Hi Tawfik,
> >
> > It's exciting to see any ongoing research that tries to push Flink forward!
> >
> > To get the discussion started, can you please share your paper with the
> > community? Assessing the proposal without further context is tough.
> >
> > Best,
> > D.
> >
> > On Mon, Sep 4, 2023 at 4:42 PM Tawfek Yasser Tawfek <tyas...@nu.edu.eg>
> > wrote:
> >
> > > Dear Apache Flink Development Team,
> > >
> > > I hope this email finds you well. I am writing to propose an exciting new
> > > feature for Apache Flink that has the potential to significantly enhance
> > > its capabilities in handling unbounded streams of events, particularly in
> > > the context of event-time windowing.
> > >
> > > As you may be aware, Apache Flink has been at the forefront of Big Data
> > > Stream processing engines, leveraging windowing techniques to manage
> > > unbounded event streams effectively. The accuracy of the results obtained
> > > from these streams relies heavily on the ability to gather all relevant
> > > input within a window. At the core of this process are watermarks, which
> > > serve as unique timestamps marking the progression of events in time.
> > >
> > > However, our analysis has revealed a critical issue with the current
> > > watermark generation method in Apache Flink. This method, which operates at
> > > the input stream level, exhibits a bias towards faster sub-streams,
> > > resulting in the unfortunate consequence of dropped events from slower
> > > sub-streams. Our investigations showed that Apache Flink's conventional
> > > watermark generation approach led to an alarming data loss of approximately
> > > 33% when 50% of the keys around the median experienced delays. This loss
> > > further escalated to over 37% when 50% of random keys were delayed.
> > >
> > > In response to this issue, we have authored a research paper outlining a
> > > novel strategy named "keyed watermarks" to address data loss and
> > > substantially enhance data processing accuracy, achieving at least 99%
> > > accuracy in most scenarios.
> > >
> > > Moreover, we have conducted comprehensive comparative studies to evaluate
> > > the effectiveness of our strategy against the conventional watermark
> > > generation method, specifically in terms of event-time tracking accuracy.
> > >
> > > We believe that implementing keyed watermarks in Apache Flink can greatly
> > > enhance its performance and reliability, making it an even more valuable
> > > tool for organizations dealing with complex, high-throughput data
> > > processing tasks.
> > >
> > > We kindly request your consideration of this proposal. We would be eager
> > > to discuss further details, provide the full research paper, or collaborate
> > > closely to facilitate the integration of this feature into Apache Flink.
> > >
> > > Thank you for your time and attention to this proposal. We look forward to
> > > the opportunity to contribute to the continued success and evolution of
> > > Apache Flink.
> > >
> > > Best Regards,
> > >
> > > Tawfik Yasser
> > > Senior Teaching Assistant @ Nile University, Egypt
> > > Email: tyas...@nu.edu.eg
> > > LinkedIn: https://www.linkedin.com/in/tawfikyasser/
> > >
> >
>
