Hi Tawfik,

> In response to this issue, we have authored a research paper outlining a
> novel strategy named "keyed watermarks" to address data loss and
> substantially enhance data processing accuracy, achieving at least 99%
> accuracy in most scenarios.

Sounds like a significant improvement! Looking forward to the details of
your research.

Best,
Jane

On Thu, Sep 7, 2023 at 9:50 AM liu ron <ron9....@gmail.com> wrote:

> Hi Tawfik,
>
> Fast and slow streams in distributed scenarios cause the watermark to
> advance too fast, which leads to lost data and is a headache in Flink.
> Can't wait to read your research paper!
>
> Best,
> Ron
>
> On Wed, Sep 6, 2023 at 14:46, Yun Tang <myas...@live.com> wrote:
>
> > Hi Tawfik,
> >
> > Thanks for offering such a proposal, looking forward to your research
> > paper!
> >
> > You could also ask for the edit permission for Flink Improvement
> > Proposals [1] to create a new proposal if you want to contribute this
> > to the community yourself.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> >
> > Best
> > Yun Tang
> > ________________________________
> > From: yuxia <luoyu...@alumni.sjtu.edu.cn>
> > Sent: Wednesday, September 6, 2023 12:31
> > To: dev <dev@flink.apache.org>
> > Subject: Re: Proposal for Implementing Keyed Watermarks in Apache Flink
> >
> > Hi, Tawfik Yasser.
> > Thanks for the proposal.
> > It sounds exciting. I can't wait for the research paper for more details.
> >
> > Best regards,
> > Yuxia
> >
> > ----- Original Message -----
> > From: "David Morávek" <d...@apache.org>
> > To: "dev" <dev@flink.apache.org>
> > Sent: Tuesday, September 5, 2023 4:36:51 PM
> > Subject: Re: Proposal for Implementing Keyed Watermarks in Apache Flink
> >
> > Hi Tawfik,
> >
> > It's exciting to see any ongoing research that tries to push Flink
> > forward!
> >
> > To get the discussion started, can you please share your paper with the
> > community? Assessing the proposal without further context is tough.
> >
> > Best,
> > D.
> >
> > On Mon, Sep 4, 2023 at 4:42 PM Tawfek Yasser Tawfek <tyas...@nu.edu.eg>
> > wrote:
> >
> > > Dear Apache Flink Development Team,
> > >
> > > I hope this email finds you well.
> > > I am writing to propose an exciting new feature for Apache Flink that
> > > has the potential to significantly enhance its capabilities in
> > > handling unbounded streams of events, particularly in the context of
> > > event-time windowing.
> > >
> > > As you may be aware, Apache Flink has been at the forefront of Big
> > > Data Stream processing engines, leveraging windowing techniques to
> > > manage unbounded event streams effectively. The accuracy of the
> > > results obtained from these streams relies heavily on the ability to
> > > gather all relevant input within a window. At the core of this
> > > process are watermarks, which serve as unique timestamps marking the
> > > progression of events in time.
> > >
> > > However, our analysis has revealed a critical issue with the current
> > > watermark generation method in Apache Flink. This method, which
> > > operates at the input stream level, exhibits a bias towards faster
> > > sub-streams, resulting in the unfortunate consequence of dropped
> > > events from slower sub-streams. Our investigations showed that Apache
> > > Flink's conventional watermark generation approach led to an alarming
> > > data loss of approximately 33% when 50% of the keys around the median
> > > experienced delays. This loss further escalated to over 37% when 50%
> > > of random keys were delayed.
> > >
> > > In response to this issue, we have authored a research paper
> > > outlining a novel strategy named "keyed watermarks" to address data
> > > loss and substantially enhance data processing accuracy, achieving at
> > > least 99% accuracy in most scenarios.
> > >
> > > Moreover, we have conducted comprehensive comparative studies to
> > > evaluate the effectiveness of our strategy against the conventional
> > > watermark generation method, specifically in terms of event-time
> > > tracking accuracy.
> > >
> > > We believe that implementing keyed watermarks in Apache Flink can
> > > greatly enhance its performance and reliability, making it an even
> > > more valuable tool for organizations dealing with complex,
> > > high-throughput data processing tasks.
> > >
> > > We kindly request your consideration of this proposal. We would be
> > > eager to discuss further details, provide the full research paper, or
> > > collaborate closely to facilitate the integration of this feature
> > > into Apache Flink.
> > >
> > > Thank you for your time and attention to this proposal. We look
> > > forward to the opportunity to contribute to the continued success and
> > > evolution of Apache Flink.
> > >
> > > Best Regards,
> > >
> > > Tawfik Yasser
> > > Senior Teaching Assistant @ Nile University, Egypt
> > > Email: tyas...@nu.edu.eg
> > > LinkedIn: https://www.linkedin.com/in/tawfikyasser/
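
For anyone following the thread who wants a concrete picture of the bias
described in the proposal, here is a toy sketch in Python. It is not Flink
code and not the method from the paper (which has not been shared in this
thread). It assumes a simplified model: tumbling windows of a fixed size,
a watermark equal to the highest timestamp seen so far, and an event
counted as dropped when its window end is already at or below that
watermark. The names `WINDOW`, `window_end`, and `count_dropped`, and the
sample events, are made up for illustration.

```python
from collections import defaultdict

WINDOW = 10  # tumbling window size in event-time units (illustrative value)


def window_end(ts):
    """Exclusive end of the tumbling window that contains timestamp ts."""
    return (ts // WINDOW + 1) * WINDOW


def count_dropped(events, keyed):
    """Count events that arrive after their window has already closed.

    events: list of (key, timestamp) pairs in arrival order.
    keyed:  False -> one watermark for the whole stream,
            True  -> one watermark per key (a guess at the general idea
                     behind "keyed watermarks", not the paper's algorithm).
    """
    # Simplified watermark: the highest timestamp seen so far in the scope.
    watermark = defaultdict(lambda: float("-inf"))
    dropped = 0
    for key, ts in events:
        scope = key if keyed else "global"
        if window_end(ts) <= watermark[scope]:
            dropped += 1  # the window for this event has already fired
        watermark[scope] = max(watermark[scope], ts)
    return dropped


# A fast sub-stream ("fast") races ahead of a slow one ("slow").
events = [
    ("fast", 1), ("fast", 12), ("slow", 2),
    ("fast", 25), ("slow", 13), ("slow", 26),
]

print("global watermark drops:", count_dropped(events, keyed=False))
print("per-key watermark drops:", count_dropped(events, keyed=True))
```

In this toy run the single stream-level watermark is pushed forward by the
fast key, so two events from the slow key land in windows that have
already closed, while tracking the watermark per key drops none. Real
Flink behavior also depends on the configured out-of-orderness bound,
allowed lateness, and parallelism, which this sketch leaves out.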