Dear Apache Flink Development Team,

I hope this email finds you well. I would like to propose a new feature for 
Apache Flink that could significantly improve how it handles unbounded streams 
of events, particularly in the context of event-time windowing.

As you may be aware, Apache Flink is one of the leading big data stream 
processing engines and relies on windowing to manage unbounded event streams. 
The accuracy of the results obtained from these streams depends heavily on 
gathering all relevant input within a window. At the core of this process are 
watermarks: timestamped markers that track the progress of event time and 
signal that no events with an earlier timestamp are expected.

However, our analysis has revealed a critical issue with the current watermark 
generation method in Apache Flink. This method operates at the input stream 
level and is biased towards faster sub-streams, so events from slower 
sub-streams are treated as late and dropped. In our experiments, Apache Flink's 
conventional watermark generation approach led to a data loss of approximately 
33% when 50% of the keys around the median experienced delays. This loss rose 
to over 37% when 50% of random keys were delayed.
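
To make the failure mode concrete, here is a toy Python simulation. It is only 
a sketch under stated assumptions and does not use Flink's API: the Event 
tuple, the function name, and the sample timestamps are made up for 
illustration, and the lateness check merely mimics the rule that an element is 
late once the watermark has passed the last timestamp of its window.

```python
from collections import namedtuple

# Hypothetical event type for this sketch (not a Flink class).
Event = namedtuple("Event", ["key", "timestamp"])


def dropped_with_global_watermark(events, window_size, max_delay=0):
    """Return the events a tumbling window would discard as late when a
    single watermark is derived from the whole input stream."""
    max_ts = float("-inf")  # highest timestamp seen on the stream so far
    dropped = []
    for e in events:  # events are listed in arrival order
        watermark = max_ts - max_delay - 1
        last_ts_in_window = (e.timestamp // window_size + 1) * window_size - 1
        if last_ts_in_window <= watermark:
            dropped.append(e)  # the window has already fired
        max_ts = max(max_ts, e.timestamp)
    return dropped


# Key "a" is fast; key "b" is slow, so its events arrive after "a" has
# already pushed the stream-wide watermark forward.
events = [
    Event("a", 1), Event("a", 5), Event("a", 12), Event("b", 2),
    Event("a", 25), Event("b", 7), Event("b", 14),
]
dropped = dropped_with_global_watermark(events, window_size=10)
print([(e.key, e.timestamp) for e in dropped])
# prints [('b', 2), ('b', 7), ('b', 14)]
```

With this sample data, every event of the slow key "b" is discarded, although 
none of them is out of order relative to the other events of the same key.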

In response to this issue, we have authored a research paper proposing a 
strategy named "keyed watermarks" that addresses this data loss and 
substantially improves data processing accuracy, achieving at least 99% 
accuracy in most scenarios.
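
The core idea can be sketched by tracking one watermark per key instead of one 
for the whole stream. The Python below is a simplified illustration of that 
idea only, not the implementation described in the paper and not Flink code: 
the Event tuple, the function name, and the sample data are hypothetical, and 
the lateness rule is the same simplified one a tumbling window would apply.

```python
from collections import namedtuple

# Hypothetical event type for this sketch (not a Flink class).
Event = namedtuple("Event", ["key", "timestamp"])


def dropped_with_keyed_watermarks(events, window_size, max_delay=0):
    """Return the events discarded as late when every key advances its own
    watermark, independently of all other keys."""
    max_ts_per_key = {}  # key -> highest timestamp seen for that key
    dropped = []
    for e in events:  # events are listed in arrival order
        max_ts = max_ts_per_key.get(e.key, float("-inf"))
        watermark = max_ts - max_delay - 1  # watermark of this key only
        last_ts_in_window = (e.timestamp // window_size + 1) * window_size - 1
        if last_ts_in_window <= watermark:
            dropped.append(e)  # this key's window has already fired
        max_ts_per_key[e.key] = max(max_ts, e.timestamp)
    return dropped


# Key "a" is fast and key "b" is slow, but "b" is in order with respect
# to itself, so its windows stay open until "b" itself moves past them.
events = [
    Event("a", 1), Event("a", 5), Event("a", 12), Event("b", 2),
    Event("a", 25), Event("b", 7), Event("b", 14),
]
dropped = dropped_with_keyed_watermarks(events, window_size=10)
print(dropped)
# prints []
```

In this toy run no event is dropped, because the fast key "a" can no longer 
close the windows of the slow key "b". The trade-off visible even in the 
sketch is that watermark state now grows with the number of keys.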

Moreover, we have conducted comprehensive comparative studies to evaluate the 
effectiveness of our strategy against the conventional watermark generation 
method, specifically in terms of event-time tracking accuracy.

We believe that implementing keyed watermarks in Apache Flink can greatly 
enhance its performance and reliability, making it an even more valuable tool 
for organizations dealing with complex, high-throughput data processing tasks.

We kindly request your consideration of this proposal. We would be eager to 
discuss further details, provide the full research paper, or collaborate 
closely to facilitate the integration of this feature into Apache Flink.

The preprint is available on Research Square: 
https://www.researchsquare.com/article/rs-3395909/v1

Thank you for your time and attention to this proposal. We look forward to the 
opportunity to contribute to the continued success and evolution of Apache 
Flink.

Best Regards,

Tawfik Yasser
Senior Teaching Assistant @ Nile University, Egypt
Email: tyas...@nu.edu.eg
LinkedIn: https://www.linkedin.com/in/tawfikyasser/