Dear Flink Community,

I am writing to propose the integration of OpenTelemetry into Apache Flink. As we all know, observability is crucial for ensuring the reliability and performance of applications. OpenTelemetry provides a comprehensive, vendor-neutral, open-source way to gather telemetry data, including metrics, traces, and logs.

OpenTelemetry has gained significant traction in the cloud-native and open-source communities and is widely adopted by popular projects such as Istio, Jaeger, and Kubernetes. Integrating it into Apache Flink will allow us to take advantage of its rich features and easy integration with existing observability tools to improve the observability of Flink applications.

However, integrating OpenTelemetry into Apache Flink may also involve significant changes. We must thoroughly and openly discuss this proposal's potential benefits, challenges, and trade-offs to reach a consensus on the best way forward. Here are some of the questions that we need to consider:

- What are the benefits of using OpenTelemetry in Apache Flink, and how will it improve the observability of Flink applications?
- What are the potential challenges and trade-offs of integrating OpenTelemetry into Apache Flink, and how can we mitigate them?
- How can we ensure a smooth and seamless transition for existing Flink users and observability tools during the integration process?
- What are the steps and timeline for integrating OpenTelemetry into Apache Flink, and what is the expected impact on the development and maintenance of the Flink codebase?
- Will the integration of OpenTelemetry alter the behaviour of features or components in a way that may break previous users' programs and setups? If yes, is this change desirable?
- Is the integration conceptually a good fit for Flink? Will it complicate the typical case or bloat the abstractions/APIs?
- Does the integration fit well into Flink's architecture, and will it scale and keep Flink flexible for the future?
- Do you think this is a significant new addition to Flink, and will the community commit to maintaining it?
- Does the integration align well with Flink's roadmap and ongoing efforts?
- Does the integration produce added value for Flink users or developers, or does it introduce the risk of regression without adding relevant user or developer benefits?
- Could the integration be done in another repository?

I encourage everyone in the Flink community to participate in this discussion and share their thoughts and opinions. Let's work together to make Apache Flink an even better and more observable big data platform.

Best regards,
John