Hi Nick,

You can look at the PR, or clone my fork and try running it in your local environment, to see if there’s any way we can improve on the auto-instrumentation. I would love to get your feedback.

Thank you

On Sat, 8 Jan 2022 at 12:19 AM, <[email protected]> wrote:

> hi all, been lurking for a while - this is my first post.
>
> what I like about open telemetry is that you can send all telemetry traces
> to STDOUT (or any logs) which you can then pipe to many log forwarders of
> choice. imo this is the easiest way to set it up and a default that should
> work in the vast majority of airflow use cases.
>
> the PR looks like a great start! what can I do to help?
> ---
> nick
>
> On Jan 7, 2022, at 14:37, Elad Kalif <[email protected]> wrote:
>
> Hi Howard,
>
> We actually have an Outreachy intern (Melodie) who is working on
> researching how open-telemetry can be integrated with Airflow.
> Draft PR for demo: https://github.com/apache/airflow/pull/20677
> This is an initial effort for a POC.
> Maybe you can work together on this?
>
>
> On Sat, Jan 8, 2022 at 12:19 AM Howard Yoo <
> [email protected]> wrote:
>
>> Hi all,
>>
>> I’m a staff product manager at Astronomer, and wanted to post this email
>> according to the guide at
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals
>>
>> Currently, the main method to publish telemetry data out of Airflow is
>> through its StatsD implementation:
>> https://github.com/apache/airflow/blob/main/airflow/stats.py , and
>> currently Airflow supports two flavors of StatsD: the original one, and
>> Datadog’s DogStatsD implementation.
>>
>> Through this implementation, we have the following list of metrics that
>> would be available for other popular monitoring tools to collect, monitor,
>> visualize, and alert on metrics generated from Airflow:
>> https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/metrics.html
>>
>>
>> There are a number of limitations in Airflow’s current implementation of
>> its metrics using StatsD.
>> 1. StatsD is based on a simple metrics format that does not support richer
>> context.
>> The metric name can carry some of that context (such as DAG ID, task
>> ID, etc.), but it is limited because the context has to be encoded as
>> part of the metric name itself. A better approach would be to attach
>> ‘tags’ to the metrics data to add more context.
>> 2. StatsD uses UDP as its main network protocol, but UDP is a simple
>> protocol that does not guarantee reliable delivery of the payload.
>> Moreover, many monitoring tools are moving to more modern protocols
>> such as HTTPS to send out metrics.
>> 3. StatsD supports ‘counter,’ ‘gauge,’ and ‘timer,’ but does not
>> support distributed traces or log ingestion.
>>
>> For the above reasons, I have been looking at OpenTelemetry (
>> https://github.com/open-telemetry) as a potential replacement for
>> Airflow’s current telemetry instrumentation. OpenTelemetry is a product
>> of the OpenTracing and OpenCensus projects, and is quickly gaining
>> momentum as a ‘standard’ way of producing and delivering telemetry
>> data: not only metrics, but also distributed traces and logs. The
>> technology is also geared towards better monitoring of cloud-native
>> software. Many monitoring tool vendors support OpenTelemetry (Tanzu,
>> Datadog, Honeycomb, Lightstep, etc.), and OpenTelemetry’s modular
>> architecture is designed to be compatible with existing legacy
>> instrumentation. There are also stable Python SDKs and APIs that make
>> it easy to integrate into Airflow.
>>
>> Therefore, I’d like to work on a proposal to improve the metrics and
>> telemetry capability of Airflow by adding configuration and support for
>> OpenTelemetry. While maintaining backward compatibility with the
>> existing StatsD-based metrics, we would also have an opportunity to
>> base distributed traces and logs on it, so that it would be easier for
>> any OpenTelemetry-compatible tool to monitor Airflow with richer
>> information.
>>
>> If you have been thinking about the need to improve Airflow’s current
>> metrics capabilities, and about standards like OpenTelemetry, please
>> feel free to join the thread and share any opinions or feedback. I also
>> think that we may need to review our current list of metrics and assess
>> whether they are really useful for monitoring and observability of
>> Airflow. There are things we might want to add, such as more
>> executor-related metrics, scheduler-related metrics, as well as
>> operator and even DB and XCom-related metrics, to better assess the
>> health of Airflow and make this information helpful for faster
>> troubleshooting and problem resolution.
>>
>> Thanks and regards,
>> Howard
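
To make limitation 1 in the quoted proposal concrete, here is a minimal sketch of the two wire formats involved. It assumes the plain StatsD line format (`name:value|type`) and the DogStatsD tag extension (`|#key:value,...`). The helper functions, the `example_dag` and `extract` IDs, and the tagged metric name `dag.task.duration` are hypothetical and only for illustration; none of this is code from Airflow or from the PR.

```python
"""Sketch: context in a StatsD metric name vs. in DogStatsD-style tags."""


def statsd_line(name: str, value: float, metric_type: str) -> str:
    # Plain StatsD line: all context must be baked into the dotted name.
    return f"{name}:{value}|{metric_type}"


def dogstatsd_line(name: str, value: float, metric_type: str, tags: dict) -> str:
    # DogStatsD extension: context travels as "|#key:value,..." after the type.
    tag_part = ",".join(f"{key}:{val}" for key, val in tags.items())
    return f"{name}:{value}|{metric_type}|#{tag_part}"


# Hypothetical IDs, used only for illustration.
dag_id, task_id = "example_dag", "extract"

# Context encoded in the name: one distinct metric name per DAG/task pair.
plain = statsd_line(f"dag.{dag_id}.{task_id}.duration", 1250, "ms")

# Context as tags: one stable (hypothetical) metric name, filterable by tag.
tagged = dogstatsd_line(
    "dag.task.duration", 1250, "ms", {"dag_id": dag_id, "task_id": task_id}
)

print(plain)
print(tagged)
```

With the name-encoded form, a backend sees a new metric name for every DAG/task pair and has to parse the name to recover the IDs. With tags, the IDs stay structured; OpenTelemetry attributes play the same role for metrics, traces, and logs.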
