Hi Nick,

You can look at the PR or clone my fork and try running it in your local
environment to see if there’s any way we can improve on the
auto-instrumentation.
I would love to get your feedback.
Thank you

On Sat, 8 Jan 2022 at 12:19 AM, <[email protected]> wrote:

> hi all, been lurking for a while - this is my first post.
>
> what I like about open telemetry is that you can send all telemetry traces
> to STDOUT (or any logs) which you can then pipe to many log forwarders of
> choice. imo this is the easiest way to set it up and a default that should
> work in the vast majority of airflow use cases.
>
> the PR looks like a great start! what can I do to help?
> ---
> nick
>
> On Jan 7, 2022, at 14:37, Elad Kalif <[email protected]> wrote:
>
> Hi Howard,
>
> We actually have an Outreachy intern (Melodie) who is working on
> researching how OpenTelemetry can be integrated with Airflow.
> Draft PR for demo: https://github.com/apache/airflow/pull/20677
> This is an initial effort for a POC.
> Maybe you can work together on this?
>
>
> On Sat, Jan 8, 2022 at 12:19 AM Howard Yoo <
> [email protected]> wrote:
>
>> Hi all,
>>
>> I’m a staff product manager at Astronomer, and I’m posting this email
>> following the guide at
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals
>>
>> Currently, the main way to publish telemetry data out of Airflow is
>> through its StatsD implementation:
>> https://github.com/apache/airflow/blob/main/airflow/stats.py . Airflow
>> currently supports two flavors of StatsD: the original one and Datadog’s
>> DogStatsD implementation.
>>
>> Through this implementation, we have the following list of metrics that
>> would be available for other popular monitoring tools to collect, monitor,
>> visualize, and alert on metrics generated from airflow:
>> https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/metrics.html
>>
>>
>> Airflow’s current StatsD-based metrics implementation has a number of
>> limitations:
>> 1. StatsD is based on a simple metrics format that does not support richer
>> context. Some context (such as DAG ID, task ID, etc.) is encoded in the
>> metric name, but this is limiting because the context has to be part of
>> the metric name itself. A better approach would be to attach ‘tags’ to the
>> metrics data to add more context.
>> 2. StatsD uses UDP as its main network protocol, and UDP is simple but
>> does not guarantee reliable delivery of the payload. Moreover, many
>> monitoring tools are moving to more modern protocols such as HTTPS to
>> send out metrics.
>> 3. StatsD supports ‘counter,’ ‘gauge,’ and ‘timer’ metrics, but does not
>> support distributed traces or log ingestion.
>>
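
To make the first limitation concrete, here is a minimal sketch (not Airflow's actual code) that builds the same timer measurement twice: once in the plain StatsD line format, where the DAG and task IDs must be embedded in the metric name, and once in the DogStatsD format, where they travel as tags. The helper names, the tagged metric name `dag.task.duration`, and the IDs `example_dag` and `extract` are assumptions made up for illustration.

```python
def statsd_line(name, value, metric_type):
    """Plain StatsD line format: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}"


def dogstatsd_line(name, value, metric_type, tags):
    """DogStatsD line format: the plain line plus |#key:value,... tags."""
    tag_str = ",".join(f"{key}:{val}" for key, val in sorted(tags.items()))
    return f"{name}:{value}|{metric_type}|#{tag_str}"


# Hypothetical IDs, used only for this illustration.
dag_id, task_id = "example_dag", "extract"

# Context embedded in the metric name: every DAG/task pair is a new metric.
plain = statsd_line(f"dag.{dag_id}.{task_id}.duration", 1250, "ms")

# Context carried as tags: one metric name, filterable by dag_id/task_id.
tagged = dogstatsd_line(
    "dag.task.duration", 1250, "ms", {"dag_id": dag_id, "task_id": task_id}
)

print(plain)
print(tagged)
```

With the tagged form, a backend can aggregate one metric across all DAGs or filter by `dag_id` without parsing metric names.
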
>> For the above reasons, I have been looking at OpenTelemetry (
>> https://github.com/open-telemetry) as a potential replacement for
>> Airflow’s current telemetry instrumentation. OpenTelemetry grew out of
>> the merger of OpenTracing and OpenCensus, and is quickly gaining momentum
>> as a ‘standard’ means of producing and delivering telemetry data: not
>> only metrics, but also distributed traces and logs. The technology is
>> also geared towards better monitoring of cloud-native software. Many
>> monitoring tool vendors support OpenTelemetry (Tanzu, Datadog,
>> Honeycomb, Lightstep, etc.), and OpenTelemetry’s modular architecture is
>> designed to be compatible with existing legacy instrumentation. There are
>> also stable Python SDKs and APIs that make it easy to implement in
>> Airflow.
>>
>> Therefore, I’d like to propose improving Airflow’s metrics and
>> telemetry capabilities by adding configuration for and support of
>> OpenTelemetry. While maintaining backward compatibility with the existing
>> StatsD-based metrics, this would also give us the opportunity to base
>> distributed traces and logs on it, making it easier for any
>> OpenTelemetry-compatible tool to monitor Airflow with richer
>> information.
>>
>> If you have been thinking about the need to improve Airflow’s current
>> metrics capabilities, or about standards like OpenTelemetry, please feel
>> free to join the thread and share any opinions or feedback. I also think
>> that we may need to review our current list of metrics and assess whether
>> they are really useful for monitoring and observability of Airflow. There
>> are things we might want to add, such as more executor-related and
>> scheduler-related metrics, as well as operator and even DB and XCom
>> related metrics, to better assess the health of Airflow and make this
>> information helpful for faster troubleshooting and problem resolution.
>>
>> Thanks and regards,
>> Howard
>>
>
>
