Hi all, Howard here.
Are there any more comments on adding OpenTelemetry support to Airflow?
The feature is not just about distributed traces, but metrics and logs as well.
I know this thread is getting old, but I just wanted to check whether
anybody has objections to going with OpenTelemetry.

Howard

On 2022/01/12 03:21:27 Howard Yoo wrote:
> I am very much interested in how we can improve not only the
> instrumentation by using OpenTelemetry, but also how we can make the
> existing metrics list better.
> 
> For example, perhaps in the future we could report how much CPU, memory,
> and disk I/O a task instance is using, by utilizing Python's psutil
> package as mentioned here:
> https://stackoverflow.com/questions/16326529/python-get-process-names-cpu-mem-usage-and-peak-mem-usage-in-windows
> This works because local task jobs are essentially subprocesses. By
> utilizing OpenTelemetry, we could more easily collect host and platform
> metrics that are outside the boundary of Airflow - and even have them
> collected by the OTEL collector agent at the same time.
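As a rough illustration of the idea above: since a local task job is a subprocess, its resource usage can be sampled around its execution. The sketch below uses only the stdlib `resource` module (Unix only) as a stand-in for psutil, which would add richer data such as per-process disk I/O and live sampling; the function name and returned keys are illustrative, not anything Airflow provides.

```python
import resource
import subprocess
import sys


def run_and_measure(cmd):
    """Run cmd as a subprocess and report aggregate child CPU time and peak RSS.

    Stdlib-only approximation (Unix) of the psutil approach mentioned above:
    getrusage(RUSAGE_CHILDREN) aggregates over all waited-for children, so we
    diff the counters around the subprocess call.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "cpu_user_s": after.ru_utime - before.ru_utime,
        "cpu_sys_s": after.ru_stime - before.ru_stime,
        "peak_rss_kb": after.ru_maxrss,  # kilobytes on Linux
    }


stats = run_and_measure([sys.executable, "-c", "sum(range(10**6))"])
```

A metrics hook could emit these numbers per task instance; with psutil, the same measurement could be taken while the task is still running.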
> 
> I would be very happy if this internship project could also include
> collecting metrics in addition to tracing, and think about how it can be
> extended to cover more than what's provided out of the box.
> 
> - Howard
> 
> On 2022/01/10 21:22:51 Jarek Potiuk wrote:
> > > Also, I do have feedback that the current metrics list and what the
> > > metrics track are not really that useful
> > 
> > Fully agree.
> > 
> > > (I mean, there is so much more that one could do for metrics like
> > > operator failures and ti failures, since they don't post any
> > > context-specific information) - so while we are working on making
> > > OpenTelemetry available for Airflow, we might also review these
> > > metrics, verify whether they are actually helpful, and see if there
> > > are additional metrics we can instrument while doing this.
> > 
> > Oh yeah.
> > 
> > > I think when we are designing the distributed traces for Airflow, we
> > > should also work on defining what kind of traces would be useful and
> > > come up with a better naming convention etc. to make things clear and
> > > easy to understand.
> > 
> > Absolutely!  I think we have two clearly separated and actually
> > complementary pieces of work that we should indeed do together!
> > 
> > 1) In the "internship project" we are doing together with Melody, the
> > focus is more on the engineering side: how we can easily integrate
> > OpenTelemetry with Airflow - seamlessly, in a modular fashion, and in a
> > way that will be easy to use and test in a development environment. It
> > is more about solving all the engineering obstacles of the integration
> > (for example, what we have learned so far is that OpenTelemetry
> > requires some custom code to account for Airflow's "forking" process
> > model). It is also about exposing a lot of low-level metrics that are
> > not Airflow-specific (Flask, DB access, etc. - something that really
> > allows one to debug "any" application deployment, not only Airflow).
> > Then we thought about simply adding the "current" metrics that we have
> > in statsd as custom ones.
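On the "forking" point above: OpenTelemetry SDK exporters run background threads, which do not survive `fork()`, so forked worker processes need to re-initialize their telemetry state. One common pattern for this (a sketch, not Airflow's actual implementation) is `os.register_at_fork`; the provider setup here is a stdlib stand-in, since the real calls depend on the OpenTelemetry SDK.

```python
import os

# Stand-in for real provider setup (e.g. rebuilding a MeterProvider /
# TracerProvider whose exporter threads died with the fork).
telemetry = {"owner_pid": os.getpid()}


def _reinit_telemetry_after_fork():
    # Rebuild per-process telemetry state in the forked child.
    telemetry["owner_pid"] = os.getpid()


# Python 3.7+: run the hook in every child created by fork().
os.register_at_fork(after_in_child=_reinit_telemetry_after_fork)

read_fd, write_fd = os.pipe()
child_pid = os.fork()
if child_pid == 0:
    # Child: report which pid the telemetry state now belongs to.
    os.write(write_fd, str(telemetry["owner_pid"]).encode())
    os._exit(0)

os.close(write_fd)
reported = int(os.read(read_fd, 64))
os.waitpid(child_pid, 0)
```

After the fork, the child's hook has rebuilt the state under its own pid, while the parent's state is untouched - the same shape of fix the OTel integration needs for forked task runners.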
> > 
> > 2) And I understand that your focus is more on how we can actually
> > define a really useful set of Airflow metrics, which ideally complements
> > the "engineering" part - once we get OTel fully integrated, we can add
> > not only (or maybe not at all) the current metrics; once you help
> > define "better" metrics, we can simply implement them in OTel,
> > including some example dashboards etc.
> > 
> > Happy to collaborate on that!
> > 
> > J.
> > 
> > 
> > > - Howard
> > >
> > 
