Thank you very much for your input, Jarek.
I am responding in the comments and adding to the doc accordingly.
I would also love to hear from more stakeholders.
Thanks to all who provided feedback so far.
Julien

On Fri, Jan 27, 2023 at 12:57 AM Jarek Potiuk <[email protected]> wrote:

> General comment from my side: I think Open Lineage is (and should be
> even more) a feature of Airflow that expands Airflow's capabilities
> greatly and opens up the direction we've been all working on - Airflow
> as a Platform.
>
> I think closely integrating it with Open-Lineage goes in the same
> direction (also mentioned in the doc) as Open Telemetry, where we
> might decide to support certain standards in order to expand the
> capabilities of Airflow-as-a-platform and allow plugging in multiple
> external solutions that use the standard API. Now that Open-Lineage
> has recently graduated to the LF AI & Data Foundation (I've been
> watching this happen from afar), I think it is the perfect candidate
> for Airflow to incorporate. I hope this will help all the players make
> use of the extra work the community needs to put in to make it
> "officially supported". I think we also have to get some feedback from
> the big stakeholders in Airflow, because one thing is to have such a
> capability, and another is to get it used in all the ways Airflow is
> used: not only by on-premise/self-hosted users (which is obviously a
> huge driving factor) but also everywhere Airflow is exposed by
> others. Astronomer is obviously on board. We see some warm words from
> Amazon (mentioned by Julien). I would love to hear whether the
> Composer team at Google would be on board with using the Open-Lineage
> information exposed this way in their Data Catalog (and likely more)
> offering. We have Amundsen and others, and possibly other stakeholders
> might want to say something.
>
>
> There is, undoubtedly, extra effort involved in implementing it and
> keeping it running smoothly (as Julien mentioned, that is the main
> reason why the Open Lineage community would like to make the
> integration part of Airflow). But by being smart and integrating it in
> a way that allows us to plug it into our CI and verification
> process, and by setting very clear expectations about what it means
> for Airflow contributors to keep it running, we can make an initial
> investment to make it happen and minimise the ongoing cost, while
> maximising the gain.
>
> Looking at all the above, I am super happy to help with all of that
> to make this easy to "swallow" and integrate well, even if it will
> take extra effort, especially since we will have experts from Open
> Lineage who have worked with both Airflow and Open Lineage as the core
> part of the effort. I am actually super excited: this might be the
> next big thing for Airflow to strengthen its position as an
> indispensable component of the "even more modern data stack".
>
> I made my initial comments in the doc, and am looking forward to
> making it happen :).
>
> J.
>
> On Fri, Jan 27, 2023 at 2:20 AM Julien Le Dem
> <[email protected]> wrote:
> >
> > Dear Airflow Community,
> > I have been working on a proposal to bring an OpenLineage provider to
> > Airflow.
> > I am looking for feedback, with the goal of posting an official AIP.
> > Please feel free to comment in the doc above.
> > Thank you,
> > Julien (OpenLineage project lead)
> >
> > For convenience, here is the rationale from the doc:
> >
> > Operational lineage collection is a common need to understand
> > dependencies between data pipelines and track end-to-end provenance of
> > data. It enables many use cases, from ensuring reliable delivery of data
> > through observability to compliance and cost management.
> >
> > Publishing operational lineage is a core Airflow capability to enable
> > troubleshooting and governance.
> >
> > OpenLineage is a project of the LF AI & Data Foundation that provides
> > a spec standardizing operational lineage collection and sharing across the
> > data ecosystem. While it provides plugins for popular open source projects,
> > its intent is very similar to that of OpenTelemetry (also under the Linux
> > Foundation umbrella): to remain a spec for lineage exchange that projects,
> > open source or proprietary, implement.
> >
> > Built-in OpenLineage support in Airflow will make it easier and more
> > reliable for Airflow users to publish their operational lineage through the
> > OpenLineage ecosystem.
> >
> > The current external plugin maintained in the OpenLineage project
> > depends on Airflow and operator internals, and it breaks when those
> > internals change. A built-in integration provides better, first-class
> > support for exposing lineage that is tested alongside other changes and
> > is therefore more stable.
>
