Thank you very much for your input, Jarek.
I am responding in the comments and adding to the doc accordingly.
I would also love to hear from more stakeholders.
Thanks to all who provided feedback so far.
Julien

On Fri, Jan 27, 2023 at 12:57 AM Jarek Potiuk <[email protected]> wrote:
> General comment from my side: I think Open Lineage is (and should be even more) a feature of Airflow that expands Airflow's capabilities greatly and opens up the direction we've all been working on - Airflow as a Platform.
>
> I think closely integrating it with Open-Lineage goes in the same direction (also mentioned in the doc) as Open Telemetry, where we might decide to support certain standards in order to expand the capabilities of Airflow-as-a-platform and allow plugging in multiple external solutions that use the standard API. After Open-Lineage graduated recently to LFAI&Data foundation (I've been watching this happening from afar), it is I think the perfect candidate for Airflow to incorporate. I hope this will help all the players to make use of the extra work necessary by the community to make it "officially supported". I think we also have to get some feedback from the big stakeholders in Airflow, because it is one thing to have such a capability, and another to get it used in all the ways Airflow is used - not only by on-premise/self-hosted users (which is obviously a huge driving factor) but also everywhere Airflow is exposed by others. Astronomer is obviously on board. We see some warm words from Amazon (mentioned by Julien). I would love to hear whether the Composer team at Google would be on board in using the open-lineage information exposed this way in their Data Catalog (and likely more) offering. We have Amundsen and others, and possibly other stakeholders might want to say something.
>
> There is - undoubtedly - an extra effort involved in implementing it and keeping it running smoothly (as Julien mentioned, that is the main reason why the Open Lineage community would like to make the integration part of Airflow).
> But by being smart and integrating it in a way that allows us to plug it into our CI and verification process, and by setting very clear expectations about what it means for contributors to Airflow to get it running, we can make some initial investment in making it happen and minimise on-going cost, while maximising the gain.
>
> And looking at all the above - I am super happy to help with all that to make this easy to "swallow" and integrate well, even if it will take an extra effort, especially since we will have experts from Open Lineage who worked with both Airflow and Open Lineage being the core part of the effort. I am actually super excited - this might be the next-big-thing for Airflow to strengthen its position as an indispensable component of "even more modern data stack".
>
> I made my initial comments in the doc, and am looking forward to making it happen :).
>
> J.
>
> On Fri, Jan 27, 2023 at 2:20 AM Julien Le Dem <[email protected]> wrote:
> >
> > Dear Airflow Community,
> > I have been working on a proposal to bring an OpenLineage provider to Airflow.
> > I am looking for feedback with the goal to post an official AIP.
> > Please feel free to comment in the doc above.
> > Thank you,
> > Julien (OpenLineage project lead)
> >
> > For convenience, here is the rationale from the doc:
> >
> > Operational lineage collection is a common need to understand dependencies between data pipelines and track end-to-end provenance of data. It enables many use cases from ensuring reliable delivery of data through observability to compliance and cost management.
> >
> > Publishing operational lineage is a core Airflow capability to enable troubleshooting and governance.
> >
> > OpenLineage is a project part of the LFAI&Data foundation that provides a spec standardizing operational lineage collection and sharing across the data ecosystem.
> > While it provides plugins for popular open source projects, its intent is very similar to OpenTelemetry (also under the Linux Foundation umbrella): to remain a spec for lineage exchange that projects - open source or proprietary - implement.
> >
> > Built-in OpenLineage support in Airflow will make it easier and more reliable for Airflow users to publish their operational lineage through the OpenLineage ecosystem.
> >
> > The current external plugin maintained in the OpenLineage project depends on Airflow and operator internals and breaks when those change. A built-in integration ensures better, first-class support for exposing lineage: it gets tested alongside other changes and is therefore more stable.
