+1 (binding). Overall, I think this will make future development and growth of OL in Airflow much easier, which will hopefully lead to more adoption!
________________________________
From: Vikram Koka <vik...@astronomer.io.INVALID>
Sent: Monday, February 13, 2023 8:20:23 AM
To: dev@airflow.apache.org
Subject: RE: [EXTERNAL][VOTE] AIP-53 OpenLineage in Airflow

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.

+1 binding. I have been looking at the doc, and having lineage integrated with Airflow as a provider makes sense to me.

On Mon, Feb 13, 2023 at 2:38 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

+1 binding, this should make lineage a first-class citizen for Airflow users. Excited for this one.

On Sun, 12 Feb 2023 at 07:57, Jarek Potiuk <ja...@potiuk.com> wrote:

A little side-track: a small comment on what Shubham wrote.

Yeah. I also noticed AIP-47 mentioned, but I considered that an implementation detail. I read that those will be rather regular unit tests (so not reaching out to external systems, as it makes little sense, and we definitely want the OpenLineage tests to run regularly with every PR; otherwise we would end up in the same boat as currently, where the repos are separated out). I believe the AIP-47 mention there was more an attempt to say "the test coverage will be high". Julien, am I right?

On Sat, Feb 11, 2023 at 11:57 PM Mehta, Shubham <shu...@amazon.com.invalid> wrote:
>
> +1 non-binding. I'll be on the lookout for initial PRs to learn more about
> the implementation details of how System Tests will be extended to cover
> these changes, as well as the ongoing maintenance required from providers.
> The proposed changes should definitely make it easier for Airflow customers
> to adopt lineage and improve stability. I'm looking forward to seeing how
> customers will end up using it!
>
> Shubham
>
> From: Julien Le Dem <jul...@astronomer.io.INVALID>
> Reply-To: "dev@airflow.apache.org" <dev@airflow.apache.org>
> Date: Friday, February 10, 2023 at 3:28 PM
> To: "dev@airflow.apache.org" <dev@airflow.apache.org>
> Subject: [EXTERNAL] [VOTE] AIP-53 OpenLineage in Airflow
>
> Dear Airflow community,
>
> Following the discussion thread over the past few weeks, I'd like to call a
> vote on AIP-53 OpenLineage in Airflow:
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-53+OpenLineage+in+Airflow
>
> The discussion thread is linked in the Confluence doc if you wish to consult
> the history of the conversation. Thank you to all who contributed!
>
> This is my (non-binding!) +1. The vote will last until midnight (UTC) on
> Friday 17th February.
>
> Thanks,
> Julien
>
> For reference, the Motivation section in the doc:
>
> Operational lineage collection is a common need to understand dependencies
> between data pipelines and track end-to-end provenance of data. It enables
> many use cases, from ensuring reliable delivery of data through observability
> to compliance and cost management.
>
> Publishing operational lineage is a core Airflow capability to enable
> troubleshooting and governance.
>
> OpenLineage is a project of the LF AI & Data Foundation that provides a
> spec standardizing operational lineage collection and sharing across the data
> ecosystem. Although it provides plugins for popular open source projects, its
> intent is very similar to OpenTelemetry (also under the Linux Foundation
> umbrella): to remain a spec for lineage exchange that projects - open source
> or proprietary - implement.
>
> Built-in OpenLineage support in Airflow will make it easier and more reliable
> for Airflow users to publish their operational lineage through the
> OpenLineage ecosystem.
>
> The current external plugin maintained in the OpenLineage project depends on
> Airflow and operator internals and breaks when those change. Having a
> built-in integration ensures better first-class support for exposing lineage
> that is tested alongside other changes and is therefore more stable.
>
> Today, OpenLineage consumers in the ecosystem include: Egeria (bank
> compliance), Marquez (build your own metadata platform, for compliance for
> example), Microsoft Purview (governance, …), Astro (data observability),
> Amundsen. AWS recently blogged about using OpenLineage in the AWS ecosystem.
> Other projects are at various levels of progress.
>
> On the producer side, there is support for open source projects like Airflow,
> dbt, Spark, Flink, and Great Expectations, and for proprietary warehouses like
> Snowflake, BigQuery, and Redshift, through API integration or SQL parsing.
>
> Examples of users talking about their usage of OpenLineage can be found on
> the OpenLineage blog.
>
> This integration will also stimulate the continued growth of the OpenLineage
> ecosystem and create more value for Airflow users.