+1 non-binding. I'll be on the lookout for initial PRs to learn more about the 
implementation details of how System Tests will be extended to cover these 
changes, as well as the ongoing maintenance required from providers. The 
proposed changes should definitely make it easier for Airflow customers to 
adopt lineage and improve stability. I'm looking forward to seeing how 
customers will end up using it!

Shubham

From: Julien Le Dem <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, February 10, 2023 at 3:28 PM
To: "[email protected]" <[email protected]>
Subject: [EXTERNAL] [VOTE] AIP-53 OpenLineage in Airflow



Dear Airflow community,

Following the discussion thread over the past few weeks, I'd like to call a 
vote on AIP-53 OpenLineage in Airflow:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-53+OpenLineage+in+Airflow

The discussion thread is linked in the confluence doc if you wish to consult 
the history of the conversation. Thank you to all who contributed!

This is my (non-binding!) +1. The vote will last until midnight (UTC) on 
Friday, 17th February.

Thanks,
Julien

For reference, the Motivation section in the doc:

Operational lineage collection is a common need to understand dependencies 
between data pipelines and track end-to-end provenance of data. It enables many 
use cases from ensuring reliable delivery of data through observability to 
compliance and cost management.

Publishing operational lineage is a core Airflow capability to enable 
troubleshooting and governance.

OpenLineage<https://openlineage.io/> is a project of the 
LF AI & Data<https://lfaidata.foundation/projects/> foundation that provides a 
spec standardizing operational lineage collection and sharing across the data 
ecosystem. Although it provides plugins for popular open source projects, its 
intent is very similar to that of OpenTelemetry<https://opentelemetry.io/> 
(also under the Linux Foundation umbrella): to remain a spec for lineage 
exchange that projects, open source or proprietary, implement.

Built-in OpenLineage support in Airflow will make it easier and more reliable 
for Airflow users to publish their operational lineage through the OpenLineage 
ecosystem.

The current external plugin maintained in the OpenLineage project depends on 
Airflow and operator internals and breaks when those internals change. 
A built-in integration provides first-class support for exposing lineage that 
is tested alongside other changes and is therefore more stable.

Today, OpenLineage consumers in the ecosystem include: 
Egeria<https://egeria-project.org/features/lineage-management/overview/#the-openlineage-standard>
 (bank compliance), Marquez<https://marquezproject.ai/> (build your own 
metadata platform for compliance for example), Microsoft 
Purview<https://learn.microsoft.com/en-us/samples/microsoft/purview-adb-lineage-solution-accelerator/azure-databricks-to-purview-lineage-connector/>
 (Governance, …), Astro<https://www.astronomer.io/why-openlineage/> (data 
observability), 
Amundsen<https://www.amundsen.io/amundsen/databuilder/#openlineagetablelineageextractor>.
 AWS recently blogged about using OpenLineage in the AWS 
ecosystem<https://aws.amazon.com/blogs/big-data/automate-data-lineage-on-amazon-mwaa-with-openlineage/>.
 Other projects are at various levels of progress.

On the producer side, there is support for open source projects like Airflow, 
dbt, Spark, Flink, Great Expectations, and proprietary warehouses like 
Snowflake<https://github.com/Snowflake-Labs/OpenLineage-AccessHistory-Setup/blob/main/README.md>,
 BigQuery, 
Redshift<https://aws.amazon.com/blogs/big-data/automate-data-lineage-on-amazon-mwaa-with-openlineage/>
 through API integration or SQL parsing.

Examples of users talking about their usage of OpenLineage can be found on the 
OpenLineage 
blog<https://openlineage.io/blog/openlineage-at-northwestern-mutual/>.

This integration will also stimulate the continued growth of the OpenLineage 
ecosystem and create more value for Airflow users.
