Cool
On Mon, Feb 13, 2023 at 6:24 PM Julien Le Dem <[email protected]> wrote:
>
> [changing the subject line to separate this discussion from the voting thread]
> Thank you Jarek,
> Yes, I am expecting most of the testing coverage to be in unit tests.
> I think following up on tickets and PRs is appropriate to make sure coverage
> is at the right level and tests are in the right place.
> I'm looking forward to more discussion on the details.
> Julien
>
> On Sat, Feb 11, 2023 at 11:57 PM Jarek Potiuk <[email protected]> wrote:
>>
>> A little side-track, a small comment on what Shubham wrote.
>>
>> Yeah. I also noticed AIP-47 mentioned - but I considered that an
>> implementation detail. I read that those will be rather regular unit
>> tests (so not reaching out to external systems, as it makes little
>> sense and we definitely want to make open-lineage tests run regularly
>> with every PR - otherwise we would end up in the same boat as
>> currently, where the repos are separated out). I believe the AIP-47
>> mention there was more an attempt to say "the test coverage will be
>> high". Julien, am I right?
>>
>> On Sat, Feb 11, 2023 at 11:57 PM Mehta, Shubham
>> <[email protected]> wrote:
>> >
>> > +1 non-binding. I'll be on the lookout for initial PRs to learn more about
>> > the implementation details of how System Tests will be extended to cover
>> > these changes, as well as the ongoing maintenance required from providers.
>> > The proposed changes should definitely make it easier for Airflow
>> > customers to adopt lineage and improve stability. I'm looking forward to
>> > seeing how customers will end up using it!
>> >
>> > Shubham
>> >
>> > From: Julien Le Dem <[email protected]>
>> > Reply-To: "[email protected]" <[email protected]>
>> > Date: Friday, February 10, 2023 at 3:28 PM
>> > To: "[email protected]" <[email protected]>
>> > Subject: [EXTERNAL] [VOTE] AIP-53 OpenLineage in Airflow
>> >
>> > Dear Airflow community,
>> >
>> > Following the discussion thread over the past few weeks, I'd like to call
>> > a vote on AIP-53 OpenLineage in Airflow:
>> >
>> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-53+OpenLineage+in+Airflow
>> >
>> > The discussion thread is linked in the confluence doc if you wish to
>> > consult the history of the conversation. Thank you to all who contributed!
>> >
>> > This is my (non-binding!) +1, the vote will last until midnight (UTC) on
>> > Friday 17th February.
>> >
>> > Thanks,
>> >
>> > Julien
>> >
>> > For reference, the Motivation section in the doc:
>> >
>> > Operational lineage collection is a common need to understand dependencies
>> > between data pipelines and track end-to-end provenance of data. It enables
>> > many use cases from ensuring reliable delivery of data through
>> > observability to compliance and cost management.
>> >
>> > Publishing operational lineage is a core Airflow capability to enable
>> > troubleshooting and governance.
>> >
>> > OpenLineage is a project part of the LFAI&Data foundation that provides a
>> > spec standardizing operational lineage collection and sharing across the
>> > data ecosystem.
>> > While it provides plugins for popular open source projects,
>> > its intent is very similar to OpenTelemetry (also under the Linux
>> > Foundation umbrella): to remain a spec for lineage exchange that projects
>> > - open source or proprietary - implement.
>> >
>> > Built-in OpenLineage support in Airflow will make it easier and more
>> > reliable for Airflow users to publish their operational lineage through
>> > the OpenLineage ecosystem.
>> >
>> > The current external plugin maintained in the OpenLineage project depends
>> > on Airflow and operator internals and breaks when those internals
>> > change. A built-in integration ensures better first-class support for
>> > exposing lineage: it gets tested alongside other changes and is
>> > therefore more stable.
>> >
>> > Today, OpenLineage consumers in the ecosystem include: Egeria (bank
>> > compliance), Marquez (build your own metadata platform, for example for
>> > compliance), Microsoft Purview (Governance, …), Astro (data observability),
>> > Amundsen. AWS recently blogged about using OpenLineage in the AWS
>> > ecosystem. Other projects are at various levels of progress.
>> >
>> > On the producer side, there is support for open source projects like
>> > Airflow, dbt, Spark, Flink, Great Expectations and proprietary warehouses
>> > like Snowflake, BigQuery, Redshift through API integration or SQL parsing.
>> >
>> > Examples of users talking about their usage of OpenLineage can be found on
>> > the OpenLineage blog.
>> >
>> > This integration will also stimulate the continued growth of the
>> > OpenLineage ecosystem and create more value for Airflow users.
