Cool

On Mon, Feb 13, 2023 at 6:24 PM Julien Le Dem
<[email protected]> wrote:
>
> [changing the subject line to separate this discussion from the voting thread]
> Thank you Jarek,
> Yes, I am expecting most of the testing coverage to be in unit tests.
> I think following up on tickets and PRs is appropriate to make sure coverage 
> is at the right level and tests are in the right place.
> I'm looking forward to more discussion on the details.
> Julien
>
> On Sat, Feb 11, 2023 at 11:57 PM Jarek Potiuk <[email protected]> wrote:
>>
>> A small side-track: a comment on what Shubham wrote.
>>
>> Yeah. I also noticed AIP-47 mentioned - but I considered that
>> implementation detail. I read that those will be rather regular unit
>> tests (so not reaching out to external systems as it makes little
>> sense and we definitely want to make open-lineage tests run regularly
>> with every PR - otherwise we would end up in the same boat as
>> currently where the repos are separated out), I believe the AIP-47
>> mentioned there was more an attempt to say "the test coverage will be
>> high". Julien, am I right?
>>
>> On Sat, Feb 11, 2023 at 11:57 PM Mehta, Shubham
>> <[email protected]> wrote:
>> >
>> > +1 non-binding. I'll be on the lookout for initial PRs to learn more about 
>> > the implementation details of how System Tests will be extended to cover 
>> > these changes, as well as the ongoing maintenance required from providers. 
>> > The proposed changes should definitely make it easier for Airflow 
>> > customers to adopt lineage and improve stability. I'm looking forward to 
>> > seeing how customers will end up using it!
>> >
>> >
>> > Shubham
>> >
>> >
>> >
>> > From: Julien Le Dem <[email protected]>
>> > Reply-To: "[email protected]" <[email protected]>
>> > Date: Friday, February 10, 2023 at 3:28 PM
>> > To: "[email protected]" <[email protected]>
>> > Subject: [EXTERNAL] [VOTE] AIP-53 OpenLineage in Airflow
>> >
>> >
>> >
>> > Dear Airflow community,
>> >
>> >
>> >
>> > Following the discussion thread over the past few weeks, I'd like to call 
>> > a vote on AIP-53 OpenLineage in Airflow:
>> >
>> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-53+OpenLineage+in+Airflow
>> >
>> >
>> >
>> > The discussion thread is linked in the confluence doc if you wish to 
>> > consult the history of the conversation. Thank you to all who contributed!
>> >
>> >
>> >
>> > This is my (non-binding!) +1, the vote will last until midnight (UTC) on 
>> > Friday 17th February.
>> >
>> >
>> >
>> > Thanks,
>> >
>> > Julien
>> >
>> >
>> >
>> > For reference, the Motivation section in the doc:
>> >
>> > Operational lineage collection is a common need to understand dependencies 
>> > between data pipelines and track end-to-end provenance of data. It enables 
>> > many use cases from ensuring reliable delivery of data through 
>> > observability to compliance and cost management.
>> >
>> > Publishing operational lineage is a core Airflow capability to enable 
>> > troubleshooting and governance.
>> >
>> > OpenLineage is a project within the LF AI & Data Foundation that provides a
>> > spec standardizing operational lineage collection and sharing across the
>> > data ecosystem. While it provides plugins for popular open source projects,
>> > its intent is very similar to OpenTelemetry (also under the Linux
>> > Foundation umbrella): to remain a spec for lineage exchange that projects
>> > - open source or proprietary - implement.
>> >
>> > Built-in OpenLineage support in Airflow will make it easier and more 
>> > reliable for Airflow users to publish their operational lineage through 
>> > the OpenLineage ecosystem.
>> >
>> > The current external plugin maintained in the OpenLineage project depends
>> > on Airflow and operator internals and breaks when those internals change.
>> > A built-in integration provides first-class support for exposing lineage
>> > that is tested alongside other changes and is therefore more stable.
>> >
>> > Today, OpenLineage consumers in the ecosystem include: Egeria (bank 
>> > compliance), Marquez (build your own metadata platform for compliance for 
>> > example), Microsoft Purview (Governance, …), Astro (data observability), 
>> > Amundsen. AWS recently blogged about using OpenLineage in the AWS 
>> > ecosystem. Other projects are at various levels of progress.
>> >
>> > On the producer side, there is support for open source projects like 
>> > Airflow, dbt, Spark, Flink, Great Expectations and proprietary warehouses
>> > like Snowflake, BigQuery, Redshift through API integration or SQL parsing.
>> >
>> > Examples of users talking about their usage of OpenLineage can be found on
>> > the OpenLineage blog.
>> >
>> > This integration will also stimulate the continued growth of the 
>> > OpenLineage ecosystem and create more value for Airflow users.
