Hello Jarek and Elad,

Maintenance could indeed be a concern, but I think the same already applies to 
all the other Microsoft-related providers in Airflow.
I have already contributed to the Microsoft Graph Python SDK project in the 
Microsoft repo: 
https://github.com/microsoftgraph/msgraph-sdk-python
I have also contributed bug fixes and enhancements to the Airflow providers, so 
there is a commitment on my part as well 😉
The hook, operator and trigger are well tested. I always try to deliver code 
that is tested as much as possible: we have a score of 90% on our SonarQube and 
a technical debt of only 26 minutes (which will probably decrease).

In the meantime I've been fine-tuning and testing the operator in our 
environment, and we now have multiple DAGs using it.

Below are some advantages of using the MSGraphSDKOperator:
- handles and refreshes bearer tokens automatically thanks to the Azure 
provider classes by Microsoft, which you don't need to maintain; you just use 
them.
- the operator is fully async using triggerers, so requests and responses are 
handled by separate workers and no resources are blocked while waiting. This 
fits well, as the Microsoft implementation only allows async calls anyway.
- you don't need to worry about paging: it is handled automatically and 
asynchronously for you. As Microsoft follows the OData spec, the operator can 
handle this in a generic way 
(https://learn.microsoft.com/en-us/odata/client/pagination).
- uses the newer httpx library (https://github.com/encode/httpx). As we are 
behind a corporate proxy, this solves a lot of proxy-related connection issues 
which we encounter with the requests library but not with httpx.
- only depends on the msgraph-core and kiota_abstractions libraries from 
Microsoft (https://github.com/microsoft/kiota-abstractions-python), which are 
the foundation libs on which their full msgraph_sdk dependency is built. We 
don't need the latter, as it is only useful when you want to use their 
generated Python client, which doesn't make sense in Airflow.
- as it's Microsoft, it's also compatible with the PowerBI API: we already 
have multiple DAGs using this operator to trigger dataset refreshes on the 
PowerBI REST API.
- the Intune API will probably also work with this operator. I'm going to test 
this soon, once we migrate all our Intune-related custom jobs to Airflow DAGs.
- another advantage is that all parameters can be defined in an HTTP Airflow 
connection, regardless of whether you want to interact with MS Graph or 
PowerBI.
- as mentioned before, the hook, operator and trigger are well tested: we have 
a score of 90% on our SonarQube and a technical debt of only 26 minutes.
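
To make the paging point above more concrete, here is a minimal sketch of the 
general idea. It is not the provider's actual code: it only assumes that each 
response page carries its items under "value" and, when more pages exist, a 
link to the next one under "@odata.nextLink". The names fetch_all, fake_fetch 
and FAKE_PAGES, as well as the example.invalid URLs, are hypothetical and only 
serve the illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional

# Hypothetical stand-in for an HTTP call: maps a URL to a decoded JSON page.
FetchPage = Callable[[str], Awaitable[Dict[str, Any]]]


async def fetch_all(url: str, fetch_page: FetchPage) -> List[Dict[str, Any]]:
    """Collect the items of every page by following '@odata.nextLink'."""
    items: List[Dict[str, Any]] = []
    next_url: Optional[str] = url
    while next_url:
        page = await fetch_page(next_url)
        items.extend(page.get("value", []))
        # The last page carries no '@odata.nextLink', which ends the loop.
        next_url = page.get("@odata.nextLink")
    return items


# Canned two-page response used instead of a real endpoint (assumption).
FAKE_PAGES: Dict[str, Dict[str, Any]] = {
    "https://example.invalid/users": {
        "value": [{"id": 1}, {"id": 2}],
        "@odata.nextLink": "https://example.invalid/users?$skiptoken=abc",
    },
    "https://example.invalid/users?$skiptoken=abc": {"value": [{"id": 3}]},
}


async def fake_fetch(url: str) -> Dict[str, Any]:
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return FAKE_PAGES[url]


if __name__ == "__main__":
    print(asyncio.run(fetch_all("https://example.invalid/users", fake_fetch)))
```

Because the loop only looks at those two keys, the same function would work for 
any endpoint that pages this way, which is what makes a generic implementation 
possible.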

The operator could be beneficial for everyone, as it avoids the custom code 
needed to achieve the same with the regular HttpOperator, which I think could 
also use some refactoring and maybe should be migrated to the httpx library.
I'm willing to contribute there as well. Maybe we can get to a point where the 
MSGraphSDKOperator is based on the refactored HttpOperator; in the meantime it 
would be nice if the operator were already available in Airflow.

What do you guys think?

Kind regards,
David


-----Original Message-----
From: Jarek Potiuk <ja...@potiuk.com>
Sent: Friday, 26 January 2024 17:25
To: dev@airflow.apache.org
Subject: Re: [PROPOSAL] Adding MSGraphSDK Async Operator to Airflow

Hey David - any comments on that ?

On Mon, Jan 15, 2024 at 1:07 PM Elad Kalif <elad...@apache.org> wrote:

> Hi David,
>
> Thanks for raising this discussion.
> following the protocol established about accepting new providers -
>
> https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers
> My main concern here is who will provide ongoing maintenance for this
> provider?
> This provider is to handle a service by Microsoft yet Microsoft is not
> in the picture here (as far as I can see)
>
>
> On Sat, Jan 13, 2024 at 12:39 PM Blain David <david.bl...@infrabel.be>
> wrote:
>
> > Hello everyone,
> >
> > I've already started a discussion about this on the Airflow discussions:
> > https://github.com/apache/airflow/discussions/36315
> >
> > As we have multiple DAGs interacting with MS Graph API endpoints, and as
> > we want to avoid custom code as much as possible because we have to
> > handle lots of data due to paging, we thought of implementing an operator
> > for it using the official Python client from Microsoft
> > <https://github.com/microsoftgraph/msgraph-sdk-python>. That way we can
> > simplify our DAGs and remove custom code as much as possible.
> > That's why we implemented an MSGraph SDK Provider which is on GitHub
> > <https://github.com/infrabel/apache-airflow-providers-msgraph>
> > and also published it as an artifact on PyPI
> > <https://pypi.org/project/apache-airflow-providers-msgraph/>.
> >
> > As Jarek pointed out in the pull request for Third Party providers
> > <https://github.com/apache/airflow-site/pull/933>, it's not a good idea to
> > use the apache-airflow prefix for the library as it is not maintained by
> > the Apache community (yet).
> > So there are 2 options: either we change the name or we donate the project
> > to the Apache Airflow community. I think the latter is the best option if
> > there is interest in doing so, which is why I also started a discussion
> > about it a few weeks ago on GitHub
> > <https://github.com/apache/airflow/discussions/36315>.
> >
> > Kind regards,
> > David
> >
> >
>
