Hello Jarek,

There is no particular need to add this into a separate provider, I just did it 
as I wanted to deploy it myself, it could perfectly reside within the Microsoft 
Azure provider one.

There is nothing special about it, except it depends on the msgraph-core 
dependency (which is light and doesn't have the generated client code the 
msgraph-sdk has which we don't need in Airflow) which facilitates the REST 
calls.
In my first POC it used the full msgraph-sdk dependency, which did not make 
sense unless you would use the full blow client through the Hook in Airflow.
But as the main focus of this provider was to avoid as much as possible custom 
python code in our DAG's, it made no sense.
The azure-identity dependency is also needed but that one is already present in 
Airflow as the Microsoft Azure provider already uses it for the Fabric hooks.
So indeed, it's just another hook, triggerer and operator, with the 
particularity it only works in async (e.g. deferred) mode.

What do you think?

Kind regards,
David

-----Original Message-----
From: Jarek Potiuk <ja...@potiuk.com>
Sent: Sunday, 10 March 2024 18:19
To: dev@airflow.apache.org
Cc: Bienvenu Joffrey <joffrey.bienv...@infrabel.be>
Subject: Re: [PROPOSAL] Adding MSGraphSDK Async Operator to Airflow

EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze niet 
vertrouwt, klik niet op een link of open geen bijlages. Bij twijfel, stuur deze 
e-mail als bijlage naar ab...@infrabel.be<mailto:ab...@infrabel.be>.

Question - is there any reason you want to add it as a separate provider and 
not just another operator in the azure provider ? I looked at the code and it's 
not a lot, and I see no particular reason why it should not be simply yet 
another operator/Hook there. Just as many other operators - I was under the 
impression there is something special about the SDK you use or 
"proprietaredness" (for lack of a better word) - but that seems like 
yet-another operator, hook, triggerer in `microsoft.azure`.

Or am I missing something?

J.

On Fri, Mar 1, 2024 at 8:57 AM Blain David <david.bl...@infrabel.be> wrote:

> Hello Jarek and Elad,
>
> Indeed maintenance could be a concern, but I think that's already the
> same case regarding all other Microsoft related providers in Airflow.
> Also know that I also already contributed to the Microsoft Graph
> Python SDK project on the Microsoft repo:
> https://gith/
> ub.com%2Fmicrosoftgraph%2Fmsgraph-sdk-python&data=05%7C02%7Cdavid.blai
> n%40infrabel.be%7C049ea8fb06b54fbd934f08dc412649ed%7Cb82bc314ab8e4d6fb
> 18946f02e1f27f2%7C0%7C0%7C638456879909688297%7CUnknown%7CTWFpbGZsb3d8e
> yJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%
> 7C%7C%7C&sdata=l7lEGyQ3IGHA7wOA4MFmDaQDLtF5PE3dSmxFWZEXgdc%3D&reserved
> =0 I also already contributed to the Airflow providers fixing bugs and
> doing enhancements, so there is a commitment on my part also 😉
> The hook, operator and trigger are well tested, I always try to
> deliver code that is tested as much as possible, we have a score of
> 90% on our sonarqube and only a technical debt of 26 mins (which will
> probably decrease).
>
> In the meantime I've been finetuning and testing the operator in our
> environment, we now have multiple DAG's using it.
>
> Bellow some advantages of using the MSGraphSDKOperator:
> - handles and refreshes bearer tokens automatically thanks to the
> azure provider classes by Microsoft, which you don't need to maintain
> you just use it.
> - the operator is fully async using triggerers so requests and
> responses are handled by separate workers, which is non-blocking
> resources for nothing, which is good as the Microsoft implementation
> only allows async calls anyway.
> - you don't need to worry about paging, this is also handled
> automatically asynchronously for you as Microsoft is following the
> OData spec, the operator can handle this in a generic way (
> https://learn.microsoft.com/en-us/odata/client/pagination).
> - uses the newer httpx library
> (https://git/
> hub.com%2Fencode%2Fhttpx&data=05%7C02%7Cdavid.blain%40infrabel.be%7C04
> 9ea8fb06b54fbd934f08dc412649ed%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638456879909701865%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=Q24YPUOKEYPx%2FJ9ux1eyGraZ6YeWkGRuD0Z%2B40nFrak%3D&reserved=0),
>  which in our case as we are behind a corporate proxy, solves a lot of proxy 
> related connection issues which we encounter with the requests library but 
> don't when using httpx.
> - only depends on the msgraph-core and the kiota_abstractions library
> from Microsoft
> (https://git/
> hub.com%2Fmicrosoft%2Fkiota-abstractions-python&data=05%7C02%7Cdavid.b
> lain%40infrabel.be%7C049ea8fb06b54fbd934f08dc412649ed%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638456879909705929%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=qCdOyIeVSoN%2F%2BmTFuEyGmrM1TZERyA0Axgtbv6PtnOo%3D&reserved=0),
>  which are the foundation libs on which their full msgraph_sdk dependency is 
> build, which we don't need as this is only useful when you want to use their 
> generated Python client, which doesn't make sense in Airflow.
> - as it's Microsoft, it's also compatible calling the PowerBI API, we
> already had multiple DA'sG using this operator to call dataset
> refreshes on the PowerBI REST API.
> - probably the Intune API will also work with this operator, I'm going
> to test this soon once we are migrating all our intune related custom
> jobs to Airflow DAG's.
> - another advantage is that all parameters can be defined in a Http
> Airflow connection, independently if you want to interact with MS
> Graph or PowerBI.
> - as mentioned before, the hook, operator and trigger are well tested,
> we have a score of 90% on our sonar and only a technical debt of 26 mins.
>
> The operator could be beneficial for everyone, as it avoids needing
> custom code to achieve the same using the regular HttpOperator, which
> I think could also need refactoring and maybe should be migrated using
> the httpx library.
> I'm willing to contribute there also, maybe we can come to a point
> that the MSGraphSDKOperator could be based on the refactored
> HttpOperator, in the meantime it would be nice if the operator would
> be already available in Airflow
>
> What do you guys think?
>
> Kind regards,
> David
>
>
> -----Original Message-----
> From: Jarek Potiuk <ja...@potiuk.com>
> Sent: Friday, 26 January 2024 17:25
> To: dev@airflow.apache.org
> Subject: Re: [PROPOSAL] Adding MSGraphSDK Async Operator to Airflow
>
> [You don't often get email from ja...@potiuk.com. Learn why this is
> important at https://aka.ms/LearnAboutSenderIdentification ]
>
> EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze
> niet vertrouwt, klik niet op een link of open geen bijlages. Bij
> twijfel, stuur deze e-mail als bijlage naar ab...@infrabel.be<mailto:
> ab...@infrabel.be>.
>
> Hey David - any comments on that ?
>
> On Mon, Jan 15, 2024 at 1:07 PM Elad Kalif <elad...@apache.org> wrote:
>
> > Hi David,
> >
> > Thanks for raising this discussion.
> > following the protocol established about accepting new providers -
> >
> > https://gith/
> > ub.com%2Fapache%2Fairflow%2Fblob%2Fmain%2FPROVIDERS.rst%23accepting-
> > ne
> > w-community-providers&data=05%7C02%7Cdavid.blain%40infrabel.be%7C2da
> > 54
> > 87c207f4c39debd08dc1e8b75ac%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7
> > C0
> > %7C638418831528623158%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLC
> > JQ
> > IjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=mrn
> > Ct
> > dP4HLfDYw1HeMfylfZkEv0p%2BMO4X3Sn6QJu8hU%3D&reserved=0
> > My main concern here is how will provide ongoing mantaince for this
> > provider?
> > This provider is to handle a service by Microsoft yet Microsoft is
> > not in the picture here (as far as I can see)
> >
> >
> > On Sat, Jan 13, 2024 at 12:39 PM Blain David
> > <david.bl...@infrabel.be>
> > wrote:
> >
> > > Hello everyone,
> > >
> > > I've already started a discussion about this on the Airflow
> discussions:
> > > https://gi/
> > > thub.com%2Fapache%2Fairflow%2Fdiscussions%2F36315&data=05%7C02%7Cd
> > > av
> > > id.blain%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc3
> > > 14
> > > ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638418831528632454%7CUnknown%7C
> > > TW
> > > FpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJX
> > > VC
> > > I6Mn0%3D%7C3000%7C%7C%7C&sdata=WLeKg%2FSCcqkEGVGpK9OvuUaQIc02GAjjg
> > > lM
> > > tPkGeWWQ%3D&reserved=0
> > >
> > > As we have multiple DAG's interacting with MS Graph API endpoints,
> > > and as we want to avoid custom code as much as possible as we have
> > > to handle
> > lot's
> > > of data due to paging.
> > > We thought of implementing an operator for it using the official
> > > python client from Microsoft<
> > https://gith/
> > ub.com%2Fmicrosoftgraph%2Fmsgraph-sdk-python&data=05%7C02%7Cdavid.bl
> > ai
> > n%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc314ab8e4d6
> > fb
> > 18946f02e1f27f2%7C0%7C0%7C638418831528638199%7CUnknown%7CTWFpbGZsb3d
> > 8e
> > yJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C
> > 30
> > 00%7C%7C%7C&sdata=qdH4r0OJxGjYXJ8jIQn41BS%2BIH64Nj8%2BKWSsYL2iobo%3D
> > &r
> > eserved=0>,
> > > that way we can simplify our DAGs and remove custom code as much
> > > as possible.
> > > That's why we implemented an MSGraph SDK Provider which is on
> > > GitHub< https://gi/ thub.com
> %2Finfrabel%2Fapache-airflow-providers-msgraph&data=05%7C02%7Cdavid.bl
> ain%
> 40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc314ab8e4d6fb18
> 946f02e1f27f2%7C0%7C0%7C638418831528642865%7CUnknown%7CTWFpbGZsb3d8eyJ
> WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000
> %7C%7C%7C&sdata=Gs27ZiTInN1qI3%2FVojdkFFP%2BV4OLSeGTQ6sbFBItdsA%3D&res
> erved=0> and also published it as an artifact on PyPi<
> https://pypi.org/project/apache-airflow-providers-msgraph/>.
> > >
> > > As Jarek pointed out in the pull request for Third Party
> > > providers< https://gi/
> > > thub.com%2Fapache%2Fairflow-site%2Fpull%2F933&data=05%7C02%7Cdavid
> > > .b
> > > lain%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc314ab
> > > 8e
> > > 4d6fb18946f02e1f27f2%7C0%7C0%7C638418831528651442%7CUnknown%7CTWFp
> > > bG
> > > Zsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6
> > > Mn
> > > 0%3D%7C3000%7C%7C%7C&sdata=UbwEODVhHelR%2FAKDtW5pMiZ169qILlL4Vqe4Z
> > > D8 Mkak%3D&reserved=0>, it's not a good idea
> > to
> > > use the apache-airflow prefix for the library as it is not
> > > maintained by the Apache community (yet).
> > > So there are 2 options, or we change the name or we donate the
> > > project to the Apache Airflow community, which I think the later
> > > one is the best option if there is interest to do it, hence why I
> > > also started a
> > discussion<
> > > https://gi/
> > > thub.com%2Fapache%2Fairflow%2Fdiscussions%2F36315&data=05%7C02%7Cd
> > > av
> > > id.blain%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc3
> > > 14
> > > ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638418831528655433%7CUnknown%7C
> > > TW
> > > FpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJX
> > > VC
> > > I6Mn0%3D%7C3000%7C%7C%7C&sdata=IwiIleDSIkBC1IpYRYxOBtFHQAUBceLBvqZ
> > > Ws cKsG6s%3D&reserved=0> about it a few
> > weeks
> > > ago on GitHub<https://github.com/apache/airflow/discussions/36315>.
> > >
> > > Kind regards,
> > > David
> > >
> > >
> >
>

Reply via email to