Then - just add it as another operator in Azure :).

It's not more than yet-another-operator for Azure service. It would be
a different story if it was a different "service" - with completely
different stakeholders and maintenance burden involved. Being a "just
another Azure operator" it adds very little overhead and generally we
do not need to discuss it - just getting good quality PR with tests,
docs etc. is enough.

On Mon, Mar 11, 2024 at 6:27 PM Blain David <david.bl...@infrabel.be> wrote:
>
> Hello Jarek,
>
> There is no particular need to add this into a separate provider, I just did 
> it as I wanted to deploy it myself, it could perfectly reside within the 
> Microsoft Azure provider one.
>
> There is nothing special about it, except it depends on the msgraph-core 
> dependency (which is light and doesn't have the generated client code the 
> msgraph-sdk has which we don't need in Airflow) which facilitates the REST 
> calls.
> In my first POC it used the full msgraph-sdk dependency, which did not make 
> sense unless you would use the full blow client through the Hook in Airflow.
> But as the main focus of this provider was to avoid as much as possible 
> custom python code in our DAG's, it made no sense.
> The azure-identity dependency is also needed but that one is already present 
> in Airflow as the Microsoft Azure provider already uses it for the Fabric 
> hooks.
> So indeed, it's just another hook, triggerer and operator, with the 
> particularity it only works in async (e.g. deferred) mode.
>
> What do you think?
>
> Kind regards,
> David
>
> -----Original Message-----
> From: Jarek Potiuk <ja...@potiuk.com>
> Sent: Sunday, 10 March 2024 18:19
> To: dev@airflow.apache.org
> Cc: Bienvenu Joffrey <joffrey.bienv...@infrabel.be>
> Subject: Re: [PROPOSAL] Adding MSGraphSDK Async Operator to Airflow
>
> EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze niet 
> vertrouwt, klik niet op een link of open geen bijlages. Bij twijfel, stuur 
> deze e-mail als bijlage naar ab...@infrabel.be<mailto:ab...@infrabel.be>.
>
> Question - is there any reason you want to add it as a separate provider and 
> not just another operator in the azure provider ? I looked at the code and 
> it's not a lot, and I see no particular reason why it should not be simply 
> yet another operator/Hook there. Just as many other operators - I was under 
> the impression there is something special about the SDK you use or 
> "proprietaredness" (for lack of a better word) - but that seems like 
> yet-another operator, hook, triggerer in `microsoft.azure`.
>
> Or am I missing something?
>
> J.
>
> On Fri, Mar 1, 2024 at 8:57 AM Blain David <david.bl...@infrabel.be> wrote:
>
> > Hello Jarek and Elad,
> >
> > Indeed maintenance could be a concern, but I think that's already the
> > same case regarding all other Microsoft related providers in Airflow.
> > Also know that I also already contributed to the Microsoft Graph
> > Python SDK project on the Microsoft repo:
> > https://gith/
> > ub.com%2Fmicrosoftgraph%2Fmsgraph-sdk-python&data=05%7C02%7Cdavid.blai
> > n%40infrabel.be%7C049ea8fb06b54fbd934f08dc412649ed%7Cb82bc314ab8e4d6fb
> > 18946f02e1f27f2%7C0%7C0%7C638456879909688297%7CUnknown%7CTWFpbGZsb3d8e
> > yJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%
> > 7C%7C%7C&sdata=l7lEGyQ3IGHA7wOA4MFmDaQDLtF5PE3dSmxFWZEXgdc%3D&reserved
> > =0 I also already contributed to the Airflow providers fixing bugs and
> > doing enhancements, so there is a commitment on my part also 😉
> > The hook, operator and trigger are well tested, I always try to
> > deliver code that is tested as much as possible, we have a score of
> > 90% on our sonarqube and only a technical debt of 26 mins (which will
> > probably decrease).
> >
> > In the meantime I've been finetuning and testing the operator in our
> > environment, we now have multiple DAG's using it.
> >
> > Bellow some advantages of using the MSGraphSDKOperator:
> > - handles and refreshes bearer tokens automatically thanks to the
> > azure provider classes by Microsoft, which you don't need to maintain
> > you just use it.
> > - the operator is fully async using triggerers so requests and
> > responses are handled by separate workers, which is non-blocking
> > resources for nothing, which is good as the Microsoft implementation
> > only allows async calls anyway.
> > - you don't need to worry about paging, this is also handled
> > automatically asynchronously for you as Microsoft is following the
> > OData spec, the operator can handle this in a generic way (
> > https://learn.microsoft.com/en-us/odata/client/pagination).
> > - uses the newer httpx library
> > (https://git/
> > hub.com%2Fencode%2Fhttpx&data=05%7C02%7Cdavid.blain%40infrabel.be%7C04
> > 9ea8fb06b54fbd934f08dc412649ed%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638456879909701865%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=Q24YPUOKEYPx%2FJ9ux1eyGraZ6YeWkGRuD0Z%2B40nFrak%3D&reserved=0),
> >  which in our case as we are behind a corporate proxy, solves a lot of 
> > proxy related connection issues which we encounter with the requests 
> > library but don't when using httpx.
> > - only depends on the msgraph-core and the kiota_abstractions library
> > from Microsoft
> > (https://git/
> > hub.com%2Fmicrosoft%2Fkiota-abstractions-python&data=05%7C02%7Cdavid.b
> > lain%40infrabel.be%7C049ea8fb06b54fbd934f08dc412649ed%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638456879909705929%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=qCdOyIeVSoN%2F%2BmTFuEyGmrM1TZERyA0Axgtbv6PtnOo%3D&reserved=0),
> >  which are the foundation libs on which their full msgraph_sdk dependency 
> > is build, which we don't need as this is only useful when you want to use 
> > their generated Python client, which doesn't make sense in Airflow.
> > - as it's Microsoft, it's also compatible calling the PowerBI API, we
> > already had multiple DA'sG using this operator to call dataset
> > refreshes on the PowerBI REST API.
> > - probably the Intune API will also work with this operator, I'm going
> > to test this soon once we are migrating all our intune related custom
> > jobs to Airflow DAG's.
> > - another advantage is that all parameters can be defined in a Http
> > Airflow connection, independently if you want to interact with MS
> > Graph or PowerBI.
> > - as mentioned before, the hook, operator and trigger are well tested,
> > we have a score of 90% on our sonar and only a technical debt of 26 mins.
> >
> > The operator could be beneficial for everyone, as it avoids needing
> > custom code to achieve the same using the regular HttpOperator, which
> > I think could also need refactoring and maybe should be migrated using
> > the httpx library.
> > I'm willing to contribute there also, maybe we can come to a point
> > that the MSGraphSDKOperator could be based on the refactored
> > HttpOperator, in the meantime it would be nice if the operator would
> > be already available in Airflow
> >
> > What do you guys think?
> >
> > Kind regards,
> > David
> >
> >
> > -----Original Message-----
> > From: Jarek Potiuk <ja...@potiuk.com>
> > Sent: Friday, 26 January 2024 17:25
> > To: dev@airflow.apache.org
> > Subject: Re: [PROPOSAL] Adding MSGraphSDK Async Operator to Airflow
> >
> > [You don't often get email from ja...@potiuk.com. Learn why this is
> > important at https://aka.ms/LearnAboutSenderIdentification ]
> >
> > EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze
> > niet vertrouwt, klik niet op een link of open geen bijlages. Bij
> > twijfel, stuur deze e-mail als bijlage naar ab...@infrabel.be<mailto:
> > ab...@infrabel.be>.
> >
> > Hey David - any comments on that ?
> >
> > On Mon, Jan 15, 2024 at 1:07 PM Elad Kalif <elad...@apache.org> wrote:
> >
> > > Hi David,
> > >
> > > Thanks for raising this discussion.
> > > following the protocol established about accepting new providers -
> > >
> > > https://gith/
> > > ub.com%2Fapache%2Fairflow%2Fblob%2Fmain%2FPROVIDERS.rst%23accepting-
> > > ne
> > > w-community-providers&data=05%7C02%7Cdavid.blain%40infrabel.be%7C2da
> > > 54
> > > 87c207f4c39debd08dc1e8b75ac%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7
> > > C0
> > > %7C638418831528623158%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLC
> > > JQ
> > > IjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=mrn
> > > Ct
> > > dP4HLfDYw1HeMfylfZkEv0p%2BMO4X3Sn6QJu8hU%3D&reserved=0
> > > My main concern here is how will provide ongoing mantaince for this
> > > provider?
> > > This provider is to handle a service by Microsoft yet Microsoft is
> > > not in the picture here (as far as I can see)
> > >
> > >
> > > On Sat, Jan 13, 2024 at 12:39 PM Blain David
> > > <david.bl...@infrabel.be>
> > > wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > I've already started a discussion about this on the Airflow
> > discussions:
> > > > https://gi/
> > > > thub.com%2Fapache%2Fairflow%2Fdiscussions%2F36315&data=05%7C02%7Cd
> > > > av
> > > > id.blain%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc3
> > > > 14
> > > > ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638418831528632454%7CUnknown%7C
> > > > TW
> > > > FpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJX
> > > > VC
> > > > I6Mn0%3D%7C3000%7C%7C%7C&sdata=WLeKg%2FSCcqkEGVGpK9OvuUaQIc02GAjjg
> > > > lM
> > > > tPkGeWWQ%3D&reserved=0
> > > >
> > > > As we have multiple DAG's interacting with MS Graph API endpoints,
> > > > and as we want to avoid custom code as much as possible as we have
> > > > to handle
> > > lot's
> > > > of data due to paging.
> > > > We thought of implementing an operator for it using the official
> > > > python client from Microsoft<
> > > https://gith/
> > > ub.com%2Fmicrosoftgraph%2Fmsgraph-sdk-python&data=05%7C02%7Cdavid.bl
> > > ai
> > > n%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc314ab8e4d6
> > > fb
> > > 18946f02e1f27f2%7C0%7C0%7C638418831528638199%7CUnknown%7CTWFpbGZsb3d
> > > 8e
> > > yJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C
> > > 30
> > > 00%7C%7C%7C&sdata=qdH4r0OJxGjYXJ8jIQn41BS%2BIH64Nj8%2BKWSsYL2iobo%3D
> > > &r
> > > eserved=0>,
> > > > that way we can simplify our DAGs and remove custom code as much
> > > > as possible.
> > > > That's why we implemented an MSGraph SDK Provider which is on
> > > > GitHub< https://gi/ thub.com
> > %2Finfrabel%2Fapache-airflow-providers-msgraph&data=05%7C02%7Cdavid.bl
> > ain%
> > 40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc314ab8e4d6fb18
> > 946f02e1f27f2%7C0%7C0%7C638418831528642865%7CUnknown%7CTWFpbGZsb3d8eyJ
> > WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000
> > %7C%7C%7C&sdata=Gs27ZiTInN1qI3%2FVojdkFFP%2BV4OLSeGTQ6sbFBItdsA%3D&res
> > erved=0> and also published it as an artifact on PyPi<
> > https://pypi.org/project/apache-airflow-providers-msgraph/>.
> > > >
> > > > As Jarek pointed out in the pull request for Third Party
> > > > providers< https://gi/
> > > > thub.com%2Fapache%2Fairflow-site%2Fpull%2F933&data=05%7C02%7Cdavid
> > > > .b
> > > > lain%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc314ab
> > > > 8e
> > > > 4d6fb18946f02e1f27f2%7C0%7C0%7C638418831528651442%7CUnknown%7CTWFp
> > > > bG
> > > > Zsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6
> > > > Mn
> > > > 0%3D%7C3000%7C%7C%7C&sdata=UbwEODVhHelR%2FAKDtW5pMiZ169qILlL4Vqe4Z
> > > > D8 Mkak%3D&reserved=0>, it's not a good idea
> > > to
> > > > use the apache-airflow prefix for the library as it is not
> > > > maintained by the Apache community (yet).
> > > > So there are 2 options, or we change the name or we donate the
> > > > project to the Apache Airflow community, which I think the later
> > > > one is the best option if there is interest to do it, hence why I
> > > > also started a
> > > discussion<
> > > > https://gi/
> > > > thub.com%2Fapache%2Fairflow%2Fdiscussions%2F36315&data=05%7C02%7Cd
> > > > av
> > > > id.blain%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc3
> > > > 14
> > > > ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638418831528655433%7CUnknown%7C
> > > > TW
> > > > FpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJX
> > > > VC
> > > > I6Mn0%3D%7C3000%7C%7C%7C&sdata=IwiIleDSIkBC1IpYRYxOBtFHQAUBceLBvqZ
> > > > Ws cKsG6s%3D&reserved=0> about it a few
> > > weeks
> > > > ago on GitHub<https://github.com/apache/airflow/discussions/36315>.
> > > >
> > > > Kind regards,
> > > > David
> > > >
> > > >
> > >
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org

Reply via email to