Question: is there any reason you want to add it as a separate provider
and not just as another operator in the azure provider? I looked at the code
and it's not a lot, and I see no particular reason why it should not be
simply yet another operator/hook there, just like many other operators. I
was under the impression there was something special about the SDK you use,
or some "proprietaredness" (for lack of a better word), but this seems like
yet another operator, hook, and trigger in `microsoft.azure`.

Or am I missing something?

J.

On Fri, Mar 1, 2024 at 8:57 AM Blain David <david.bl...@infrabel.be> wrote:

> Hello Jarek and Elad,
>
> Indeed, maintenance could be a concern, but I think that is already the
> case for all the other Microsoft-related providers in Airflow.
> Note that I have already contributed to the Microsoft Graph Python
> SDK project in the Microsoft repo:
> https://github.com/microsoftgraph/msgraph-sdk-python
> I have also contributed to the Airflow providers, fixing bugs and making
> enhancements, so there is a commitment on my part as well 😉
> The hook, operator and trigger are well tested. I always try to deliver
> code that is tested as much as possible: we have a score of 90% on our
> SonarQube and a technical debt of only 26 minutes (which will probably
> decrease).
>
> In the meantime, I've been fine-tuning and testing the operator in our
> environment, and we now have multiple DAGs using it.
>
> Below are some advantages of using the MSGraphSDKOperator:
> - It handles and refreshes bearer tokens automatically thanks to the azure
> provider classes by Microsoft, which you don't need to maintain; you just
> use them.
> - The operator is fully async using triggerers, so requests and responses
> are handled by separate workers and no resources are blocked while
> waiting. This fits well, as the Microsoft implementation only allows async
> calls anyway.
> - You don't need to worry about paging: it is handled automatically and
> asynchronously for you. As Microsoft follows the OData spec, the
> operator can handle this in a generic way (
> https://learn.microsoft.com/en-us/odata/client/pagination).
> - It uses the newer httpx library (https://github.com/encode/httpx). As we
> are behind a corporate proxy, this solves a lot of proxy-related
> connection issues that we encounter with the requests library but not
> with httpx.
> - It only depends on the msgraph-core and kiota_abstractions libraries from
> Microsoft (https://github.com/microsoft/kiota-abstractions-python), which
> are the foundation libs on which their full msgraph_sdk dependency is
> built. We don't need the full SDK, as it is only useful when you want to
> use their generated Python client, which doesn't make sense in Airflow.
> - As it's Microsoft, it is also compatible with the PowerBI API: we
> already have multiple DAGs using this operator to trigger dataset
> refreshes through the PowerBI REST API.
> - The Intune API will probably also work with this operator. I'm going to
> test this soon, once we migrate all our Intune-related custom jobs to
> Airflow DAGs.
> - Another advantage is that all parameters can be defined in an HTTP
> Airflow connection, regardless of whether you want to interact with MS
> Graph or PowerBI.
> - As mentioned before, the hook, operator and trigger are well tested: we
> have a score of 90% on our SonarQube and a technical debt of only 26
> minutes.
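
The paging point above can be illustrated with a minimal sketch. It assumes only what Microsoft Graph documents for paged responses: the items sit under `value` and, while more pages remain, the response carries an `@odata.nextLink` URL. The names `collect_all`, `fake_fetch` and `FAKE_PAGES` are hypothetical and are not taken from the MSGraphSDKOperator; the real operator goes through msgraph-core rather than a plain fetch callable.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Page = Dict[str, Any]


async def collect_all(fetch: Callable[[str], Awaitable[Page]], url: str) -> List[Any]:
    """Follow '@odata.nextLink' until no further page is advertised."""
    items: List[Any] = []
    next_url = url
    while next_url:
        page = await fetch(next_url)
        # Each page carries its items under the 'value' key.
        items.extend(page.get("value", []))
        # The last page has no '@odata.nextLink', which ends the loop.
        next_url = page.get("@odata.nextLink")
    return items


# Hypothetical in-memory stand-in for an HTTP client, so the sketch runs offline.
FAKE_PAGES: Dict[str, Page] = {
    "/users": {"value": ["alice", "bob"], "@odata.nextLink": "/users?page=2"},
    "/users?page=2": {"value": ["carol"]},
}


async def fake_fetch(url: str) -> Page:
    return FAKE_PAGES[url]


if __name__ == "__main__":
    print(asyncio.run(collect_all(fake_fetch, "/users")))
```

Because the loop depends only on the two OData keys, the same function would apply to any endpoint that pages this way, which is the sense in which paging can be handled generically.
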
>
> The operator could be beneficial for everyone, as it avoids the custom
> code needed to achieve the same with the regular HttpOperator, which I
> think could also use some refactoring and maybe should be migrated to the
> httpx library.
> I'm willing to contribute there as well. Maybe we can get to a point where
> the MSGraphSDKOperator is based on the refactored HttpOperator. In
> the meantime, it would be nice if the operator were already available in
> Airflow.
>
> What do you guys think?
>
> Kind regards,
> David
>
>
> -----Original Message-----
> From: Jarek Potiuk <ja...@potiuk.com>
> Sent: Friday, 26 January 2024 17:25
> To: dev@airflow.apache.org
> Subject: Re: [PROPOSAL] Adding MSGraphSDK Async Operator to Airflow
>
> Hey David - any comments on that ?
>
> On Mon, Jan 15, 2024 at 1:07 PM Elad Kalif <elad...@apache.org> wrote:
>
> > Hi David,
> >
> > Thanks for raising this discussion.
> > Following the protocol established for accepting new providers:
> >
> > https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers
> >
> > My main concern here is who will provide ongoing maintenance for this
> > provider?
> > This provider is to handle a service by Microsoft yet Microsoft is not
> > in the picture here (as far as I can see)
> >
> >
> > On Sat, Jan 13, 2024 at 12:39 PM Blain David <david.bl...@infrabel.be>
> > wrote:
> >
> > > Hello everyone,
> > >
> > > I've already started a discussion about this on the Airflow
> > > discussions:
> > > https://github.com/apache/airflow/discussions/36315
> > >
> > > We have multiple DAGs interacting with MS Graph API endpoints, and we
> > > want to avoid custom code as much as possible, as we have to handle
> > > lots of data due to paging.
> > > We thought of implementing an operator for it using the official
> > > Python client from Microsoft
> > > <https://github.com/microsoftgraph/msgraph-sdk-python>, so that we can
> > > simplify our DAGs and remove custom code as much as possible.
> > > That's why we implemented an MSGraph SDK provider, which is on GitHub
> > > <https://github.com/infrabel/apache-airflow-providers-msgraph> and is
> > > also published as an artifact on PyPI
> > > <https://pypi.org/project/apache-airflow-providers-msgraph/>.
> > >
> > > As Jarek pointed out in the pull request for third-party providers
> > > <https://github.com/apache/airflow-site/pull/933>, it's not a good idea
> > > to use the apache-airflow prefix for the library, as it is not
> > > maintained by the Apache community (yet).
> > > So there are two options: either we change the name, or we donate the
> > > project to the Apache Airflow community. I think the latter is the best
> > > option if there is interest in doing so, which is why I also started a
> > > discussion about it a few weeks ago on GitHub
> > > <https://github.com/apache/airflow/discussions/36315>.
> > >
> > > Kind regards,
> > > David
> > >
> > >
> >
>
