Question - is there any reason you want to add it as a separate provider and not just another operator in the azure provider ? I looked at the code and it's not a lot, and I see no particular reason why it should not be simply yet another operator/Hook there. Just as many other operators - I was under the impression there is something special about the SDK you use or "proprietaredness" (for lack of a better word) - but that seems like yet-another operator, hook, triggerer in `microsoft.azure`.
Or am I missing something? J. On Fri, Mar 1, 2024 at 8:57 AM Blain David <david.bl...@infrabel.be> wrote: > Hello Jarek and Elad, > > Indeed maintenance could be a concern, but I think that's already the same > case regarding all other Microsoft related providers in Airflow. > Also know that I also already contributed to the Microsoft Graph Python > SDK project on the Microsoft repo: > https://github.com/microsoftgraph/msgraph-sdk-python > I also already contributed to the Airflow providers fixing bugs and doing > enhancements, so there is a commitment on my part also 😉 > The hook, operator and trigger are well tested, I always try to deliver > code that is tested as much as possible, we have a score of 90% on our > sonarqube and only a technical debt of 26 mins (which will probably > decrease). > > In the meantime I've been finetuning and testing the operator in our > environment, we now have multiple DAG's using it. > > Bellow some advantages of using the MSGraphSDKOperator: > - handles and refreshes bearer tokens automatically thanks to the azure > provider classes by Microsoft, which you don't need to maintain you just > use it. > - the operator is fully async using triggerers so requests and responses > are handled by separate workers, which is non-blocking resources for > nothing, which is good as the Microsoft implementation only allows async > calls anyway. > - you don't need to worry about paging, this is also handled automatically > asynchronously for you as Microsoft is following the OData spec, the > operator can handle this in a generic way ( > https://learn.microsoft.com/en-us/odata/client/pagination). > - uses the newer httpx library (https://github.com/encode/httpx), which > in our case as we are behind a corporate proxy, solves a lot of proxy > related connection issues which we encounter with the requests library but > don't when using httpx. > - only depends on the msgraph-core and the kiota_abstractions library from > Microsoft (https://github.com/microsoft/kiota-abstractions-python), which > are the foundation libs on which their full msgraph_sdk dependency is > build, which we don't need as this is only useful when you want to use > their generated Python client, which doesn't make sense in Airflow. > - as it's Microsoft, it's also compatible calling the PowerBI API, we > already had multiple DA'sG using this operator to call dataset refreshes on > the PowerBI REST API. > - probably the Intune API will also work with this operator, I'm going to > test this soon once we are migrating all our intune related custom jobs to > Airflow DAG's. > - another advantage is that all parameters can be defined in a Http > Airflow connection, independently if you want to interact with MS Graph or > PowerBI. > - as mentioned before, the hook, operator and trigger are well tested, we > have a score of 90% on our sonar and only a technical debt of 26 mins. > > The operator could be beneficial for everyone, as it avoids needing custom > code to achieve the same using the regular HttpOperator, which I think > could also need refactoring and maybe should be migrated using the httpx > library. > I'm willing to contribute there also, maybe we can come to a point that > the MSGraphSDKOperator could be based on the refactored HttpOperator, in > the meantime it would be nice if the operator would be already available in > Airflow > > What do you guys think? > > Kind regards, > David > > > -----Original Message----- > From: Jarek Potiuk <ja...@potiuk.com> > Sent: Friday, 26 January 2024 17:25 > To: dev@airflow.apache.org > Subject: Re: [PROPOSAL] Adding MSGraphSDK Async Operator to Airflow > > [You don't often get email from ja...@potiuk.com. Learn why this is > important at https://aka.ms/LearnAboutSenderIdentification ] > > EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze > niet vertrouwt, klik niet op een link of open geen bijlages. Bij twijfel, > stuur deze e-mail als bijlage naar ab...@infrabel.be<mailto: > ab...@infrabel.be>. > > Hey David - any comments on that ? > > On Mon, Jan 15, 2024 at 1:07 PM Elad Kalif <elad...@apache.org> wrote: > > > Hi David, > > > > Thanks for raising this discussion. > > following the protocol established about accepting new providers - > > > > https://gith/ > > ub.com%2Fapache%2Fairflow%2Fblob%2Fmain%2FPROVIDERS.rst%23accepting-ne > > w-community-providers&data=05%7C02%7Cdavid.blain%40infrabel.be%7C2da54 > > 87c207f4c39debd08dc1e8b75ac%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7C0 > > %7C638418831528623158%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQ > > IjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=mrnCt > > dP4HLfDYw1HeMfylfZkEv0p%2BMO4X3Sn6QJu8hU%3D&reserved=0 > > My main concern here is how will provide ongoing mantaince for this > > provider? > > This provider is to handle a service by Microsoft yet Microsoft is not > > in the picture here (as far as I can see) > > > > > > On Sat, Jan 13, 2024 at 12:39 PM Blain David <david.bl...@infrabel.be> > > wrote: > > > > > Hello everyone, > > > > > > I've already started a discussion about this on the Airflow > discussions: > > > https://gi/ > > > thub.com%2Fapache%2Fairflow%2Fdiscussions%2F36315&data=05%7C02%7Cdav > > > id.blain%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc314 > > > ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638418831528632454%7CUnknown%7CTW > > > FpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVC > > > I6Mn0%3D%7C3000%7C%7C%7C&sdata=WLeKg%2FSCcqkEGVGpK9OvuUaQIc02GAjjglM > > > tPkGeWWQ%3D&reserved=0 > > > > > > As we have multiple DAG's interacting with MS Graph API endpoints, > > > and as we want to avoid custom code as much as possible as we have > > > to handle > > lot's > > > of data due to paging. > > > We thought of implementing an operator for it using the official > > > python client from Microsoft< > > https://gith/ > > ub.com%2Fmicrosoftgraph%2Fmsgraph-sdk-python&data=05%7C02%7Cdavid.blai > > n%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc314ab8e4d6fb > > 18946f02e1f27f2%7C0%7C0%7C638418831528638199%7CUnknown%7CTWFpbGZsb3d8e > > yJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C30 > > 00%7C%7C%7C&sdata=qdH4r0OJxGjYXJ8jIQn41BS%2BIH64Nj8%2BKWSsYL2iobo%3D&r > > eserved=0>, > > > that way we can simplify our DAGs and remove custom code as much as > > > possible. > > > That's why we implemented an MSGraph SDK Provider which is on > > > GitHub< > > > https://gi/ > > > thub.com > %2Finfrabel%2Fapache-airflow-providers-msgraph&data=05%7C02%7Cdavid.blain% > 40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638418831528642865%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Gs27ZiTInN1qI3%2FVojdkFFP%2BV4OLSeGTQ6sbFBItdsA%3D&reserved=0> > and also published it as an artifact on PyPi< > https://pypi.org/project/apache-airflow-providers-msgraph/>. > > > > > > As Jarek pointed out in the pull request for Third Party providers< > > > https://gi/ > > > thub.com%2Fapache%2Fairflow-site%2Fpull%2F933&data=05%7C02%7Cdavid.b > > > lain%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc314ab8e > > > 4d6fb18946f02e1f27f2%7C0%7C0%7C638418831528651442%7CUnknown%7CTWFpbG > > > Zsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn > > > 0%3D%7C3000%7C%7C%7C&sdata=UbwEODVhHelR%2FAKDtW5pMiZ169qILlL4Vqe4ZD8 > > > Mkak%3D&reserved=0>, it's not a good idea > > to > > > use the apache-airflow prefix for the library as it is not > > > maintained by the Apache community (yet). > > > So there are 2 options, or we change the name or we donate the > > > project to the Apache Airflow community, which I think the later one > > > is the best option if there is interest to do it, hence why I also > > > started a > > discussion< > > > https://gi/ > > > thub.com%2Fapache%2Fairflow%2Fdiscussions%2F36315&data=05%7C02%7Cdav > > > id.blain%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc314 > > > ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638418831528655433%7CUnknown%7CTW > > > FpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVC > > > I6Mn0%3D%7C3000%7C%7C%7C&sdata=IwiIleDSIkBC1IpYRYxOBtFHQAUBceLBvqZWs > > > cKsG6s%3D&reserved=0> about it a few > > weeks > > > ago on GitHub<https://github.com/apache/airflow/discussions/36315>. > > > > > > Kind regards, > > > David > > > > > > > > >