Then - just add it as another operator in Azure :). It's not more than yet-another-operator for Azure service. It would be a different story if it was a different "service" - with completely different stakeholders and maintenance burden involved. Being a "just another Azure operator" it adds very little overhead and generally we do not need to discuss it - just getting good quality PR with tests, docs etc. is enough.
On Mon, Mar 11, 2024 at 6:27 PM Blain David <david.bl...@infrabel.be> wrote: > > Hello Jarek, > > There is no particular need to add this into a separate provider, I just did > it as I wanted to deploy it myself, it could perfectly reside within the > Microsoft Azure provider one. > > There is nothing special about it, except it depends on the msgraph-core > dependency (which is light and doesn't have the generated client code the > msgraph-sdk has which we don't need in Airflow) which facilitates the REST > calls. > In my first POC it used the full msgraph-sdk dependency, which did not make > sense unless you would use the full blow client through the Hook in Airflow. > But as the main focus of this provider was to avoid as much as possible > custom python code in our DAG's, it made no sense. > The azure-identity dependency is also needed but that one is already present > in Airflow as the Microsoft Azure provider already uses it for the Fabric > hooks. > So indeed, it's just another hook, triggerer and operator, with the > particularity it only works in async (e.g. deferred) mode. > > What do you think? > > Kind regards, > David > > -----Original Message----- > From: Jarek Potiuk <ja...@potiuk.com> > Sent: Sunday, 10 March 2024 18:19 > To: dev@airflow.apache.org > Cc: Bienvenu Joffrey <joffrey.bienv...@infrabel.be> > Subject: Re: [PROPOSAL] Adding MSGraphSDK Async Operator to Airflow > > EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze niet > vertrouwt, klik niet op een link of open geen bijlages. Bij twijfel, stuur > deze e-mail als bijlage naar ab...@infrabel.be<mailto:ab...@infrabel.be>. > > Question - is there any reason you want to add it as a separate provider and > not just another operator in the azure provider ? I looked at the code and > it's not a lot, and I see no particular reason why it should not be simply > yet another operator/Hook there. Just as many other operators - I was under > the impression there is something special about the SDK you use or > "proprietaredness" (for lack of a better word) - but that seems like > yet-another operator, hook, triggerer in `microsoft.azure`. > > Or am I missing something? > > J. > > On Fri, Mar 1, 2024 at 8:57 AM Blain David <david.bl...@infrabel.be> wrote: > > > Hello Jarek and Elad, > > > > Indeed maintenance could be a concern, but I think that's already the > > same case regarding all other Microsoft related providers in Airflow. > > Also know that I also already contributed to the Microsoft Graph > > Python SDK project on the Microsoft repo: > > https://gith/ > > ub.com%2Fmicrosoftgraph%2Fmsgraph-sdk-python&data=05%7C02%7Cdavid.blai > > n%40infrabel.be%7C049ea8fb06b54fbd934f08dc412649ed%7Cb82bc314ab8e4d6fb > > 18946f02e1f27f2%7C0%7C0%7C638456879909688297%7CUnknown%7CTWFpbGZsb3d8e > > yJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0% > > 7C%7C%7C&sdata=l7lEGyQ3IGHA7wOA4MFmDaQDLtF5PE3dSmxFWZEXgdc%3D&reserved > > =0 I also already contributed to the Airflow providers fixing bugs and > > doing enhancements, so there is a commitment on my part also 😉 > > The hook, operator and trigger are well tested, I always try to > > deliver code that is tested as much as possible, we have a score of > > 90% on our sonarqube and only a technical debt of 26 mins (which will > > probably decrease). > > > > In the meantime I've been finetuning and testing the operator in our > > environment, we now have multiple DAG's using it. > > > > Bellow some advantages of using the MSGraphSDKOperator: > > - handles and refreshes bearer tokens automatically thanks to the > > azure provider classes by Microsoft, which you don't need to maintain > > you just use it. > > - the operator is fully async using triggerers so requests and > > responses are handled by separate workers, which is non-blocking > > resources for nothing, which is good as the Microsoft implementation > > only allows async calls anyway. > > - you don't need to worry about paging, this is also handled > > automatically asynchronously for you as Microsoft is following the > > OData spec, the operator can handle this in a generic way ( > > https://learn.microsoft.com/en-us/odata/client/pagination). > > - uses the newer httpx library > > (https://git/ > > hub.com%2Fencode%2Fhttpx&data=05%7C02%7Cdavid.blain%40infrabel.be%7C04 > > 9ea8fb06b54fbd934f08dc412649ed%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638456879909701865%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=Q24YPUOKEYPx%2FJ9ux1eyGraZ6YeWkGRuD0Z%2B40nFrak%3D&reserved=0), > > which in our case as we are behind a corporate proxy, solves a lot of > > proxy related connection issues which we encounter with the requests > > library but don't when using httpx. > > - only depends on the msgraph-core and the kiota_abstractions library > > from Microsoft > > (https://git/ > > hub.com%2Fmicrosoft%2Fkiota-abstractions-python&data=05%7C02%7Cdavid.b > > lain%40infrabel.be%7C049ea8fb06b54fbd934f08dc412649ed%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638456879909705929%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=qCdOyIeVSoN%2F%2BmTFuEyGmrM1TZERyA0Axgtbv6PtnOo%3D&reserved=0), > > which are the foundation libs on which their full msgraph_sdk dependency > > is build, which we don't need as this is only useful when you want to use > > their generated Python client, which doesn't make sense in Airflow. > > - as it's Microsoft, it's also compatible calling the PowerBI API, we > > already had multiple DA'sG using this operator to call dataset > > refreshes on the PowerBI REST API. > > - probably the Intune API will also work with this operator, I'm going > > to test this soon once we are migrating all our intune related custom > > jobs to Airflow DAG's. > > - another advantage is that all parameters can be defined in a Http > > Airflow connection, independently if you want to interact with MS > > Graph or PowerBI. > > - as mentioned before, the hook, operator and trigger are well tested, > > we have a score of 90% on our sonar and only a technical debt of 26 mins. > > > > The operator could be beneficial for everyone, as it avoids needing > > custom code to achieve the same using the regular HttpOperator, which > > I think could also need refactoring and maybe should be migrated using > > the httpx library. > > I'm willing to contribute there also, maybe we can come to a point > > that the MSGraphSDKOperator could be based on the refactored > > HttpOperator, in the meantime it would be nice if the operator would > > be already available in Airflow > > > > What do you guys think? > > > > Kind regards, > > David > > > > > > -----Original Message----- > > From: Jarek Potiuk <ja...@potiuk.com> > > Sent: Friday, 26 January 2024 17:25 > > To: dev@airflow.apache.org > > Subject: Re: [PROPOSAL] Adding MSGraphSDK Async Operator to Airflow > > > > [You don't often get email from ja...@potiuk.com. Learn why this is > > important at https://aka.ms/LearnAboutSenderIdentification ] > > > > EXTERNAL MAIL: Indien je de afzender van deze e-mail niet kent en deze > > niet vertrouwt, klik niet op een link of open geen bijlages. Bij > > twijfel, stuur deze e-mail als bijlage naar ab...@infrabel.be<mailto: > > ab...@infrabel.be>. > > > > Hey David - any comments on that ? > > > > On Mon, Jan 15, 2024 at 1:07 PM Elad Kalif <elad...@apache.org> wrote: > > > > > Hi David, > > > > > > Thanks for raising this discussion. > > > following the protocol established about accepting new providers - > > > > > > https://gith/ > > > ub.com%2Fapache%2Fairflow%2Fblob%2Fmain%2FPROVIDERS.rst%23accepting- > > > ne > > > w-community-providers&data=05%7C02%7Cdavid.blain%40infrabel.be%7C2da > > > 54 > > > 87c207f4c39debd08dc1e8b75ac%7Cb82bc314ab8e4d6fb18946f02e1f27f2%7C0%7 > > > C0 > > > %7C638418831528623158%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLC > > > JQ > > > IjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=mrn > > > Ct > > > dP4HLfDYw1HeMfylfZkEv0p%2BMO4X3Sn6QJu8hU%3D&reserved=0 > > > My main concern here is how will provide ongoing mantaince for this > > > provider? > > > This provider is to handle a service by Microsoft yet Microsoft is > > > not in the picture here (as far as I can see) > > > > > > > > > On Sat, Jan 13, 2024 at 12:39 PM Blain David > > > <david.bl...@infrabel.be> > > > wrote: > > > > > > > Hello everyone, > > > > > > > > I've already started a discussion about this on the Airflow > > discussions: > > > > https://gi/ > > > > thub.com%2Fapache%2Fairflow%2Fdiscussions%2F36315&data=05%7C02%7Cd > > > > av > > > > id.blain%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc3 > > > > 14 > > > > ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638418831528632454%7CUnknown%7C > > > > TW > > > > FpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJX > > > > VC > > > > I6Mn0%3D%7C3000%7C%7C%7C&sdata=WLeKg%2FSCcqkEGVGpK9OvuUaQIc02GAjjg > > > > lM > > > > tPkGeWWQ%3D&reserved=0 > > > > > > > > As we have multiple DAG's interacting with MS Graph API endpoints, > > > > and as we want to avoid custom code as much as possible as we have > > > > to handle > > > lot's > > > > of data due to paging. > > > > We thought of implementing an operator for it using the official > > > > python client from Microsoft< > > > https://gith/ > > > ub.com%2Fmicrosoftgraph%2Fmsgraph-sdk-python&data=05%7C02%7Cdavid.bl > > > ai > > > n%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc314ab8e4d6 > > > fb > > > 18946f02e1f27f2%7C0%7C0%7C638418831528638199%7CUnknown%7CTWFpbGZsb3d > > > 8e > > > yJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C > > > 30 > > > 00%7C%7C%7C&sdata=qdH4r0OJxGjYXJ8jIQn41BS%2BIH64Nj8%2BKWSsYL2iobo%3D > > > &r > > > eserved=0>, > > > > that way we can simplify our DAGs and remove custom code as much > > > > as possible. > > > > That's why we implemented an MSGraph SDK Provider which is on > > > > GitHub< https://gi/ thub.com > > %2Finfrabel%2Fapache-airflow-providers-msgraph&data=05%7C02%7Cdavid.bl > > ain% > > 40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc314ab8e4d6fb18 > > 946f02e1f27f2%7C0%7C0%7C638418831528642865%7CUnknown%7CTWFpbGZsb3d8eyJ > > WIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000 > > %7C%7C%7C&sdata=Gs27ZiTInN1qI3%2FVojdkFFP%2BV4OLSeGTQ6sbFBItdsA%3D&res > > erved=0> and also published it as an artifact on PyPi< > > https://pypi.org/project/apache-airflow-providers-msgraph/>. > > > > > > > > As Jarek pointed out in the pull request for Third Party > > > > providers< https://gi/ > > > > thub.com%2Fapache%2Fairflow-site%2Fpull%2F933&data=05%7C02%7Cdavid > > > > .b > > > > lain%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc314ab > > > > 8e > > > > 4d6fb18946f02e1f27f2%7C0%7C0%7C638418831528651442%7CUnknown%7CTWFp > > > > bG > > > > Zsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6 > > > > Mn > > > > 0%3D%7C3000%7C%7C%7C&sdata=UbwEODVhHelR%2FAKDtW5pMiZ169qILlL4Vqe4Z > > > > D8 Mkak%3D&reserved=0>, it's not a good idea > > > to > > > > use the apache-airflow prefix for the library as it is not > > > > maintained by the Apache community (yet). > > > > So there are 2 options, or we change the name or we donate the > > > > project to the Apache Airflow community, which I think the later > > > > one is the best option if there is interest to do it, hence why I > > > > also started a > > > discussion< > > > > https://gi/ > > > > thub.com%2Fapache%2Fairflow%2Fdiscussions%2F36315&data=05%7C02%7Cd > > > > av > > > > id.blain%40infrabel.be%7C2da5487c207f4c39debd08dc1e8b75ac%7Cb82bc3 > > > > 14 > > > > ab8e4d6fb18946f02e1f27f2%7C0%7C0%7C638418831528655433%7CUnknown%7C > > > > TW > > > > FpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJX > > > > VC > > > > I6Mn0%3D%7C3000%7C%7C%7C&sdata=IwiIleDSIkBC1IpYRYxOBtFHQAUBceLBvqZ > > > > Ws cKsG6s%3D&reserved=0> about it a few > > > weeks > > > > ago on GitHub<https://github.com/apache/airflow/discussions/36315>. > > > > > > > > Kind regards, > > > > David > > > > > > > > > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org For additional commands, e-mail: dev-h...@airflow.apache.org