> 1. https://registry.astronomer.io/
> 2. Using the new classifier 
> https://pypi.org/search/?o=&c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider

Yep. Precisely what I was thinking of putting at the top of the ecosystem page.

> On 25 April 2022 18:08:49 BST, "Ferruzzi, Dennis" 
> <ferru...@amazon.com.INVALID> wrote:
>>
>> I still think that easy inclusion with a defined pruning process is best,
>> but it's looking like that is the minority opinion. In that case, if we
>> are going to keep them separate, then I definitely agree that there
>> needs to be a fast/easy/convenient way to find them.
>> ________________________________
>> From: Jarek Potiuk <ja...@potiuk.com>
>> Sent: Monday, April 25, 2022 7:17 AM
>> To: dev@airflow.apache.org
>> Subject: RE: [EXTERNAL][DISCUSS] Approach for new providers of the community
>>
>>
>> Just to come back to it (please, everyone, have a little patience - I think
>> some people have not chimed in yet due to the 2.3.0 "focus", so this
>> discussion might take a little more time).
>>
>> My current thinking on it so far:
>>
>> * I am not really in the camp of "let's not add any more providers at
>> all", and also not in the camp of "let's accept all providers with good
>> quality code". I think there are a few providers which, "after fulfilling
>> all the criteria", could be added - mostly open-source standards and
>> generic, established technologies - but it should be a rather limited
>> and rare event.
>>
>> * When there is a proprietary service which does not have a very broad reach
>> and it's not likely that we will have some committers who will be
>> maintaining it (because they are users), the default option should
>> be to make a standalone per-service provider. The difficulty here is
>> setting the right "non-quality" criteria - but I think we really want
>> to limit any new code to maintain. Here maybe we can have some more
>> concrete criteria proposed - so that we do not have to vote
>> individually on each proposed provider, and so that those who want
>> to propose a provider could check for themselves, by reading the criteria,
>> what's best for them.
>>
>> * We might improve our "providers" list on the "ecosystem" page to make
>> providers stand out a bit more (maybe simply put them on top and make
>> a clearly visible section). We are not going to maintain and keep a
>> nice "registry" similar to Astronomer's (we could actually even
>> make the link to the Astronomer registry more prominent as the way to
>> "search" for providers on our Ecosystem Page). We could also add a link
>> to PyPI with the "Airflow provider" classifier on the ecosystem page
>> as another way of searching for providers. All that is perfectly fine,
>> I think, with the ASF policies and spirit. And it will be good for
>> discovery.
>>
>> WDYT?
>>
>> J.
>>
>> On Mon, Apr 18, 2022 at 3:59 PM Samhita Alla <samh...@union.ai> wrote:
>>>
>>>
>>> Hello!
>>>
>>> The reason behind submitting the Flyte provider to the Airflow repository is
>>> that we felt it'd be effortless for Airflow users to use the
>>> integration. Moreover, since it'd be under the umbrella of Airflow, we
>>> estimated that Airflow users would not hesitate to use the provider.
>>>
>>> We could definitely have this as a standalone provider, but the 
>>> easy-to-get-started incentive of Airflow providers seemed like a better 
>>> option.
>>>
>>> If there's a sophisticated plan in place for having standalone providers in 
>>> PyPI, we're up for it.
>>>
>>> Thanks,
>>> Samhita
>>>
>>> On Wed, Apr 13, 2022 at 9:58 PM Alex Ott <alex...@gmail.com> wrote:
>>>>
>>>>
>>>> Hello all
>>>>
>>>> I want to try to explain the motivation behind the submission of the Delta
>>>> Sharing provider:
>>>>
>>>> Let me start with the fact that the original issue was created against
>>>> the Airflow repository, and it was accepted as potential new functionality.
>>>> And the discussion about new providers started almost on the day the PR
>>>> was submitted :-)
>>>> Delta Sharing is an OSS project under the umbrella of the Linux Foundation
>>>> that defines a protocol and reference implementations. It was started by
>>>> Databricks, but has other contributors as well - that's why it wasn't
>>>> pushed into the Databricks provider, as it's not specific to Databricks.
>>>> Another thought about submitting it as a separate provider was to get more 
>>>> people interested in this functionality and build additional integrations 
>>>> on top of it.
>>>> Another important aspect of having providers in the Airflow repository is
>>>> that they are tested together with changes in Airflow core.
>>>>
>>>> I completely understand the concerns about more maintenance effort, but my
>>>> personal point of view (about it below) is similar to Rafal's & John's -
>>>> if there are well-defined criteria & plans for decommissioning or
>>>> something like that, then providers could be part of the releases, etc.
>>>>
>>>> I just want to add that although I'm employed by Databricks, I'm not a
>>>> part of the development team - I'm in the field team that works with
>>>> customers, seeing how they use different tools, seeing pain points,
>>>> etc. Most of the work so far was done on my own time - I'm doing some
>>>> coordination, but most of the new functionality (AAD tokens support, Repos,
>>>> Databricks SQL operators, etc.) comes from seeing customers use
>>>> Airflow together with Databricks.
>>>>
>>>>
>>>> On Mon, Apr 11, 2022 at 9:14 PM Rafal Biegacz 
>>>> <rafalbieg...@google.com.invalid> wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I think that we will need to find some middle ground here - we are trying
>>>>> to optimize in many dimensions (Jarek mentioned 3 of them). Maybe I would
>>>>> also add a 4th dimension - the Airflow Service Provider :).
>>>>>
>>>>> Airflow users - whether they run self-managed Airflow or use "managed
>>>>> Airflow" provided by others - are beneficiaries of the fact that Airflow has
>>>>> a decent portfolio of providers.
>>>>> It's not only a guarantee that these providers should work fine and that they
>>>>> meet Airflow coding/testing standards. It's also a kind of guarantee
>>>>> that once they start using Airflow with providers backed by the Airflow
>>>>> community, they won't be on their own when it comes to
>>>>> troubleshooting/updating/etc. It will be much easier for
>>>>> them to convince their companies to use Airflow for production use cases,
>>>>> as the Airflow platform (core + providers) is tested/maintained by the
>>>>> Airflow community.
>>>>>
>>>>> Keeping providers within the Airflow repository generates integration and
>>>>> maintenance work on the Airflow community side. On the other hand, if
>>>>> this work is not done within the community, then this effort would need to
>>>>> be done by all users to a certain extent. So from this perspective it's
>>>>> better for the community to do it, so users can use off-the-shelf
>>>>> Airflow for the majority of their use cases.
>>>>>
>>>>> When it comes to accepting new providers - I like John's suggestions:
>>>>> a) a well-defined standard that needs to be met by providers - passing the
>>>>> "provider qualification" would take some effort, so each service provider
>>>>> would need to decide if it wouldn't be easier to maintain their provider
>>>>> on their own.
>>>>> b) a well-defined lifecycle for providers - which would allow us to identify
>>>>> providers that are obsolete/no longer popular and retire them.
>>>>>
>>>>> Regards, Rafal.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Apr 11, 2022 at 6:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>>
>>>>>>
>>>>>> I've been thinking about it - to make up my mind a little. The good
>>>>>> thing for me is that I have no strong opinion and I can rather easily
>>>>>> see both sides (or so I think).
>>>>>>
>>>>>> TL;DR: I think we need an explanation from the "Service Providers" of
>>>>>> what they want to achieve by contributing providers to the community, to
>>>>>> see if we can achieve similar results differently.
>>>>>>
>>>>>>
>>>>>> Obviously I am a bit biased from the maintainer point of view, but since
>>>>>> I cooperate with various stakeholders, I spoke to some of them just to see
>>>>>> their point of view, and this is what I got:
>>>>>>
>>>>>> It seems that we really have three types of stakeholders that are
>>>>>> interested in "providers":
>>>>>>
>>>>>> 1) "Maintainers" - those who mostly maintain Airflow and have to take
>>>>>> care of its future and development and the "grand vision" of where we
>>>>>> want to be in a few years
>>>>>> 2) "Users" - those who use Airflow and its integrations with the Service
>>>>>> Providers
>>>>>> 3) "Service providers" - those who run the services that Airflow 
>>>>>> integrates with - via providers (that group might also contain those 
>>>>>> stakeholders that run Airflow "as a service")
>>>>>>
>>>>>> Let me see it from all the different POVs:
>>>>>>
>>>>>>
>>>>>> From 1) Maintainer POV
>>>>>>
>>>>>> More providers mean slower growth of the platform overall: the more
>>>>>> providers we add and manage as a community, the less time we can spend
>>>>>> on improving Airflow core.
>>>>>> Also the vision I think we all share is that Airflow is not a 
>>>>>> "standalone orchestrator" any more - due to its popularity, reach and 
>>>>>> power, it became an "orchestrating platform" and this is the vision that 
>>>>>> keeps us - maintainers - busy.
>>>>>>
>>>>>> Over the last 2 years, pretty much everything we have done made Airflow "more
>>>>>> extensible". You can add custom "secrets managers", "timetables",
>>>>>> "deferrers", etc. "Customizability" is now built in and is the "theme" of being
>>>>>> a modern platform.
>>>>>> Hell - we even recently added an "Airflow Provider" trove classifier in
>>>>>> PyPI:
>>>>>> https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
>>>>>> and the main justification in the discussion was that we expect MORE
>>>>>> 3rd parties to use it, rather than relying on the "apache-airflow-provider"
>>>>>> package name.
>>>>>> So from the maintainer POV, having 3rd-party providers as "extensions" to
>>>>>> Airflow makes perfect sense and is the way to go.
>>>>>>
>>>>>>
>>>>>> From 2) User POV
>>>>>>
>>>>>> Users want to use Airflow with all the integrations they use together,
>>>>>> but only with those that they actually use. Similarly to maintainers,
>>>>>> supporting and needing all 70+ providers is something they usually do
>>>>>> not REALLY care about.
>>>>>> They literally care about the few providers they use. We even taught the 
>>>>>> users that they can upgrade and install providers separately from the 
>>>>>> core. So they already know they can mix and match Airflow + Providers to 
>>>>>> get what they want.
>>>>>>
>>>>>> And they do use it - even if they use our image, the image only contains
>>>>>> a handful of the providers, and when they need to install
>>>>>> new providers they just install them from PyPI. And for that, the
>>>>>> difference between "community providers" and 3rd-party providers, except for
>>>>>> the ASF stamp of approval, is not really visible.
>>>>>> Surely they can use [extras] to install the providers but that is just a 
>>>>>> convenience and is definitely not needed by the users.
>>>>>> For example, when they build a custom image they usually extend Airflow
>>>>>> and simply 'pip install <PROVIDER>'.
>>>>>> As long as someone makes sure that the provider can be installed on 
>>>>>> certain versions of Airflow - it does not matter.
>>>>>>
>>>>>> Also, from the users' perspective, Airflow has become "popular" enough that
>>>>>> it no longer needs "more integrations" to be more "appealing" to the
>>>>>> users.
>>>>>> They already use Airflow. They like it (hopefully) and the fact that 
>>>>>> this or that provider is part of the community makes no difference any 
>>>>>> more.
>>>>>>
>>>>>>
>>>>>> From 3) "Service providers" POV
>>>>>>
>>>>>> Here I am not sure. It's not very clear what service providers get from 
>>>>>> being part of the "community providers".
>>>>>>
>>>>>> I hear that some big services (cloud providers) find it cool that we give
>>>>>> them the ASF "Stamp of Approval". And they are willing to pay the price of
>>>>>> a slower merge process, dependence on the community and following the strict
>>>>>> rules of the ASF.
>>>>>> And the community is also happy to pay the price of maintaining those
>>>>>> (including the dependencies which Elad mentioned) to make sure that all
>>>>>> the community providers work in concert - because those "Services" are
>>>>>> hugely popular and we "want" as a community to invest there.
>>>>>> But keeping those deps in sync is a huge effort and it will become
>>>>>> even worse the more we add. On the other hand, for 3rd-party providers
>>>>>> it will be EASIER to keep up.
>>>>>> They don't have to care about all the community providers working
>>>>>> together; they can choose a subset. And when they release their
>>>>>> libraries, they can take care of making sure the dependencies are not
>>>>>> broken.
>>>>>>
>>>>>> There are other "drawbacks" to being a "community" provider. For
>>>>>> example, we have a rule that community providers support a given minimum
>>>>>> Airflow version for only 12 months after that Airflow version's release.
>>>>>> This means that users of Airflow 2.1 will not receive updates for the
>>>>>> providers after the 21st of May. This is the price to pay for
>>>>>> community-managed providers. We will not release bug fixes in providers
>>>>>> or changes for Airflow 2.1 users after the 21st of May.
>>>>>> But if you manage your own provider - you still can support 2.0 or even 
>>>>>> 1.10 if you want.
>>>>>>
>>>>>> I cannot really see why a Service Provider would want to become an 
>>>>>> Airflow Community Provider.
>>>>>>
>>>>>> And I am not really sure what Flyte, Delta Sharing, Versatile Data Kit,
>>>>>> and Cloudera people think and why they think this is the best choice.
>>>>>>
>>>>>> I think when we understand what the "Service Providers" want to achieve
>>>>>> this way, maybe we will be able to come up with some middle ground and
>>>>>> at least set some rules for when it makes sense and when it does not make
>>>>>> sense.
>>>>>> Maybe 'contributing a provider' is a way to achieve something else, and
>>>>>> we simply do not realize that in the new "Airflow as a Platform" world,
>>>>>> all the stakeholders can achieve very similar results using different
>>>>>> approaches.
>>>>>>
>>>>>> * For example, we could think about how we can make it easier for Airflow
>>>>>> users to discover and install their providers - without the community
>>>>>> actually taking ownership of the code.
>>>>>> * Or maybe we could introduce a tool to let a 3rd-party provider pass a
>>>>>> "compliance check", as suggested above.
>>>>>> * Or maybe we could introduce a "breeze" extension to be able to install
>>>>>> and test a provider with the "latest Airflow", so that the service providers
>>>>>> could check it before we even release Airflow and dependencies.
>>>>>>
>>>>>> So what I think we really need is this: Alex, Samhita, Andon, Philippe (I
>>>>>> think) - could you tell us (each of you separately) what your
>>>>>> goals were when you came up with the "contribute the new provider" idea?
>>>>>>
>>>>>> J.
>>>>>>
>>>>>> On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <elad...@apache.org> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Ash, what is your recommendation for users, should we follow your
>>>>>>> suggestion?
>>>>>>> This means that the big big big joy of using Airflow constraints and
>>>>>>> getting a working environment with all required providers will be no
>>>>>>> more.
>>>>>>> So users will get a working "vanilla" Airflow and then will need to
>>>>>>> figure out how they are going to tackle independent providers that may
>>>>>>> not be able to coexist with one another.
>>>>>>> This means that users will need to create and maintain their own
>>>>>>> constraints mechanism.
>>>>>>>
>>>>>>> From my perspective, this increases the complexity of getting Airflow
>>>>>>> production-ready.
>>>>>>> I know that we say providers vs. core, but I think that from the users'
>>>>>>> perspective providers are an integral part of Airflow.
>>>>>>> Having the best scheduler and the best UI is not enough. Providers are
>>>>>>> a crucial part that completes the set.
>>>>>>>
>>>>>>> Maybe eventually there should be something like a provider store where 
>>>>>>> there can be official providers and 3rd party providers.
>>>>>>>
>>>>>>> This may be an even bigger discussion than the one we are having here. It
>>>>>>> feels more like Airflow as a product vs. Airflow as an ecosystem.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty 
>>>>>>> <col...@astronomer.io.invalid> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I agree with Ash and Tomasz. If it were not for the history, I think 
>>>>>>>> in an ideal world even the providers currently part of the Airflow 
>>>>>>>> repo would be managed separately. (I'm not actually suggesting 
>>>>>>>> removing any providers.) I don't think it's a matter of gatekeeping, I 
>>>>>>>> just think it's actually kind of odd to have providers in the same 
>>>>>>>> repo as core Airflow, and it increases confusion about Airflow 
>>>>>>>> versions vs provider package versions.
>>>>>>>>
>>>>>>>> Collin McNulty
>>>>>>>>
>>>>>>>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <turbas...@apache.org> 
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I’m leaning toward Ash's approach. Having service providers maintain the
>>>>>>>>> packages may streamline many aspects for providers/companies.
>>>>>>>>>
>>>>>>>>> 1. They are owners so they can merge and release whenever they need.
>>>>>>>>> 2. It’s easier for them to add E2E tests and manage the resources 
>>>>>>>>> needed for running them.
>>>>>>>>> 3. The development of the package can be incorporated into their 
>>>>>>>>> company processes - not every company is used to OSS mode.
>>>>>>>>>
>>>>>>>>> Whatever way we go, we should have some basic guidelines and
>>>>>>>>> requirements (for example, to brand a provider as “recommended by
>>>>>>>>> community” or something).
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Tomsk
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> With best wishes,                    Alex Ott
>>>> http://alexott.net/
>>>> Twitter: alexott_en (English), alexott (Russian)
