Thanks, Kaxil – that helped to clarify the proposal a bit more.

> Replacing Access Control provided by FAB with a base/core security model 
> (that is still resource-based)

Are you suggesting that we build this resource-driven security model directly 
into Airflow, without relying on external dependencies like FAB?

> Extend this to the other Airflow components (scheduler, workers, triggerer, 
> cli)

Are there cases where the scheduler or CLI would require the authorization API? 
Since they are considered trusted components, I assumed they would not need it.


Jarek - as always, I appreciate you sharing your thoughts and having an open 
discussion.

> Which really explains what "Airflow as a Platform" is all about. I do not 
> think we already know all the parts that should be converted into "Airflow 
> extendability". It's more of an incremental effort like that where we have 
> those bright ideas "Hey - this part can be removed and delegated to others".  
> I think this has never been formulated explicitly but I think for quite a 
> while we are really in the mode where we think much more about what we can 
> SPLIT OUT from Airflow rather than what we can ADD to Airflow.

Understood. I like the idea of extensibility and "Airflow as a platform." 
However, we should make sure that we do not worsen the user experience with the 
extensibility. The "User Management Provider" is something that could 
potentially make the user experience worse, especially for customers who are 
self-hosting Airflow. Managed services will ensure that they dedicate resources 
to maintaining their user management providers. Multi-tenancy will end up 
becoming a feature for managed service customers, leaving the 74% of Airflow 
users [1] with a less powerful Airflow. As an example, Timetables is a very 
powerful feature, which, anecdotally, no customer ends up using due to its 
complexity.

I am still unclear about other user scenarios related to user management, 
besides multi-tenancy, that Airflow customers are looking to enable. While the 
extensibility we aim for will enable this, is there a need for it? Also, 
@Google-folks, @Astronomer-folks, @Azure-folks, et al. - are you interested in 
building a custom user management provider that works with your platform? Have 
there been cases where your customers were limited by the current permissioning 
model, and you considered replacing FAB? 

I believe that the primary motivation for "user management provider" is driven 
by the excitement around getting rid of FAB, which I think we can still achieve 
while including multi-tenancy in the core Airflow. Both should be treated as 
separate problems.

References:
1. https://airflow.apache.org/blog/airflow-survey-2022/#how-do-you-deploy-airflow-multiple-choice

On 2023-02-14, 12:44 PM, "Jarek Potiuk" <[email protected]> wrote:

    CAUTION: This email originated from outside of the organization. Do not 
click links or open attachments unless you can confirm the sender and know the 
content is safe.



    Comment to Shubham's question:

    > In addition, are there any other user scenarios, beyond multi-tenancy, 
that Airflow users are looking to enable and that require this pluggability? 
Asking as I haven't come across them. Overall, I believe we need more 
information on your proposal before seeking feedback from the community. Could 
we work together during February to develop a concrete proposal?

    I am glad you asked. I think this is one of the things I wanted to
    achieve by adding this page
    https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst
    - it will be live in 2.6 and one of the main parts is this one:

    
    https://github.com/apache/airflow/blob/main/docs/apache-airflow/public-airflow-interface.rst#using-public-interface-to-extend-airflow-capabilities

    Which really explains what "Airflow as a Platform" is all about. I do
    not think we already know all the parts that should be converted into
    "Airflow extendability". It's more of an incremental effort like that
    where we have those bright ideas "Hey - this part can be removed and
    delegated to others".  I think this has never been formulated
    explicitly but I think for quite a while we are really in the mode
    where we think much more about what we can SPLIT OUT from Airflow
    rather than what we can ADD to Airflow.

    When you look at it, this is also the main idea behind Open Lineage
    integration for example - we are adding open lineage (which is really
    just an API) so that others can build "everything-lineage" on top of
    it. So we are adding a minimum-possible set of APIs and integration so
    that we can expose the lineage capability so that all the lineage "UI"
    and other use cases that lineage exposes would be done outside. We are
    in a strong position to do it - being sure that when we expose it,
    others will implement the integration they care about.

    I think more and more (and it has been preached by Ash mostly, but
    also others) that we should be focusing solely on being an extremely
    powerful and robust scheduler and make sure we are exposing all of the
    possible things that can be exposed as an external API (while still
    providing a basic implementation that makes Airflow a "finished"
    product that can be used to handle basic cases).

    BTW. We are now preparing for the Airflow Summit CFP (some
    announcements will follow shortly, I do not want to spill too many
    beans) and we have a very interesting broad category "Airflow and
    ...". And I think we should work in the direction that the `...` is
    far bigger than Airflow itself.

    J.

    On Tue, Feb 14, 2023 at 12:34 PM Kaxil Naik <[email protected]> wrote:
    >
    > Great idea Vikram, I love the idea of making this a provider/pluggable.
    >
    > In some ways, we already have a pluggable mechanism for Authentication 
with Auth Backends [1]. Where we will need a lot more work, I think, is:
    >
    > Replacing Access Control provided by FAB with a base/core security model 
(that is still resource-based) [2]
    > Extend this to the other Airflow components (scheduler, workers, 
triggerer, cli) or make them all driven by a single API that takes care of 
Auth. This will also reduce a lot of duplication of code across many of the 
components
    > For backwards compat, we could ship with a FAB provider that still uses 
Flask-AppBuilder in addition to our recommended provider that will have more 
features, and users/companies/stakeholders can build on top of that provider to 
extend it further.
    >
    >
    > References:
    > [1]: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#auth-backends
    > [2]: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/security/access-control.html
    >
    > On Tue, 14 Feb 2023 at 02:06, Mehta, Shubham <[email protected]> 
wrote:
    >>
    >> Hi Vikram,
    >> Thank you for taking the time to review the proposal. I appreciate your 
insights — I will make sure to reach out to you directly in the future for 
feedback as that would've undoubtedly saved us some time and effort.
    >>
    >> In regards to the separation of user management, I understand your 
concerns and, on a high-level, I agree with you. However, I think it would be 
beneficial to have more details on how it will work. Here are a few questions 
that come to mind:
    >> 1. How will the user-id/group-id interface interact with Airflow 
resource-level permissions? What parts of "John can-edit dag1 and can-view 
dag2" will be part of Airflow core? What will be exposed to the external system?
    >> 2. Who will be responsible for managing the resource-level permissions? 
Will it be the external system?
    >> 3. What are the limitations of this new pluggable model compared to FAB? 
Will there be restrictions on the granularity of resource access that Airflow 
admins can provide to their users?
    >> 4. As Jarek pointed out, with this change we want to make authorization 
externally driven. Will this have a significant impact on Airflow performance 
as authorization will be required for fetching variables, executing tasks, etc.?
    >> 5. What will the migration process look like for existing users to this 
non-FAB pluggable model?
    >>
    >> In addition, are there any other user scenarios, beyond multi-tenancy, 
that Airflow users are looking to enable and that require this pluggability? 
Asking as I haven't come across them. Overall, I believe we need more 
information on your proposal before seeking feedback from the community. Could 
we work together during February to develop a concrete proposal?
    >>
    >> Besides this, I would like to propose that we define the scope and 
long-term vision of "Airflow core". To achieve this, it may be helpful to first 
outline the perspectives of the Airflow PMCs. Recently, there have been 
discussions regarding the separation of executors into a separate package, the 
implementation of pluggable schedulers, and other related topics. Currently, 
these decisions and discussions are somewhat ad hoc and are made through the 
mailing list. I would be happy to collaborate and invest time in this effort.
    >>
    >> Regards
    >> Shubham
    >>
    >> On 2023-02-13, 11:04 AM, "Jarek Potiuk" <[email protected]> wrote:
    >>
    >>
    >>     Hey Vikram,
    >>
    >>     I think it's brilliant and I wonder how it happened that it had not
    >>     occurred to us earlier. And I believe that is due to the natural
    >>     tendency of "following as we always did" rather than thinking
    >>     completely out-of-the-box. Thanks Vikram for bringing it up.
    >>
    >>     The funny thing is that when I see this:
    >>
    >>     > However, I don't agree that this level of user management belongs 
in "Core Airflow".
    >>
    >>     I almost immediately think - NOOOOO, why, it's always been here, how
    >>     can we remove it?
    >>
    >>     But then if you look a bit closer:
    >>
    >>     > think this is a time to consider the concept of a "user management 
provider" with a simple built-in implementation being the current Airflow 
functionality, enabling alternate more complex (but separate) implementations 
such as your proposal here as alternate user management providers.
    >>
    >>     Then it starts to make way more sense. Way more.
    >>
    >>     And when you look further:
    >>
    >>     >  Maybe, this also enables us to get rid of the Fab security 
manager from core Airflow?
    >>
    >>     My heart jumps and I am immediately sold on the idea.
    >>
    >>     When I was commenting on the doc initially, something was not right.
    >>     I had a feeling it was probably the 5th time I was looking at and
    >>     commenting on a similar document. And, well, it was, actually. Most of
    >>     the things we discussed there are already implemented out there. We
    >>     just need to make sure we expose enough of the API to use them. For
    >>     example, we have Keycloak, which is an open source implementation of
    >>     Identity and Access Management, with everything out there already
    >>     integrated, and I've been part of the project that integrated just
    >>     the authentication part. Now if we rethink the authorization and make
    >>     it simpler and "externally driven", this will not only be faster IMHO,
    >>     but also will allow enterprise users to integrate much better.
    >>
    >>     I believe following the path that Vikram outlined will be a good
    >>     direction for everyone in the community - including all the Managed
    >>     Service providers, who will have a far easier job integrating
    >>     Airflow into their authentication models.
    >>
    >>     J.
    >>
    >>
    >>
    >>     On Mon, Feb 13, 2023 at 6:24 PM Vikram Koka
    >>     <[email protected]> wrote:
    >>     >
    >>     > Shubham and Vincent,
    >>     >
    >>     > Let me start by saying that I apologize for my delayed response to 
your original email.
    >>     >
    >>     > I appreciate the detailed write-up and the thought behind it. I 
completely agree with your use case and understand how this is applicable to 
enterprises with multiple data teams using Airflow.
    >>     >
    >>     > However, I don't agree that this level of user management belongs 
in "Core Airflow".
    >>     >
    >>     > I strongly believe that the core Airflow mission is for the 
community at large and for data practitioners, either individuals or teams 
within enterprises. And therefore, I don't disagree with the intent of making 
it easier for enterprise teams to adopt Airflow. But I think there is a 
never-ending list of user management features which are needed to support 
Enterprise needs. We have already struggled with this over time and faced 
challenges with the Fab security manager and its integration in Airflow.
    >>     >
    >>     > I think we should use this opportunity and your use case to 
"separate the user management" from Core Airflow outside of the absolute 
basics. I think this is a time to consider the concept of a "user management 
provider" with a simple built-in implementation being the current Airflow 
functionality, enabling alternate more complex (but separate) implementations 
such as your proposal here as alternate user management providers. Maybe, this 
also enables us to get rid of the Fab security manager from core Airflow?
    >>     >
    >>     > Best regards,
    >>     > Vikram
    >>     >
    >>     >
    >>     > On Fri, Feb 3, 2023 at 8:22 AM Beck, Vincent 
<[email protected]> wrote:
    >>     >>
    >>     >> Thanks
    >>     >>
    >>     >> On 2023-02-03, 10:55 AM, "Jarek Potiuk" <[email protected]> wrote:
    >>     >>
    >>     >>
    >>     >>     Added.
    >>     >>
    >>     >>     On Fri, Feb 3, 2023 at 3:53 PM Beck, Vincent
    >>     >>     <[email protected]> wrote:
    >>     >>     >
    >>     >>     > Thank you! 
https://cwiki.apache.org/confluence/display/~vin100.beck
    >>     >>     >
    >>     >>     > On 2023-02-02, 5:38 PM, "Jarek Potiuk" <[email protected]> 
wrote:
    >>     >>     >
    >>     >>     >
    >>     >>     >     What's your cwiki ID, Vincent? (I'll add you without 
going into details yet.)
    >>     >>     >
    >>     >>
    >>
