jon-wei commented on issue #9380:
URL: https://github.com/apache/druid/issues/9380#issuecomment-621609693


   I've also been thinking about the Druid security model recently, and I 
reviewed the proposal and the linked spreadsheet (the breakdown of existing 
endpoints is really helpful!). I have the following thoughts.
   
   I think the security model should achieve the following goals:
   - Support clear boundaries between common "personas" (described more below)
   - Have an easy-to-understand permission model 
   
   For the first goal, from my experience working with customers that use Druid 
security features, users think in terms of "personas" when setting up 
permissions, for example:
   - "Datasource reader": needs permissions to query some set of datasources 
and nothing else
   - "Data engineer": someone whose role involves loading and managing data; 
they need access to ingestion, compaction, and other related APIs
   - "Cluster admin": highest level of access
   
   For the second goal, I think we should keep the permissions model as simple 
as we can in a way that still supports the first goal.
   
   The ideal result I would hope to achieve with those goals is to make it easy 
for a user to answer a question like "What permissions are needed for a given 
persona?"
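   
   To make that question concrete, here is a minimal sketch of what a 
persona-to-permission lookup could look like. Everything in it is hypothetical: 
the persona keys, the "wikipedia" datasource, the "*" wildcard convention, and 
the `Permission` shape are assumptions for illustration, not Druid's actual 
authorizer API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    resource_type: str  # e.g. "DATASOURCE"; "*" means any (assumed convention)
    resource_name: str  # a concrete name, or "*" for all resources of the type
    action: str         # e.g. "READ" or "WRITE"; "*" means any


# Hypothetical persona definitions; a real deployment would load these from
# whatever the authorizer stores.
PERSONAS = {
    "datasource_reader": {
        Permission("DATASOURCE", "wikipedia", "READ"),
    },
    "data_engineer": {
        Permission("DATASOURCE", "wikipedia", "READ"),
        Permission("DATASOURCE", "wikipedia", "WRITE"),
    },
    "cluster_admin": {
        Permission("*", "*", "*"),
    },
}


def permissions_for(persona):
    """Answer 'what permissions are needed for a given persona?'."""
    return sorted(
        PERSONAS[persona],
        key=lambda p: (p.resource_type, p.resource_name, p.action),
    )


def is_allowed(persona, resource_type, resource_name, action):
    """Return True if any permission held by the persona covers the request."""
    for p in PERSONAS[persona]:
        if (
            p.resource_type in ("*", resource_type)
            and p.resource_name in ("*", resource_name)
            and p.action in ("*", action)
        ):
            return True
    return False
```

   With a table like this, the answer to the question above is a single lookup 
(`permissions_for("data_engineer")`) rather than a walk through every endpoint.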
   
   Let me know if you think those goals make sense, or if there are other goals 
that you had in mind.
   
   -----
   
   From the perspective above, I like that this proposal introduces more 
resource types. 
   
   I have two high-level comments on the resources:
   - We should think about dropping the STATE/CONFIG distinction
   - It could make sense to condense some of the resource types
   
   ---
   
   I think it's worth considering whether we want to keep STATE/CONFIG as 
high-level resource categories going forward. 
   
   In my experience with the current model, STATE and CONFIG sometimes overlap 
significantly. For example, enabling or disabling a worker is a CONFIG action, 
but checking whether a worker is enabled is a STATE action, so in practice it 
generally makes sense to assign those two permissions together. 
   
   Another example could be checking the cluster properties, which I feel is 
arguably both STATE and CONFIG (i.e., the configuration is a kind of state). 
   
   The users I've worked with have also found that distinction somewhat hard to 
understand.
   
   ---
   In some cases, I think we could simplify the model by pruning some of the 
resource types.
   
   The example I had in mind was the coordinator load rules:
   - If the load rules are datasource-specific, this could be represented as a 
SET_LOAD_RULES action on the DATASOURCE resource type instead.
   - If the load rules are cluster-wide, this could fall under a base 
admin-esque resource. Similar areas might include coordinator dynamic configs 
related to segment balancing.
   
   Structuring the resources that way could also make it easier to enumerate 
what all the possible actions on a datasource are.
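   
   As a rough sketch of that structure (assuming a hypothetical CLUSTER type 
standing in for the "admin-esque" resource, and made-up action names other than 
SET_LOAD_RULES), the actions could hang off the resource type like this:

```python
from enum import Enum


class ResourceType(Enum):
    DATASOURCE = "DATASOURCE"
    CLUSTER = "CLUSTER"  # hypothetical stand-in for the admin-esque resource


# Hypothetical action catalog. SET_LOAD_RULES comes from the suggestion above;
# the other non-READ/WRITE names are invented for illustration.
ACTIONS = {
    ResourceType.DATASOURCE: [
        "READ",
        "WRITE",
        "SET_LOAD_RULES",          # datasource-specific load rules
        "SET_COMPACTION_CONFIG",   # compaction settings, same idea
    ],
    ResourceType.CLUSTER: [
        "SET_DEFAULT_LOAD_RULES",  # cluster-wide load rules
        "SET_DYNAMIC_CONFIG",      # e.g. segment balancing settings
    ],
}


def actions_on(resource_type):
    """Enumerate every action that can be granted on a resource type."""
    return list(ACTIONS[resource_type])
```

   Here `actions_on(ResourceType.DATASOURCE)` lists everything a user could be 
allowed to do to a datasource in one place.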
   
   Other areas for potential simplification:
   - Compaction settings, similar to the load rule example
   - WORKERS and SERVERS (since they're all Druid cluster processes)
   
   There might be better approaches to these specific areas; I'm using the 
examples here more to show my thought process.
   
   ---
   
   In general, I'm approaching the permissions by focusing more on the 
underlying resources and their associated personas and workflows, and less on 
the specific APIs.
   
   Maybe it makes sense to start by defining the "core" resource types and then 
mapping the various endpoints to those core types.
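   
   A minimal sketch of that mapping step, under stated assumptions: the 
endpoint paths are meant as illustrative examples, and the resource types and 
actions assigned to them (including CLUSTER and SET_LOAD_RULES) are 
hypothetical, not the proposal's actual assignments.

```python
from collections import defaultdict

# (HTTP method, path template) -> (core resource type, action).
# The assignments are hypothetical and only show the shape of the table.
ENDPOINT_TO_PERMISSION = {
    ("POST", "/druid/v2"): ("DATASOURCE", "READ"),
    ("POST", "/druid/indexer/v1/task"): ("DATASOURCE", "WRITE"),
    ("POST", "/druid/coordinator/v1/rules/{dataSourceName}"): (
        "DATASOURCE",
        "SET_LOAD_RULES",
    ),
    ("POST", "/druid/coordinator/v1/config"): ("CLUSTER", "WRITE"),
}


def endpoints_by_resource_type(mapping):
    """Group endpoints under the core resource type that guards them."""
    grouped = defaultdict(list)
    for (method, path), (resource_type, action) in mapping.items():
        grouped[resource_type].append((method, path, action))
    return dict(grouped)
```

   Grouping the table by resource type gives a per-resource view of the API, 
which is the direction the mapping would be read in when designing a persona.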
   
   ==================================
   
   Additionally, do you have any thoughts on how the new security model should 
handle input sources for ingestion? 
   
   Currently, someone who has DATASOURCE WRITE access can submit ingestion 
tasks with InputSources or Firehoses that can read from any resources that the 
Druid server is able to access (e.g., S3 buckets). This means you can't use 
permissions to create different "data engineer" roles that each have different 
input sources they're allowed to read from. 
   
   I think it'd be good to enable this somehow, maybe via input source 
whitelisting.
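   
   One possible shape for such a check, as a sketch only: the role names, 
bucket names, and the simplified input source dict (just "type" and "uris") are 
assumptions for illustration and do not reflect an existing Druid feature.

```python
# Hypothetical per-role allowlist of URI prefixes that ingestion tasks may
# read from. The trailing slash keeps "s3://billing-exports/" from also
# matching a bucket named "billing-exports-other".
ROLE_INPUT_SOURCE_ALLOWLIST = {
    "clickstream_engineer": ["s3://clickstream-raw/"],
    "billing_engineer": ["s3://billing-exports/", "s3://billing-archive/"],
}


def is_input_source_allowed(role, input_source):
    """Return True only if every URI in the input source matches an allowed
    prefix for the role. Unknown roles and empty URI lists are rejected."""
    allowed_prefixes = ROLE_INPUT_SOURCE_ALLOWLIST.get(role, [])
    uris = input_source.get("uris", [])
    if not uris:
        return False
    return all(
        any(uri.startswith(prefix) for prefix in allowed_prefixes)
        for uri in uris
    )
```

   With a check like this applied at task submission, two "data engineer" 
roles could both hold DATASOURCE WRITE while being limited to different 
buckets.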
   
   

