jon-wei commented on issue #9380: URL: https://github.com/apache/druid/issues/9380#issuecomment-621609693
I've also been thinking about the Druid security model recently and reviewed the proposal and linked spreadsheet (the breakdown of existing endpoints is really helpful!), and I have the following thoughts.

I think the security model should achieve the following goals:
- Support clear boundaries between common "personas" (described more below)
- Have an easy-to-understand permission model

For the first goal, from my experience working with customers that use Druid security features, users think in terms of "personas" when setting up permissions, for example:
- "Datasource reader": needs permissions to query some set of datasources and nothing else
- "Data engineer": someone whose role involves loading data and managing it; they need access to ingestion, compaction, and other related APIs
- "Cluster Admin": highest level of access

For the second goal, I think we should keep the permissions model as simple as we can in a way that still supports the first goal.

The ideal result I would hope to achieve with those goals is to make it easy for a user to answer a question like "What permissions are needed for a given persona?"

Let me know if you think those goals make sense, or if there are other goals that you had in mind.

-----

From the perspective above, I like that this proposal introduces more resource types. I have two high-level comments on the resources:
- We should think about dropping the STATE/CONFIG distinction
- It could make sense to condense some of the resource types

---

I think it's worth considering whether we want to keep STATE/CONFIG as high-level resource categories going forward. In my experience with the current model, there's sometimes significant overlap between STATE and CONFIG. For example, enabling/disabling workers is a CONFIG action, but checking whether a worker is enabled is a STATE action, so it generally makes sense to assign those permissions together.
Another example could be checking the cluster properties, which I feel is arguably both STATE and CONFIG (i.e., the configuration is a kind of state). The users I've worked with have also found that distinction somewhat hard to understand.

---

In some cases, I think we could simplify the model by pruning some of the resource types. The example I had in mind was the coordinator load rules:
- If the load rules are datasource specific, this could be represented as a SET_LOAD_RULES action on the DATASOURCE resource type instead.
- If the load rules are cluster-wide, this could fall under a base admin-esque resource; similar areas might include coordinator dynamic configs related to segment balancing.

Structuring the resources that way could also make it easier to enumerate what all the possible actions on a datasource are.

Other areas for potential simplification:
- Compaction settings, similar to the load rule example
- WORKERS and SERVERS (since they're all Druid cluster processes)

There might be better approaches to these specific areas; I'm using the examples here more to show my thought process.

---

In general, I'm thinking about the permissions with an approach where I'm focusing more on the underlying resources and associated personas and workflows, and less on the specific API. Maybe it makes sense to start by defining the "core" resource types and then mapping the various endpoints to those core types.

==================================

Additionally, do you have any thoughts on how the new security model should handle input sources for ingestion?

Currently, someone who has DATASOURCE WRITE access can submit ingestion tasks with InputSources or Firehoses that can read from any resources that the Druid server is able to access (e.g., S3 buckets). This means you can't use permissions to create different "data engineer" roles that each have different input sources they're allowed to read from.
I think it'd be good to enable this somehow, maybe via input source whitelisting.
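
To make the persona and input source ideas a bit more concrete, here is a rough sketch in Python. It is not Druid's authorization API: the `Permission` class, the `PERSONAS` table, the `INPUT_SOURCE` resource type, the persona names, and the bucket names are all hypothetical, chosen only to illustrate how a SET_LOAD_RULES action on DATASOURCE and a per-persona input source whitelist could fit into one permission model.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    """One grant: an action on resources of a type whose name matches a regex.

    All names here are hypothetical and not part of Druid's actual API.
    """
    resource_type: str  # e.g. "DATASOURCE", or the hypothetical "INPUT_SOURCE"
    name_pattern: str   # regex that must match the full resource name
    action: str         # e.g. "READ", "WRITE", or "SET_LOAD_RULES"

    def allows(self, resource_type: str, name: str, action: str) -> bool:
        return (
            self.resource_type == resource_type
            and self.action == action
            and re.fullmatch(self.name_pattern, name) is not None
        )


# Hypothetical personas; each is just a named list of permissions.
PERSONAS = {
    "datasource_reader": [
        Permission("DATASOURCE", "wikipedia", "READ"),
    ],
    "data_engineer_a": [
        Permission("DATASOURCE", "wikipedia", "READ"),
        Permission("DATASOURCE", "wikipedia", "WRITE"),
        # Load rules modeled as an action on the datasource itself.
        Permission("DATASOURCE", "wikipedia", "SET_LOAD_RULES"),
        # Input source whitelist: only this (made-up) bucket may be read.
        Permission("INPUT_SOURCE", r"s3://team-a-bucket/.*", "READ"),
    ],
}


def is_authorized(persona: str, resource_type: str, name: str, action: str) -> bool:
    """True if any permission granted to the persona covers the request."""
    return any(
        p.allows(resource_type, name, action) for p in PERSONAS.get(persona, [])
    )


def can_submit_ingestion(persona: str, datasource: str, input_uris: list) -> bool:
    """Ingestion needs DATASOURCE WRITE plus READ on every input source URI."""
    return is_authorized(persona, "DATASOURCE", datasource, "WRITE") and all(
        is_authorized(persona, "INPUT_SOURCE", uri, "READ") for uri in input_uris
    )
```

With a layout like this, "What permissions are needed for a given persona?" becomes a lookup in one table, and two "data engineer" roles could differ only in their `INPUT_SOURCE` entries. For example, `can_submit_ingestion("data_engineer_a", "wikipedia", ["s3://team-b-bucket/x.json"])` would be rejected even though the persona has DATASOURCE WRITE.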
