Thanks to everyone who shared their thoughtful observations about security architecture.  This email just adds some specific feedback on the credential provider proposal below.

Only the in-memory PlainCredentialsProvider actually wraps a Map of credentials.  The other implementations construct such a Map on the fly from their backing store each time getCredentials() is called.  Nevertheless, every provider could in principle be augmented with an additional in-memory store in the form of a new member Map (a second such Map, in the case of PlainCredentialsProvider).
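
For anyone who hasn't looked at that code lately, here is a rough sketch of the difference.  This is simplified pseudocode of my own, not the actual Drill interfaces or class bodies:

    import java.util.HashMap;
    import java.util.Map;

    // Simplified sketch, not the real Drill source.
    interface CredentialsProvider {
      Map<String, String> getCredentials();
    }

    // Wraps an in-memory Map that is populated at construction time.
    class PlainCredentialsProvider implements CredentialsProvider {
      private final Map<String, String> credentials;

      PlainCredentialsProvider(Map<String, String> credentials) {
        this.credentials = credentials;
      }

      @Override
      public Map<String, String> getCredentials() {
        return credentials;
      }
    }

    // Builds a Map on the fly from its backing store (env vars here)
    // every time getCredentials() is called.
    class EnvCredentialsProvider implements CredentialsProvider {
      private final Map<String, String> envVariables;  // e.g. "username" -> "PG_USER"

      EnvCredentialsProvider(Map<String, String> envVariables) {
        this.envVariables = envVariables;
      }

      @Override
      public Map<String, String> getCredentials() {
        Map<String, String> creds = new HashMap<>();
        envVariables.forEach((key, envVar) -> creds.put(key, System.getenv(envVar)));
        return creds;
      }
    }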

Now, the hope is that the proposed additional credentials Map will bring support for user-scoped credentials to the credential providers.  Let's work through an example to see what happens.  Imagine a Drill environment with two users, Alice and Bob, where Alice is an admin who can create storage configs.  Alice logs in and creates a storage config called "postgresql".  She must capture persistent credentials for "postgresql" in one of the following supported places: inline in the JSON, in env vars on the server, in the Hadoop conf .xml on the server, or in HashiCorp Vault.  Drill doesn't write to any of those places on its own, so she has to write to the relevant store directly herself.
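
For example, "inline in the JSON" means something like the following in the storage config itself (field names here are only illustrative of a JDBC-style config, and of course the password sits there in plain text):

    {
      "type": "jdbc",
      "driver": "org.postgresql.Driver",
      "url": "jdbc:postgresql://db.example.com:5432/sales",
      "username": "alice",
      "password": "alice-secret",
      "enabled": true
    }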

Crucially, only one set of credentials for "postgresql" can be captured in any of the listed persistent stores.  It would not help if the creds provider impl, which currently does not participate in storage config creation at all, could also record Alice's credentials to a new in-memory Map which remembers that they belong to Alice.  When the Drillbit is restarted, the single set of persistent credentials for "postgresql" will be read back in, leaving Bob with no place to persist his own "postgresql" credentials.
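
To make the restart problem concrete, the proposal amounts to something like the sketch below.  The class and method names are made up; the lifecycle issue is the point:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch of the proposed per-user Map; not real Drill code.
    class PerUserCredentials {
      // Outer key: Drill username; value: that user's credentials.
      // This Map lives only in the Drillbit's heap and is lost on restart.
      private final Map<String, Map<String, String>> perUserCreds = new HashMap<>();

      // The single persistent set of credentials for "postgresql" that is
      // read back from the backing store when the Drillbit starts.
      private final Map<String, String> persistentCreds;

      PerUserCredentials(Map<String, String> persistentCreds) {
        this.persistentCreds = persistentCreds;
      }

      // Called when, say, Alice saves the "postgresql" config.
      void record(String drillUser, Map<String, String> creds) {
        perUserCreds.put(drillUser, creds);
      }

      Map<String, String> getCredentials(String drillUser) {
        // After a restart perUserCreds is empty again, so Alice and Bob
        // both get the same single persistent set of credentials.
        return perUserCreds.getOrDefault(drillUser, persistentCreds);
      }
    }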

Even if we imagine a creds provider impl that correctly persists and returns credentials specific to the active Drill user, storage plugins themselves would need to change.  Instead of obtaining credentials and establishing outbound connections during their initialisation, they would need to re-obtain credentials for every new query and check whether they already have an outbound connection for those credentials or need to establish a new one.
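
Sketched for an imaginary JDBC-style plugin (illustrative only; none of these names exist in Drill today), that change looks roughly like this:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Illustrative only: a plugin that resolves credentials per query and
    // keeps one outbound connection (or pool) per distinct credential set,
    // instead of connecting once during plugin initialisation.
    class PerUserAwarePlugin {
      interface Connection {}  // stand-in for the plugin's real connection type

      private final Map<Map<String, String>, Connection> connectionsByCreds =
          new ConcurrentHashMap<>();

      // Called at the start of every query rather than once at startup.
      Connection connectionFor(String drillUser) {
        Map<String, String> creds = lookupCredentials(drillUser);
        return connectionsByCreds.computeIfAbsent(creds, this::openConnection);
      }

      private Map<String, String> lookupCredentials(String drillUser) {
        // Would call a user-aware credentials provider here.
        throw new UnsupportedOperationException("sketch only");
      }

      private Connection openConnection(Map<String, String> creds) {
        // Would establish the outbound connection with these credentials.
        throw new UnsupportedOperationException("sketch only");
      }
    }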

These things aren't impossible, but the changes run deeper than adding a Map and a little logic to the creds providers.


On 2022/01/13 22:29, Charles Givre wrote:
Hello all,
One of the issues we've been dancing around is having per-user access controls 
in Drill.  As Drill was originally built around the Hadoop ecosystem, the 
Hadoop based connections make use of user-impersonation for per user access 
controls.  However, a rather glaring deficiency is the lack of per-user access 
controls for connections like JDBC, Mongo, Splunk etc.

Recently when I was working on the OAuth pull request, it occurred to me that we 
might be able to slightly extend the credential provider interface to allow for 
per-user credentials.  Here's what I was thinking...

A bit of background:  The credential provider interface is really an 
abstraction for a HashMap.  Here's my proposal.... The cred provider interface 
would store two hashmaps, one for per-user creds and one for global creds.   
When a user is authenticated to Drill, when they create a storage plugin 
connection, the credential provider would associate the creds with their Drill 
username.  The storage plugins that use credential provider would thus get 
per-user credentials.

If users did not want per-user credentials, they could simply use direct 
credentials OR specify that in the credential provider classes.  What do 
you think?

Best,
-- C

