Thanks to everyone who shared the thoughtful observations about security
architecture. This email is just to add some specific feedback on the
credential provider proposal below.
Only the in-memory PlainCredentialsProvider wraps a Map of credentials.
The other implementations only ever construct such Maps on the fly from
their backing store upon receiving a call to getCredentials().
Nevertheless, every provider could in principle be augmented with an
additional in-memory store in the form of a new member Map, this being
the second such Map in the case of PlainCredentialsProvider.
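To make the shape of that augmentation concrete, here is a minimal sketch. The class and method names are purely illustrative and not Drill's actual CredentialsProvider API; it only shows a global credentials Map sitting next to a new per-user Map, with per-user entries taking precedence:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Drill's real interface: a provider holding a
// global credentials Map plus a second Map keyed by Drill username.
public class PerUserCredentialsSketch {
  private final Map<String, String> globalCredentials = new HashMap<>();
  private final Map<String, Map<String, String>> userCredentials = new HashMap<>();

  public void setGlobal(String key, String value) {
    globalCredentials.put(key, value);
  }

  public void setForUser(String user, String key, String value) {
    userCredentials.computeIfAbsent(user, u -> new HashMap<>()).put(key, value);
  }

  // Merge: per-user entries override the global ones; users with no
  // entries of their own fall back to the global Map.
  public Map<String, String> getCredentials(String user) {
    Map<String, String> merged = new HashMap<>(globalCredentials);
    merged.putAll(userCredentials.getOrDefault(user, Map.of()));
    return merged;
  }
}
```

Note the in-memory userCredentials Map is exactly the part that has no persistent backing store today, which is where the argument below picks up.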
Now, the hope is that the proposed additional credentials Map will bring
support for user-scoped credentials to the credential providers, so let's
work through an example to see what happens. Imagine a Drill environment
with two users, Alice and Bob, where Alice is an admin who can create
storage configs.
Alice logs in and creates a storage config called "postgresql". She
must capture persistent credentials for "postgresql" in one of the
following supported places: inline in the JSON, in env vars on the
server, in the Hadoop conf.xml on the server, or in HashiCorp Vault.
Drill doesn't write to any of those places on its own so she has to
write to the relevant store directly herself.
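As one example of those places, inline credentials in the storage config JSON might look something like the following. The field names here are illustrative of a JDBC-style plugin config and may vary by plugin:

```json
{
  "type": "jdbc",
  "driver": "org.postgresql.Driver",
  "url": "jdbc:postgresql://localhost:5432/mydb",
  "username": "alice_db",
  "password": "secret",
  "enabled": true
}
```

Note there is a single username/password pair per config, which is the limitation discussed next.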
Crucially, only one set of credentials for "postgresql" can be captured
in any of the listed persistent stores. It would not help if the creds
provider impl, which currently does not participate in storage config
creation at all, could also record Alice's credentials in a new in-memory
Map that remembers they belong to her. When the Drillbit is restarted,
the single set of persistent credentials for "postgresql" will be read
back in, leaving Bob with no place to persist his own "postgresql"
credentials.
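The collision can be shown with a toy model of any of the listed persistent stores (this is an assumption-laden sketch, not real Drill code): a flat key-value space keyed only by storage config name, with no per-user dimension, where a second write under the same key clobbers the first.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a flat persistent store: env vars, conf files and Vault
// paths all behave this way here, holding one value per storage config
// name with no room for a per-user dimension.
public class FlatStoreSketch {
  private final Map<String, String> store = new HashMap<>();

  public void persist(String configName, String creds) {
    store.put(configName, creds); // silently overwrites any earlier value
  }

  public String readBack(String configName) {
    return store.get(configName);
  }
}
```

Whoever persists last wins: after a restart, only that one set of credentials for "postgresql" can be read back.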
Even if we imagine a creds provider impl that correctly persists and
returns credentials specific to the active Drill user, the storage
plugins themselves would need to change. Instead of obtaining
credentials and establishing outbound connections once during
initialisation, they would need to re-obtain credentials on every query
and check whether they already hold a connection for those credentials,
or need to establish a new one.
These changes aren't impossible, but they run deeper than adding a Map
and a little logic to the creds providers.
On 2022/01/13 22:29, Charles Givre wrote:
Hello all,
One of the issues we've been dancing around is having per-user access controls
in Drill. As Drill was originally built around the Hadoop ecosystem, the
Hadoop based connections make use of user-impersonation for per user access
controls. However, a rather glaring deficiency is the lack of per-user access
controls for connections like JDBC, Mongo, Splunk etc.
Recently, when I was working on the OAuth pull request, it occurred to me that
we might be able to slightly extend the credential provider interface to allow
for per-user credentials. Here's what I was thinking...
A bit of background: The credential provider interface is really an
abstraction for a HashMap. Here's my proposal.... The cred provider interface
would store two hashmaps, one for per-user creds and one for global creds.
When a user is authenticated to Drill, when they create a storage plugin
connection, the credential provider would associate the creds with their Drill
username. The storage plugins that use credential provider would thus get
per-user credentials.
If users did not want per-user credentials, they could simply use direct
credentials OR specify that in the credential provider classes. What do
you think?
Best,
-- C