Hi
Drill has long supported impersonation for certain types of storage. By
impersonation I mean any mechanism by which Drill accesses an external
system as the end user rather than as its own OS user. HDFS and Hive are
the oldest examples; Phoenix is the most recent.
The myriad external systems for which we have a plugin have not taken
any universally consistent approach to supporting impersonation, or to
their security in general. Outside of the Hadoop-secured-with-Kerberos
stable, many systems offer no impersonation support at all, and the only
possibility left to Drill is to connect with a full set of end user
credentials, presumably obtained from the credential provider subsystem.
Generic plugins, like JDBC, can only assume a lowest common denominator
and end up in the same situation.
The Phoenix storage plugin, which is JDBC based but secured with
Kerberos, sits roughly halfway along this spectrum: Phoenix does have an
impersonation mechanism, but Drill must nevertheless establish a
separate JDBC connection for each user, including their username in the
URL, in order to make use of it. By
adding support for doing this, the Phoenix plugin established a
precedent for a design that uses an end user credential, even if it's
only the username in this instance, to create or retrieve an outbound
connection made specifically for that user. Note that the plugin itself
behaves like any other; it just comes with some connection management
smarts.
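To make "connection management smarts" a bit more concrete, here is a
minimal sketch of the create-or-retrieve idea. It is not the Phoenix
plugin's actual code: the class names PerUserConnections, FakeConnection
and Sketch, the jdbc:example URL and the impersonatedUser parameter are
all made up for illustration, and a stand-in object takes the place of a
real JDBC connection so that the example runs without a database.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch: keeps one outbound connection per end user. */
final class PerUserConnections<C> {
  private final Map<String, C> byUser = new ConcurrentHashMap<>();
  private final Function<String, C> connector;

  /** connector opens a new connection for the given username. */
  PerUserConnections(Function<String, C> connector) {
    this.connector = connector;
  }

  /** Returns this user's connection, creating it on first use. */
  C forUser(String userName) {
    // ConcurrentHashMap.computeIfAbsent invokes the connector at most
    // once per absent key, so concurrent queries by the same user share
    // a single connection.
    return byUser.computeIfAbsent(userName, connector);
  }

  /** Number of users that currently have a cached connection. */
  int size() {
    return byUser.size();
  }
}

/** Stand-in for a real JDBC connection; it only remembers its URL. */
final class FakeConnection {
  final String url;

  FakeConnection(String url) {
    this.url = url;
  }
}

final class Sketch {
  /** Builds a cache whose connector puts the username in the URL. */
  static PerUserConnections<FakeConnection> newCache() {
    // Assumption: this URL and its parameter name are invented and are
    // not real Phoenix syntax.
    return new PerUserConnections<>(
        user -> new FakeConnection(
            "jdbc:example://host;impersonatedUser=" + user));
  }
}
```

With this, two calls to forUser("alice") on the same cache return the
same object, while forUser("bob") gets a separate connection whose URL
carries bob's username. A real plugin would also need to close and
evict idle connections, which I have left out here.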
So far so good: we have these three merged and working plugins that can
do impersonation. There have also been other, not yet merged, efforts
to achieve the same end goal.
DRILL-7871 StoragePluginStore instances for different users (#2251)
DRILL-8121 Add UserSession to StoragePlugins (#2445)
I only understand so much of them, but I believe that the first
replicates the whole storage plugin registry per user, while the second
makes the global storage plugin registry user-aware so that it creates
storage plugin instances per user.
The school of thought that I currently subscribe to is that the
precedent established by the HDFS, Hive and Phoenix plugins (smarter
plugins that can juggle identities) can be harmonised with the rest of
Drill, while I worry that these latter two efforts might not be so
easily harmonised. However, it may simply be that I cannot see enough to
realise that they also present paths that will lead us to a happy place
in the future.
Thanks for reading all of that - what are your thoughts?
Regards
James