Hi

Drill has long supported impersonation for certain types of storage.  By impersonation I mean any mechanism by which Drill accesses an external system as the end user rather than as its own OS user.  HDFS and Hive are the oldest examples; Phoenix is the most recent.

The myriad external systems for which we have a plugin have not taken any universally consistent approach to supporting impersonation, or to their security in general.  In many cases that are not in the Hadoop-secured-with-Kerberos stable, no impersonation support is present at all and the only possibility left to Drill is for it to connect with a full set of end user credentials, presumably obtained from the credential provider subsystem.  Generic plugins, like JDBC, can only assume a common denominator and end up in the same situation.

The Phoenix storage plugin, which is JDBC based but secured with Kerberos, finds itself roughly halfway along this spectrum: Phoenix does have an impersonation mechanism, but Drill must nevertheless establish a separate JDBC connection for each user, including their username in the URL, in order to make use of it.  By adding support for doing this, the Phoenix plugin established a precedent for a design that uses an end user credential, even if it's only the username in this instance, to create or retrieve an outbound connection made specifically for that user.  Note that the plugin itself behaves like any other; it just comes with some connection management smarts.
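
To make the "create or retrieve a connection for that user" idea concrete, here is a minimal sketch of the pattern in plain Java.  It is not the Phoenix plugin's actual code: the class name PerUserConnections, its methods and the URL template are all hypothetical, and the connection type is left generic so that the sketch runs without any JDBC driver.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch only: none of these names exist in Drill or Phoenix.
// C stands in for whatever connection type a plugin manages.
class PerUserConnections<C> {
  // URL template with one %s placeholder for the username.
  // The real Phoenix URL syntax is NOT assumed here.
  private final String urlTemplate;
  // Opens a new connection for a fully built URL (supplied by the caller).
  private final Function<String, C> connector;
  // One cached outbound connection per end user.
  private final Map<String, C> byUser = new ConcurrentHashMap<>();

  PerUserConnections(String urlTemplate, Function<String, C> connector) {
    this.urlTemplate = urlTemplate;
    this.connector = connector;
  }

  // Builds the per-user URL by substituting the username into the template.
  String urlFor(String userName) {
    return String.format(urlTemplate, userName);
  }

  // Returns the cached connection for this user, creating it on first use.
  C connectionFor(String userName) {
    return byUser.computeIfAbsent(userName, u -> connector.apply(urlFor(u)));
  }

  // Number of distinct users that currently hold a connection.
  int size() {
    return byUser.size();
  }
}
```

A plugin built this way would ask for connectionFor(queryUser) when it needs to talk to the external system, so the plugin instance stays shared while the outbound connection varies by user.  A real implementation would also need eviction and closing of idle connections, which this sketch leaves out.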

So far so good: we have these three merged and working plugins that can do impersonation.  There have also been other, not yet merged, efforts to achieve the same end goal.

DRILL-7871 StoragePluginStore instances for different users (#2251)
DRILL-8121 Add UserSession to StoragePlugins (#2445)

I only understand so much of them, but I believe that they respectively replicate the whole storage plugin registry per user, and make the global storage plugin registry user-aware so that it creates storage plugin instances per user.

The school of thought that I currently subscribe to is that the precedent established by the HDFS, Hive and Phoenix plugins (smarter plugins that can juggle identities) can be harmonised with the rest of Drill, while I worry that these latter two efforts might not be harmonised so easily.  However, it may simply be that I cannot see enough to realise that they also present paths that will lead us to a happy place in the future.

Thanks for reading all of that - what are your thoughts?

Regards
James
