GRANT and REVOKE implicitly assume that the database is the king of access control. That works when the database owns the data.
In the modern world, where data storage is separated from query, it is truly painful to manage permissions separately for each analysis and query tool, and nearly impossible to keep them synchronized. Likewise, it is impossible to build plugins for systems like Ranger for all possible tools, and impossible for Ranger to even understand all tools. For instance, suppose you have S3 data, files, and a database, each with permissions already defined. Now you have users who want to use Drill (for SQL processing), Jupyter notebooks with Python for data engineering, Julia with Pluto notebooks for numerical work, and batch Spark jobs, all for building data pipelines across all of those kinds of data. Neither Python, Julia nor Spark can really be protected by Ranger; all assume that file permissions or S3 IAM policies do that job.

On Thu, Jan 13, 2022 at 10:49 PM Z0ltrix <z0lt...@pm.me.invalid> wrote:

> Hi @All,
>
> For me, using Drill with a kerberized Hadoop cluster and Ranger as the
> central access-control system, I would love to have a Ranger plugin for
> Drill, but I assume a lot of Drill users just spin up a cluster in front
> of S3 or Azure.
>
> So why not use a generic approach with GRANT and REVOKE for users and
> groups on specific workspaces, or at least storage plugins?
>
> With that, an admin can control which users and groups can access all
> the storage plugins we have, no matter whether the underlying plugin has
> such a system.
>
> Maybe we could use the Metastore to store such information?
>
> Regards,
> Christian
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> Paul Rogers <par0...@gmail.com> wrote on Thursday, January 13, 2022 at
> 23:40:
>
> > Hey All,
> >
> > Other members of the Hadoop ecosystem rely on external systems to
> > handle permissions: Ranger or Sentry. There is probably something
> > different in the AWS world.
> >
> > As you look into security, you'll see that you need to maintain
> > permissions on many entities: files, connections, etc.
> > You need different permissions: read, write, create, etc. In larger
> > groups of people, you need roles: an admin role, a sales analyst role,
> > a production engineer role. Users map to roles, and roles carry
> > permissions.
> >
> > Creating this just for Drill is not effective: no one wants to learn a
> > Drill "security store" any more than folks want to learn the Drill
> > metastore. Drill is seldom the only tool in a shop: people want to set
> > permissions in one place, not in each tool. So, we should integrate
> > with existing tools.
> >
> > Drill should provide an API and be prepared to enforce rules. Drill
> > defines the entities that can be secured and the available permissions.
> > Then it is up to an external system to provide user identity, take
> > tuples of (user, resource, permission), and return a boolean indicating
> > whether that user is authorized. MapR, PAM, Hadoop, and other systems
> > would be implemented on top of the Drill permissions API, as would
> > whatever need you happen to have.
> >
> > Thanks,
> > - Paul
> >
> > On Thu, Jan 13, 2022 at 12:32 PM Curtis Lambert
> > <cur...@datadistillr.com> wrote:
> >
> > > This is what we are handling with Vault outside of Drill, combined
> > > with aliasing. James is tracking some of what you've been finding
> > > with the credential store, but even then we want a single source of
> > > auth. We can chat with James on the next Drill stand-up (and anyone
> > > else who wants to feel the pain).
> > > Curtis Lambert
> > > CTO
> > > Email: cur...@datadistillr.com
> > > Phone: 706-402-0249
> > > LinkedIn: https://www.linkedin.com/in/curtis-lambert-2009b2141/
> > > Calendly: https://calendly.com/curtis283/generic-zoom
> > > https://www.datadistillr.com/
> > >
> > > On Thu, Jan 13, 2022 at 3:29 PM Charles Givre <cgi...@gmail.com>
> > > wrote:
> > >
> > > > Hello all,
> > > >
> > > > One of the issues we've been dancing around is having per-user
> > > > access controls in Drill. As Drill was originally built around the
> > > > Hadoop ecosystem, the Hadoop-based connections make use of user
> > > > impersonation for per-user access controls. However, a rather
> > > > glaring deficiency is the lack of per-user access controls for
> > > > connections like JDBC, Mongo, Splunk, etc. Recently, when I was
> > > > working on the OAuth pull request, it occurred to me that we might
> > > > be able to slightly extend the credential provider interface to
> > > > allow for per-user credentials. Here's what I was thinking...
> > > >
> > > > A bit of background: the credential provider interface is really
> > > > an abstraction for a HashMap. Here's my proposal: the credential
> > > > provider interface would store two hashmaps, one for per-user creds
> > > > and one for global creds. When a user is authenticated to Drill and
> > > > creates a storage plugin connection, the credential provider would
> > > > associate the creds with their Drill username. The storage plugins
> > > > that use the credential provider would thus get per-user
> > > > credentials.
> > > > If users did not want per-user credentials, they could simply use
> > > > direct credentials, or specify that in the credential provider
> > > > classes. What do you think?
> > > >
> > > > Best,
> > > > -- C
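[Editor's note] Charles's two-hashmap proposal above can be sketched as a small Java class. This is a minimal illustration only, not actual Drill code: the class and method names (`TwoLevelCredentialProvider`, `setForUser`, `lookup`) are invented for this sketch, and it assumes the lookup rule is "per-user credential wins, otherwise fall back to the global credential."

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the proposal: the credential provider holds two
// maps, one for global credentials and one keyed by Drill username.
// All names here are illustrative, not Drill APIs.
class TwoLevelCredentialProvider {
    private final Map<String, String> globalCreds = new HashMap<>();
    private final Map<String, Map<String, String>> perUserCreds = new HashMap<>();

    void setGlobal(String key, String secret) {
        globalCreds.put(key, secret);
    }

    void setForUser(String drillUser, String key, String secret) {
        perUserCreds.computeIfAbsent(drillUser, u -> new HashMap<>())
                    .put(key, secret);
    }

    // A per-user credential takes precedence; otherwise fall back to the
    // global credential, if any.
    Optional<String> lookup(String drillUser, String key) {
        Map<String, String> userMap = perUserCreds.get(drillUser);
        if (userMap != null && userMap.containsKey(key)) {
            return Optional.of(userMap.get(key));
        }
        return Optional.ofNullable(globalCreds.get(key));
    }
}
```

With this shape, a storage plugin asks for credentials by (Drill username, key) and transparently receives either the user's own secret or the shared one, matching the "direct credentials as fallback" behavior described above.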
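[Editor's note] Paul's quoted suggestion of an external authorization callout, taking tuples of (user, resource, permission) and returning a boolean, can likewise be sketched. The interface and class names below (`Authorizer`, `InMemoryAuthorizer`) are hypothetical; the in-memory implementation merely stands in for an external system such as Ranger, Sentry, or PAM.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical callout API: Drill would define the securable entities and
// permissions; an external system answers each (user, resource, permission)
// tuple with a boolean. Names are illustrative, not Drill APIs.
interface Authorizer {
    boolean isAuthorized(String user, String resource, String permission);
}

// Trivial in-memory policy store standing in for Ranger/Sentry/PAM.
class InMemoryAuthorizer implements Authorizer {
    private final Set<String> grants = new HashSet<>();

    void grant(String user, String resource, String permission) {
        grants.add(user + "|" + resource + "|" + permission);
    }

    @Override
    public boolean isAuthorized(String user, String resource, String permission) {
        return grants.contains(user + "|" + resource + "|" + permission);
    }
}
```

The point of the design is that Drill only ever calls `isAuthorized(...)` at enforcement points; which policy engine sits behind the interface is a deployment choice.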