Hello Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13069

to look at the new patch set (#5).

Change subject: [authz] new SentryAuthzProvider's caching strategy
......................................................................

[authz] new SentryAuthzProvider's caching strategy

This patch updates the way how the privilege cache in
SentryAuthzProvider is populated.  Prior to this patch, only one entry
per sanitized Sentry's response was created.  With this patch,
a response may be split into two entries: one contains server- and
database-scope privileges, and another contains table- and column-scope
privileges.  Of course, it also changes the lookup process: now it's
necessary to search for two entries in the cache if looking up for
an entry with privileges for an authorizable of the table scope.

The new caching strategy leverages the fact that Sentry includes
information on privileges granted on authorizables of higher scopes
in the hierarchy, if any.  The new strategy is beneficial in cases
when a user has privileges granted on database.  In that case, once
there was a request to authorize an action on a table or a column
of that table, next request to authorize an action on the database
itself will hit the cache, avoiding an extra RPC sent to Sentry.
Another example that benefits from the new caching scheme are
scenarios like AuthorizeDropTable(tableA) followed by
AuthorizeCreateTable(table), where 'table' is 'tableA' or another
table.

See below for more details.

The API that Kudu uses to fetch privileges from Sentry returns
privileges corresponding to all ancestors and descendants of the
requested authorizable in Sentry's scope hierarchy tree.  For instance,
if requesting database-level privileges on "s.d", the API returns
privileges on "s", "s.d", privileges on all tables within "s.d",
and privileges on all columns therein if any of those privileges are
granted.  In case of many tables in a single database we would end up
caching information for all of them, while among those tables there
might be many of non-Kudu tables.

Additionally, the existing cache-lookup policy doesn't allow for
leveraging of the information that exist in the cache where it could.
For example, with prior strategy the following might be stored in
the cache:

  { /s/d/t => [ "METADATA" on /s,
                "CREATE"   on /s/d,
                "SELECT"   on /s/d/t,
                "UPDATE"   on /s/d/t/c, ] }

At the same time, a cache lookup for privileges on "/s/d" would yield an
expensive database-scoped request to Sentry (expensive in terms of
amount of data returned by Sentry in case of database with many tables).
If we were smarter about how we structured our privileges in cache,
we could return the cached database-scoped privilege.

This patch addresses these problems by splitting the cached privileges
into at most two entries: those for database-scope and above (keyed
by database), and those for table-scope and below (keyed by table).
For cache lookups for a given authorizable, this allows us to use
existing privileges for higher scopes from other authorizables where
possible, avoiding expensive calls to Sentry for database-scoped
privileges.  For the example above, with the new caching strategy,
the cache will store the following entries:

  { /s/d   => [ "METADATA" on /s,
                "CREATE"   on /s/d, ] }

  { /s/d/t => [ "SELECT"   on /s/d/t,
                "UPDATE"   on /s/d/t/c, ] }

With the above in cache, a lookup on /s/d would thus be able to leverage
the existing entry.  The trade-off is that:
  * every Sentry's response on table-scope privileges is stored as
    two entries in the cache: one for database-scope and another
    one for table-scope
  * fetching of table-scope information require two cache lookups
    (for example above one for /s/d and another for /s/d/t key)

Why not to split a response further, using 'one entry per each
authz-scope' approach?  Based on current privilege model, Kudu requests
privileges at the database scope (e.g. for CreateTable) or table scope
(e.g. OpenTable) only, so this separation seems natural while limiting
the number of cache entries to create/lookup.

To address the concern about caching irrelevant information on non-Kudu
tables sent by Sentry with a _database-scope_ response, with the new
caching strategy the information on table- and column-scope
authorizables is chopped off from database-scope responses before
storing corresponding database-level entry in the cache.  That's also
necessary from the point of keeping information in the cache consistent
while avoiding splitting a single Sentry response into unlimited number
of entries in the cache.  Otherwise, it will be necessary to put
information on each table-scope sub-tree into a separate cache entry
to leverage the presence of the whole database sub-tree in a Sentry's
response.

Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
---
M src/kudu/master/sentry_authz_provider-test.cc
M src/kudu/master/sentry_privileges_fetcher.cc
M src/kudu/master/sentry_privileges_fetcher.h
3 files changed, 324 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/13069/5
--
To view, visit http://gerrit.cloudera.org:8080/13069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608
Gerrit-Change-Number: 13069
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

Reply via email to