Hello Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/13069 to look at the new patch set (#5). Change subject: [authz] new SentryAuthzProvider's caching strategy ...................................................................... [authz] new SentryAuthzProvider's caching strategy This patch updates the way how the privilege cache in SentryAuthzProvider is populated. Prior to this patch, only one entry per sanitized Sentry's response was created. With this patch, a response may be split into two entries: one contains server- and database-scope privileges, and another contains table- and column-scope privileges. Of course, it also changes the lookup process: now it's necessary to search for two entries in the cache if looking up for an entry with privileges for an authorizable of the table scope. The new caching strategy leverages the fact that Sentry includes information on privileges granted on authorizables of higher scopes in the hierarchy, if any. The new strategy is beneficial in cases when a user has privileges granted on database. In that case, once there was a request to authorize an action on a table or a column of that table, next request to authorize an action on the database itself will hit the cache, avoiding an extra RPC sent to Sentry. Another example that benefits from the new caching scheme are scenarios like AuthorizeDropTable(tableA) followed by AuthorizeCreateTable(table), where 'table' is 'tableA' or another table. See below for more details. The API that Kudu uses to fetch privileges from Sentry returns privileges corresponding to all ancestors and descendants of the requested authorizable in Sentry's scope hierarchy tree. For instance, if requesting database-level privileges on "s.d", the API returns privileges on "s", "s.d", privileges on all tables within "s.d", and privileges on all columns therein if any of those privileges are granted. In case of many tables in a single database we would end up caching information for all of them, while among those tables there might be many of non-Kudu tables. Additionally, the existing cache-lookup policy doesn't allow for leveraging of the information that exist in the cache where it could. For example, with prior strategy the following might be stored in the cache: { /s/d/t => [ "METADATA" on /s, "CREATE" on /s/d, "SELECT" on /s/d/t, "UPDATE" on /s/d/t/c, ] } At the same time, a cache lookup for privileges on "/s/d" would yield an expensive database-scoped request to Sentry (expensive in terms of amount of data returned by Sentry in case of database with many tables). If we were smarter about how we structured our privileges in cache, we could return the cached database-scoped privilege. This patch addresses these problems by splitting the cached privileges into at most two entries: those for database-scope and above (keyed by database), and those for table-scope and below (keyed by table). For cache lookups for a given authorizable, this allows us to use existing privileges for higher scopes from other authorizables where possible, avoiding expensive calls to Sentry for database-scoped privileges. For the example above, with the new caching strategy, the cache will store the following entries: { /s/d => [ "METADATA" on /s, "CREATE" on /s/d, ] } { /s/d/t => [ "SELECT" on /s/d/t, "UPDATE" on /s/d/t/c, ] } With the above in cache, a lookup on /s/d would thus be able to leverage the existing entry. The trade-off is that: * every Sentry's response on table-scope privileges is stored as two entries in the cache: one for database-scope and another one for table-scope * fetching of table-scope information require two cache lookups (for example above one for /s/d and another for /s/d/t key) Why not to split a response further, using 'one entry per each authz-scope' approach? Based on current privilege model, Kudu requests privileges at the database scope (e.g. for CreateTable) or table scope (e.g. OpenTable) only, so this separation seems natural while limiting the number of cache entries to create/lookup. To address the concern about caching irrelevant information on non-Kudu tables sent by Sentry with a _database-scope_ response, with the new caching strategy the information on table- and column-scope authorizables is chopped off from database-scope responses before storing corresponding database-level entry in the cache. That's also necessary from the point of keeping information in the cache consistent while avoiding splitting a single Sentry response into unlimited number of entries in the cache. Otherwise, it will be necessary to put information on each table-scope sub-tree into a separate cache entry to leverage the presence of the whole database sub-tree in a Sentry's response. Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608 --- M src/kudu/master/sentry_authz_provider-test.cc M src/kudu/master/sentry_privileges_fetcher.cc M src/kudu/master/sentry_privileges_fetcher.h 3 files changed, 324 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/13069/5 -- To view, visit http://gerrit.cloudera.org:8080/13069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id96181345e357a104e28314d8d8d88633dcf9608 Gerrit-Change-Number: 13069 Gerrit-PatchSet: 5 Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120)