[
https://issues.apache.org/jira/browse/SENTRY-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Na Li updated SENTRY-2539:
--------------------------
Attachment: SENTRY-2539.013.patch
> PolicyEngine should be able to return privilege directly
> ---------------------------------------------------------
>
> Key: SENTRY-2539
> URL: https://issues.apache.org/jira/browse/SENTRY-2539
> Project: Sentry
> Issue Type: Improvement
> Components: Sentry
> Affects Versions: 2.1
> Reporter: Na Li
> Assignee: Na Li
> Priority: Major
> Attachments: SENTRY-2539.002.patch, SENTRY-2539.003.patch,
> SENTRY-2539.005.patch, SENTRY-2539.006.patch, SENTRY-2539.007.patch,
> SENTRY-2539.008.patch, SENTRY-2539.008.patch, SENTRY-2539.008.patch,
> SENTRY-2539.009.patch, SENTRY-2539.010.patch, SENTRY-2539.010.patch,
> SENTRY-2539.013.patch, SENTRY-2539.013.patch, SENTRY-2539.013.patch,
> SENTRY-2539.013.patch
>
>
> *Problem*:
> Right now, for a command such as "show databases", Sentry has to perform
> authorization checks on each database. When there are many databases, for
> example 12000 databases in the system, the authorization checks for a single
> command in Sentry can be very slow. Two main factors slow down
> authorization checks in Sentry even when caching is enabled:
> 1) The cache returns the list of privileges as Strings. As a result, every
> authorization check has to convert each privilege string to a privilege
> object.
> 2) When the cache is enabled, it returns all privileges of a given user
> regardless of which resource is being checked.
> 2.1) For example, a user has 2000 privileges assigned and the resource to
> check is "server=server1, database=db_1, table=table_1". The cache returns
> all 2000 privileges, including unrelated privileges such as
> "server=server1->database=db_2->action=ALL".
> 2.2) Returning unrelated privileges has two side effects:
> 2.2.1) The overhead of converting privileges from String to Object is
> proportional to the number of privileges returned from the cache. Converting
> unrelated privileges costs time and brings no benefit.
> 2.2.2) The authorization check goes through each privilege, so its overhead
> is also proportional to the number of privileges returned from the cache.
> Checking unrelated privileges costs time and brings no benefit.
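
To make the per-check conversion cost concrete, here is a minimal sketch of turning one privilege string into a structured form. The class PrivilegeParseSketch and its methods are hypothetical names made up for illustration and are not Sentry's parser; only the "key=value->key=value" string layout is taken from the example in the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Sentry.
final class PrivilegeParseSketch {

    // Splits "server=server1->database=db_2->action=ALL" into key/value parts.
    // A cache that returns Strings forces this kind of work on every check.
    static Map<String, String> parse(String privilege) {
        Map<String, String> parts = new LinkedHashMap<>();
        for (String segment : privilege.split("->")) {
            int eq = segment.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed segment: " + segment);
            }
            String key = segment.substring(0, eq).trim().toLowerCase();
            String value = segment.substring(eq + 1).trim();
            parts.put(key, value);
        }
        return parts;
    }

    // True when the privilege names a database other than the one being checked,
    // i.e. it was parsed for nothing as far as this check is concerned.
    static boolean isUnrelated(Map<String, String> privilege, String database) {
        String db = privilege.get("database");
        return db != null && !db.equalsIgnoreCase(database);
    }

    public static void main(String[] args) {
        Map<String, String> p = parse("server=server1->database=db_2->action=ALL");
        System.out.println(p);
        System.out.println("unrelated to db_1: " + isUnrelated(p, "db_1"));
    }
}
```

With 2000 privileges per user and 12000 databases, a loop like this would run for every privilege on every check, which is the overhead the description refers to.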
> *Solution*:
> 1) Add a new function listPrivilegeObjects that lets the authorization
> provider get privilege objects when checking authorization. This avoids the
> conversion overhead. All the interfaces from the policy engine (PolicyEngine)
> to the cache (PrivilegeCache) have to be changed to add this new function.
> 2) Implement a new cache, TreePrivilegeCache. It converts each privilege from
> String format to a Privilege object at the beginning, and returns the
> privilege objects directly from listPrivilegeObjects at authorization check
> time. This avoids the overhead of conversion at each authorization check.
> 3) TreePrivilegeCache organizes the privileges based on the resource
> hierarchy, like a tree. Therefore, it can return only related privileges
> based on the resource to check. This reduces the authorization check
> overhead.
> 3.1) For example, a user has 2000 privileges assigned, and the resource to
> check is "server=server1, database=db_1, table=table_1". TreePrivilegeCache
> returns only the related privileges and excludes unrelated privileges such as
> "server=server1->database=db_2->action=ALL".
> 3.2) SENTRY-1291 addressed problem 2) but not problem 1). In addition, its
> implementation, SimplePrivilegeCache, is neither memory efficient (each map
> key contains the whole resource hierarchy, and many keys share a large
> portion of the same content) nor operationally efficient (for each
> authorization check, SimplePrivilegeCache.listPrivileges() has to construct
> a large number of keys in order to find all related privileges in a map).
> 4) Use TreePrivilegeCache instead of SimplePrivilegeCache for caching. Note
> that this solution is built on top of SENTRY-1291 and uses the changes
> SENTRY-1291 made, such as providing the resource hierarchy when getting
> privileges for an authorization check.
>
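
The idea in 3) of organizing privileges by resource hierarchy can be sketched roughly as below. This is not the TreePrivilegeCache from the patch: the class TreeCacheSketch, its methods, and the path encoding are assumptions for illustration, and wildcard handling and privileges below the requested resource are deliberately left out. The type parameter P stands for whatever pre-built privilege object the cache would hold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; NOT the TreePrivilegeCache from the patch.
final class TreeCacheSketch<P> {

    // One node per resource level (server, database, table, ...).
    private final class Node {
        final List<P> privileges = new ArrayList<>();
        final Map<String, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    // Store an already-built privilege under its resource path,
    // e.g. ["server=server1", "database=db_1"]. This is done once, at load time.
    void add(List<String> resourcePath, P privilege) {
        Node node = root;
        for (String part : resourcePath) {
            node = node.children.computeIfAbsent(part.toLowerCase(), k -> new Node());
        }
        node.privileges.add(privilege);
    }

    // Walk from the root toward the requested resource and collect only the
    // privileges found on that path; sibling branches (e.g. db_2) are never visited.
    List<P> listRelated(List<String> resourcePath) {
        List<P> result = new ArrayList<>(root.privileges);
        Node node = root;
        for (String part : resourcePath) {
            node = node.children.get(part.toLowerCase());
            if (node == null) {
                break; // nothing stored at or below this level on this path
            }
            result.addAll(node.privileges);
        }
        return result;
    }

    public static void main(String[] args) {
        TreeCacheSketch<String> cache = new TreeCacheSketch<>();
        cache.add(Arrays.asList("server=server1", "database=db_1"), "db_1: ALL");
        cache.add(Arrays.asList("server=server1", "database=db_2"), "db_2: ALL");
        System.out.println(cache.listRelated(
            Arrays.asList("server=server1", "database=db_1", "table=table_1")));
    }
}
```

In this sketch the lookup cost depends on the depth of the resource path rather than on the total number of privileges, and no string key containing the whole hierarchy is built per check.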
> *Major Behavior Change*
> 1) Create a new interface FilteredPrivilegeCache, which extends
> PrivilegeCache.
> 2) Move the function added by SENTRY-1291 from PrivilegeCache to
> FilteredPrivilegeCache, and add the additional functions from this solution
> to FilteredPrivilegeCache. This way, PrivilegeCache is unchanged, and we stay
> backward compatible with the old implementation from before SENTRY-1291.
> 3) Move all changes that SENTRY-1291 made in SimplePrivilegeCache (implements
> PrivilegeCache) to a new class SimpleFilteredPrivilegeCache, which implements
> FilteredPrivilegeCache.
> 4) Instead of hard-coding the privilege cache class, use the configuration
> AuthzConfVars.AUTHZ_PRIVILEGE_CACHE ("sentry.hive.privilege.cache") to
> specify the privilege cache class name. The default value is
> "org.apache.sentry.provider.cache.TreePrivilegeCache". Users can switch to
> another cache implementation in sentry-site.xml at a service (such as Hive
> Server or HMS). The options are:
> 4.1) org.apache.sentry.provider.cache.SimplePrivilegeCache (the original
> cache implementation before SENTRY-1291)
> 4.2) org.apache.sentry.provider.cache.SimpleFilteredPrivilegeCache (the
> cache implemented in SENTRY-1291)
> 4.3) org.apache.sentry.provider.cache.TreePrivilegeCache (the cache
> implemented in this Jira SENTRY-2539)
>
>
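
For item 4), selecting the cache class by name could look roughly like the sketch below. The property key and default class name are the ones quoted in the description; everything else (the class CacheLoaderSketch, the use of java.util.Properties in place of Sentry's configuration object, and the no-argument constructor) is an assumption for illustration. The real cache classes may well require constructor arguments.

```java
import java.util.Properties;

// Hypothetical loader for illustration; not Sentry's actual wiring.
final class CacheLoaderSketch {

    // Key and default value as quoted in the issue description.
    static final String CACHE_KEY = "sentry.hive.privilege.cache";
    static final String DEFAULT_CACHE =
        "org.apache.sentry.provider.cache.TreePrivilegeCache";

    // Returns the configured cache class name, or the default when unset.
    static String resolveCacheClassName(Properties conf) {
        return conf.getProperty(CACHE_KEY, DEFAULT_CACHE).trim();
    }

    // Instantiates a class by name via its no-arg constructor (an assumption here).
    static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create cache " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(resolveCacheClassName(conf));

        conf.setProperty(CACHE_KEY,
            "org.apache.sentry.provider.cache.SimpleFilteredPrivilegeCache");
        System.out.println(resolveCacheClassName(conf));

        // Sentry is not on the classpath in this sketch, so a JDK class
        // stands in to show the reflective instantiation step.
        System.out.println(instantiate("java.util.ArrayList").getClass().getName());
    }
}
```

Resolving the class name at startup keeps the choice between the three listed implementations a configuration change rather than a code change.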