[ https://issues.apache.org/jira/browse/SENTRY-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001309#comment-17001309 ]
Hadoop QA commented on SENTRY-2539:
-----------------------------------

Here are the results of testing the latest attachment
https://issues.apache.org/jira/secure/attachment/12989299/SENTRY-2539.010.patch against master.

{color:green}Overall:{color} +1 all checks pass

{color:green}SUCCESS:{color} all tests passed

Console output: https://builds.apache.org/job/PreCommit-SENTRY-Build/4465/console

This message is automatically generated.

> PolicyEngine should be able to return privilege directly
> ---------------------------------------------------------
>
>                 Key: SENTRY-2539
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2539
>             Project: Sentry
>          Issue Type: Improvement
>          Components: Sentry
>    Affects Versions: 2.1
>            Reporter: Na Li
>            Assignee: Na Li
>            Priority: Major
>         Attachments: SENTRY-2539.002.patch, SENTRY-2539.003.patch,
> SENTRY-2539.005.patch, SENTRY-2539.006.patch, SENTRY-2539.007.patch,
> SENTRY-2539.008.patch, SENTRY-2539.008.patch, SENTRY-2539.008.patch,
> SENTRY-2539.009.patch, SENTRY-2539.010.patch, SENTRY-2539.010.patch
>
>
> *Problem*:
> Right now, for a command such as "show databases", Sentry has to perform
> an authorization check on each database. When there are many databases, for
> example 12000 in the system, the authorization checks for a single command
> can be very slow. Two main factors slow down authorization checks in Sentry
> even when caching is enabled:
> 1) The cache returns the list of privileges as Strings. As a result, every
> authorization check has to convert each privilege string to a privilege
> object.
> 2) When the cache is enabled, it returns all privileges of a given user
> regardless of which resource is being checked.
> 2.1) For example, a user has 2000 privileges assigned and the resource to
> check is "server=server1, database=db_1, table=table_1". The cache returns
> all 2000 privileges, including unrelated privileges such as
> "server=server1->database=db_2->action=ALL".
> 2.2) Returning unrelated privileges has two side effects:
> 2.2.1) The overhead of converting privileges from String to object is
> proportional to the number of privileges returned from the cache. Converting
> unrelated privileges costs time and brings no benefit.
> 2.2.2) The authorization check goes through each privilege, so its overhead
> is also proportional to the number of privileges returned from the cache.
> Checking unrelated privileges costs time and brings no benefit.
> *Solution*:
> 1) Add a new function listPrivilegeObjects that lets the authorization
> provider get privilege objects when checking authorization. This avoids the
> conversion overhead. All the interfaces from the policy engine (PolicyEngine)
> to the cache (PrivilegeCache) have to be changed to add this new function.
> 2) Implement a new cache, TreePrivilegeCache. It converts each privilege from
> String format to a Privilege object once, at the beginning, and directly
> returns the privilege objects from listPrivilegeObjects at authorization
> check. This avoids the overhead of conversion at each authorization check.
> 3) TreePrivilegeCache organizes the privileges based on the resource
> hierarchy, like a tree. Therefore, it can return only the privileges related
> to the resource being checked. This reduces the authorization check
> overhead.
> 3.1) For example, a user has 2000 privileges assigned, and the resource to
> check is "server=server1, database=db_1, table=table_1". TreePrivilegeCache
> returns only the related privileges and excludes unrelated privileges such
> as "server=server1->database=db_2->action=ALL".
> 3.2) SENTRY-1291 was meant to address problem 2). However, it did not
> address problem 1).
> Its implementation, SimplePrivilegeCache, is neither memory efficient (the
> key of the map contains the whole resource hierarchy, and many keys share a
> large portion of the same content) nor operationally efficient (for each
> authorization check, SimplePrivilegeCache.listPrivileges() has to construct
> a large number of keys in order to find all related privileges in a map).
> 4) Use TreePrivilegeCache instead of SimplePrivilegeCache for caching. Note
> that this solution is built on top of SENTRY-1291 and uses the changes
> SENTRY-1291 made, such as providing the resource hierarchy when getting
> privileges for an authorization check.
>
> *Major Behavior Change*
> 1) Create a new interface, FilteredPrivilegeCache, which extends
> PrivilegeCache.
> 2) Move the function added by SENTRY-1291 in PrivilegeCache to
> FilteredPrivilegeCache. Add the additional functions from this solution to
> FilteredPrivilegeCache. In this way, there is no change in PrivilegeCache,
> and we are backward compatible with the old implementation before
> SENTRY-1291.
> 3) Move all changes made to SimplePrivilegeCache (implements PrivilegeCache)
> by SENTRY-1291 to a new class, SimpleFilteredPrivilegeCache, which implements
> FilteredPrivilegeCache.
> 4) Instead of hard-coding the privilege cache class, use the configuration
> AuthzConfVars.AUTHZ_PRIVILEGE_CACHE ("sentry.hive.privilege.cache") to
> specify the privilege cache class name. The default value is
> "org.apache.sentry.provider.cache.TreePrivilegeCache". Users can switch to
> another cache implementation in sentry-site.xml at a service (such as Hive
> server or HMS).
> The options are:
> 4.1) org.apache.sentry.provider.cache.SimplePrivilegeCache (the original
> cache implementation before SENTRY-1291)
> 4.2) org.apache.sentry.provider.cache.SimpleFilteredPrivilegeCache (the
> cache implemented in SENTRY-1291)
> 4.3) org.apache.sentry.provider.cache.TreePrivilegeCache (the cache
> implemented in this Jira, SENTRY-2539)
>

--
This message was sent by Atlassian Jira (v8.3.4#803005)
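
For readers who want a concrete picture of Solution items 2) and 3) in the quoted description, here is a minimal sketch of a tree-organized privilege cache. It is not the TreePrivilegeCache from the attached patch: the class name TreePrivilegeCacheSketch, the ParsedPrivilege type, the string parsing, and the listPrivilegeObjects signature are all assumptions made for illustration. It assumes privileges use the "server=...->database=...->action=..." form shown in the description, and that the privileges related to a resource are those granted on the resource itself or on one of its ancestors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; not the TreePrivilegeCache class from the patch.
class TreePrivilegeCacheSketch {

  // Stand-in for a parsed privilege object (hypothetical type).
  static final class ParsedPrivilege {
    final List<String> resourcePath; // e.g. [server=server1, database=db_1]
    final String action;             // e.g. ALL

    ParsedPrivilege(List<String> resourcePath, String action) {
      this.resourcePath = resourcePath;
      this.action = action;
    }

    // Converts "server=server1->database=db_1->action=ALL" into an object.
    static ParsedPrivilege parse(String privilege) {
      List<String> path = new ArrayList<>();
      String action = "ALL"; // assumption: a missing action part means ALL
      for (String part : privilege.split("->")) {
        String p = part.trim();
        if (p.startsWith("action=")) {
          action = p.substring("action=".length());
        } else {
          path.add(p);
        }
      }
      return new ParsedPrivilege(path, action);
    }

    @Override
    public String toString() {
      return String.join("->", resourcePath) + "->action=" + action;
    }
  }

  // One node per resource level; holds the privileges granted exactly there.
  private static final class Node {
    final Map<String, Node> children = new HashMap<>();
    final List<ParsedPrivilege> privileges = new ArrayList<>();
  }

  private final Node root = new Node();

  // Parsing happens once, when the cache is built.
  TreePrivilegeCacheSketch(List<String> privilegeStrings) {
    for (String s : privilegeStrings) {
      ParsedPrivilege privilege = ParsedPrivilege.parse(s);
      Node node = root;
      for (String level : privilege.resourcePath) {
        node = node.children.computeIfAbsent(level, k -> new Node());
      }
      node.privileges.add(privilege);
    }
  }

  // Returns only the privileges found while walking down to the resource.
  List<ParsedPrivilege> listPrivilegeObjects(String... resourcePath) {
    List<ParsedPrivilege> related = new ArrayList<>();
    Node node = root;
    for (String level : resourcePath) {
      node = node.children.get(level);
      if (node == null) {
        break; // nothing is granted at or below this level
      }
      related.addAll(node.privileges);
    }
    return related;
  }

  public static void main(String[] args) {
    TreePrivilegeCacheSketch cache = new TreePrivilegeCacheSketch(Arrays.asList(
        "server=server1->database=db_1->action=ALL",
        "server=server1->database=db_2->action=ALL",
        "server=server1->database=db_1->table=table_1->action=SELECT"));
    // The db_2 privilege is never visited for a check on db_1.table_1.
    for (ParsedPrivilege p : cache.listPrivilegeObjects(
        "server=server1", "database=db_1", "table=table_1")) {
      System.out.println(p);
    }
  }
}
```

In this sketch a lookup costs one map access per resource level, and the strings are parsed only when the cache is built, which is the combination of savings the description attributes to TreePrivilegeCache. Whether the real implementation also returns privileges granted below the checked resource (for example column-level grants) is not stated in the description, so the sketch leaves that out.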