[jira] [Updated] (SENTRY-2539) PolicyEngine should be able to return privilege directly

2019-12-21 Thread Na Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SENTRY-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Li updated SENTRY-2539:
--
Attachment: SENTRY-2539.013.patch

> PolicyEngine should be able to return privilege directly
> -
>
> Key: SENTRY-2539
> URL: https://issues.apache.org/jira/browse/SENTRY-2539
> Project: Sentry
> Issue Type: Improvement
> Components: Sentry
> Affects Versions: 2.1
> Reporter: Na Li
> Assignee: Na Li
> Priority: Major
> Attachments: SENTRY-2539.002.patch, SENTRY-2539.003.patch, 
> SENTRY-2539.005.patch, SENTRY-2539.006.patch, SENTRY-2539.007.patch, 
> SENTRY-2539.008.patch, SENTRY-2539.008.patch, SENTRY-2539.008.patch, 
> SENTRY-2539.009.patch, SENTRY-2539.010.patch, SENTRY-2539.010.patch, 
> SENTRY-2539.013.patch, SENTRY-2539.013.patch
>
>
> *Problem*:
> Right now, for a command such as "show databases", Sentry has to perform 
> authorization checks on each database. When there are many databases, for 
> example 12000 databases in the system, the authorization checks for a single 
> command in Sentry can be very slow. Two main factors slow down 
> authorization checks in Sentry even when caching is enabled:
> 1) The cache returns the list of privileges as Strings. As a result, 
> every authorization check has to convert each privilege string to a privilege 
> object.
> 2) When the cache is enabled, it returns all privileges of a given user 
> regardless of which resource is being checked.
>   2.1) For example, a user has 2000 privileges assigned and the resource to 
> check is "server=server1, database=db_1, table=table_1". The cache returns 
> all 2000 privileges, including unrelated privileges such as 
> "server=server1->database=db_2->action=ALL". 
>   2.2) Returning unrelated privileges has two side effects:
>     2.2.1) The overhead of converting privileges from String to Object is 
> proportional to the number of privileges returned from the cache. Converting 
> unrelated privileges costs time with no benefit.
>     2.2.2) The authorization check goes through each privilege, so its 
> overhead is also proportional to the number of privileges returned from the 
> cache. Checking unrelated privileges costs time with no benefit.
> *Solution*:
> 1) Add a new function listPrivilegeObjects that lets the authorization 
> provider get privilege objects when checking authorization. This avoids the 
> conversion overhead. All the interfaces from the policy engine (PolicyEngine) 
> to the cache (PrivilegeCache) have to be changed to add this new function. 
> 2) Implement a new cache, TreePrivilegeCache. It converts each privilege from 
> String format to a Privilege object at the beginning, and returns the 
> privilege objects directly from listPrivilegeObjects at authorization check 
> time. This avoids the overhead of conversion at each authorization check. 
> 3) TreePrivilegeCache organizes the privileges by the resource 
> hierarchy, like a tree. Therefore, it can return only the privileges related 
> to the resource being checked. This reduces the authorization check 
> overhead. 
>   3.1) For example, a user has 2000 privileges assigned, and the resource to 
> check is "server=server1, database=db_1, table=table_1". 
> TreePrivilegeCache returns only the related privileges and excludes unrelated 
> privileges such as "server=server1->database=db_2->action=ALL". 
>   3.2) SENTRY-1291 was meant to address problem 2). However, it did not 
> address problem 1). Its implementation, SimplePrivilegeCache, is also not 
> memory efficient (each map key contains the whole resource hierarchy, and many 
> keys share a large portion of the same content), nor operationally efficient 
> (for each authorization check, SimplePrivilegeCache.listPrivileges() has to 
> construct a large number of keys in order to find all related privileges in 
> the map). 
> 4) Use TreePrivilegeCache instead of SimplePrivilegeCache for caching. Note 
> that this solution is built on top of SENTRY-1291 and uses the changes 
> SENTRY-1291 made, such as providing the resource hierarchy when getting 
> privileges for an authorization check.
>  
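
The tree-based lookup described in the Solution list can be illustrated with a minimal Java sketch. It is not the code from the attached patch: the class name TreePrivilegeCacheSketch, the Privilege and Node types, and the parsing rules are all assumptions made for illustration. It parses each privilege string once when the cache is built, stores it at the tree node for its resource, and on lookup collects only the privileges found along the path to the requested resource. Wildcard resources and privileges on descendants of the requested resource are deliberately ignored here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a tree-organized privilege cache; not Sentry's actual class. */
final class TreePrivilegeCacheSketch {

    /** A privilege parsed once: the resource path plus the granted action. */
    static final class Privilege {
        final List<String> path;   // e.g. ["server=server1", "database=db_1"]
        final String action;       // e.g. "ALL"

        Privilege(List<String> path, String action) {
            this.path = path;
            this.action = action;
        }

        /** Parses a string such as "server=server1->database=db_2->action=ALL". */
        static Privilege parse(String text) {
            List<String> path = new ArrayList<>();
            String action = "*";   // assumed default when no action part is present
            for (String part : text.split("->")) {
                if (part.startsWith("action=")) {
                    action = part.substring("action=".length());
                } else {
                    path.add(part);
                }
            }
            return new Privilege(path, action);
        }
    }

    /** One node per resource level; holds the privileges granted exactly at that level. */
    static final class Node {
        final Map<String, Node> children = new HashMap<>();
        final List<Privilege> privileges = new ArrayList<>();
    }

    private final Node root = new Node();

    /** Converts every privilege string to an object up front and files it in the tree. */
    TreePrivilegeCacheSketch(Iterable<String> privilegeStrings) {
        for (String text : privilegeStrings) {
            Privilege privilege = Privilege.parse(text);
            Node node = root;
            for (String part : privilege.path) {
                node = node.children.computeIfAbsent(part, key -> new Node());
            }
            node.privileges.add(privilege);
        }
    }

    /** Returns only the privileges stored on the path from the root to the resource. */
    List<Privilege> listPrivilegeObjects(List<String> resourcePath) {
        List<Privilege> related = new ArrayList<>(root.privileges);
        Node node = root;
        for (String part : resourcePath) {
            node = node.children.get(part);
            if (node == null) {
                break;   // nothing is granted below this point
            }
            related.addAll(node.privileges);
        }
        return related;
    }

    public static void main(String[] args) {
        TreePrivilegeCacheSketch cache = new TreePrivilegeCacheSketch(Arrays.asList(
            "server=server1->database=db_1->action=ALL",
            "server=server1->database=db_2->action=ALL",
            "server=server1->database=db_1->table=table_1->action=SELECT"));
        List<Privilege> related = cache.listPrivilegeObjects(
            Arrays.asList("server=server1", "database=db_1", "table=table_1"));
        System.out.println(related.size());   // prints 2: the db_2 privilege is never visited
    }
}
```

In this sketch the lookup cost depends on the depth of the resource path and the number of privileges on that path, not on the total number of privileges the user holds.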
> *Major Behavior Change*
> 1) Create a new interface, FilteredPrivilegeCache, which extends 
> PrivilegeCache.
> 2) Move the function added by SENTRY-1291 from PrivilegeCache to 
> FilteredPrivilegeCache. Add the additional functions from this solution to 
> FilteredPrivilegeCache. This way, PrivilegeCache is unchanged, 
> and we stay backward compatible with implementations that predate SENTRY-1291.
> 3) Move all changes made to SimplePrivilegeCache (implements PrivilegeCache) 
> by SENTRY-1291 to a new class, SimpleFilteredPrivilegeCache, which implements 
> FilteredPrivilegeCache. 
> 4) Instead of hard-coding the privilege cache class, use configuration 
> AuthzConfVars.AUTHZ_PRIVILEGE_CACHE 
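
The interface layering in items 1) to 3) can be sketched as follows. This only illustrates the pattern: the interface names come from the description above, but the method signatures are simplified assumptions and do not match Sentry's real API, and PrefixFilteredCache is a hypothetical implementation. The configuration-driven class selection from item 4) is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the interface layering; signatures are assumed, not Sentry's. */
final class PrivilegeCacheLayeringSketch {

    /** Stand-in for the existing interface, left unchanged for old implementations. */
    interface PrivilegeCache {
        List<String> listPrivileges();
    }

    /** Stand-in for the new sub-interface that carries the resource-aware lookup. */
    interface FilteredPrivilegeCache extends PrivilegeCache {
        List<String> listPrivileges(String resourcePrefix);
    }

    /** Hypothetical implementation; naive prefix matching is a simplification. */
    static final class PrefixFilteredCache implements FilteredPrivilegeCache {
        private final List<String> privileges;

        PrefixFilteredCache(List<String> privileges) {
            this.privileges = new ArrayList<>(privileges);
        }

        @Override
        public List<String> listPrivileges() {
            return new ArrayList<>(privileges);
        }

        @Override
        public List<String> listPrivileges(String resourcePrefix) {
            List<String> related = new ArrayList<>();
            for (String privilege : privileges) {
                if (privilege.startsWith(resourcePrefix)) {
                    related.add(privilege);
                }
            }
            return related;
        }
    }

    public static void main(String[] args) {
        FilteredPrivilegeCache cache = new PrefixFilteredCache(Arrays.asList(
            "server=server1->database=db_1->action=ALL",
            "server=server1->database=db_2->action=ALL"));
        // Callers written against the old interface keep working unchanged.
        PrivilegeCache legacyView = cache;
        System.out.println(legacyView.listPrivileges().size());                        // prints 2
        System.out.println(cache.listPrivileges("server=server1->database=db_1").size()); // prints 1
    }
}
```

Because the resource-aware method lives only on the sub-interface, a class that implements just the original interface compiles without modification, which is the backward-compatibility property described in item 2).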
