[ 
https://issues.apache.org/jira/browse/SENTRY-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Li updated SENTRY-2539:
--------------------------
    Description: 
*Problem*:

Right now, for a command such as "show databases", Sentry has to perform an
authorization check on each database. When there are many databases, for
example 12000 in the system, the authorization checks for a single command can
be very slow. Two main factors slow down authorization checks in Sentry even
when caching is enabled:

1) The cache returns the list of privileges as Strings. As a result, every
authorization check has to convert each privilege string into a privilege
object.

2) When the cache is enabled, it returns all privileges of a given user,
regardless of which resource is being checked.

  2.1) For example, a user has 2000 privileges assigned and the resource to
check is "server=server1, database=db_1, table=table_1". The cache returns all
2000 privileges, including unrelated ones such as
"server=server1->database=db_2->action=ALL".

  2.2) Returning unrelated privileges has two side effects:

    2.2.1) The overhead of converting privileges from String to object is
proportional to the number of privileges returned from the cache. Converting
unrelated privileges costs time and brings no benefit.

    2.2.2) The authorization check goes through each privilege, so its overhead
is also proportional to the number of privileges returned from the cache.
Checking unrelated privileges costs time and brings no benefit.
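
To put rough numbers on this: if the two figures above are combined (12000
databases, each checked against a user's 2000 privileges), a single "show
databases" could trigger up to 12000 x 2000 = 24,000,000 String-to-object
conversions. The sketch below illustrates that pattern at a smaller scale. It
is not Sentry code: ParsedPrivilege, StringCacheCostSketch and their methods
are hypothetical names, and the privilege format is assumed from the examples
above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parsed privilege; Sentry's real class differs.
final class ParsedPrivilege {
    final String[] parts;

    ParsedPrivilege(String text) {
        // Format assumed from the examples above, e.g.
        // "server=server1->database=db_2->action=ALL".
        this.parts = text.split("->");
    }
}

final class StringCacheCostSketch {
    // Counts String-to-object conversions across all checks.
    static long conversions = 0;

    // Mimics a cache that returns every privilege string of the user.
    static List<String> listPrivileges(int count) {
        List<String> cached = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            cached.add("server=server1->database=db_" + i + "->action=ALL");
        }
        return cached;
    }

    // One authorization check: convert everything, then scan everything.
    static boolean check(List<String> cached, String database) {
        List<ParsedPrivilege> parsed = new ArrayList<>();
        for (String text : cached) {
            parsed.add(new ParsedPrivilege(text));
            conversions++;
        }
        boolean allowed = false;
        for (ParsedPrivilege p : parsed) {
            if (p.parts.length > 1 && p.parts[1].equals("database=" + database)) {
                allowed = true;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        List<String> cached = listPrivileges(2000);
        for (int db = 0; db < 12; db++) {
            check(cached, "db_" + db);
        }
        System.out.println(conversions); // 12 checks x 2000 privileges = 24000
    }
}
```

Both loops in check() grow with the number of privileges returned by the
cache, which is the cost that 2.2.1) and 2.2.2) describe.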

*Solution*:

1) Add a new function, listPrivilegeObjects, that lets the authorization
provider get privilege objects when checking authorization. This avoids the
conversion overhead. All interfaces from the policy engine (PolicyEngine) down
to the cache (PrivilegeCache) have to change to add this new function.

2) Implement a new cache, TreePrivilegeCache. It converts each privilege from
String format to a Privilege object up front, and returns the privilege objects
directly from listPrivilegeObjects at authorization check time. This avoids the
overhead of conversion at each authorization check.

3) TreePrivilegeCache organizes the privileges by resource hierarchy, like a
tree. Therefore, it can return only the privileges related to the resource
being checked. This reduces the authorization check overhead.

  3.1) For example, a user has 2000 privileges assigned, and the resource to
check is "server=server1, database=db_1, table=table_1". TreePrivilegeCache
returns only the related privileges and excludes unrelated ones such as
"server=server1->database=db_2->action=ALL".

  3.2) SENTRY-1291 was meant to address problem 2), but it did not address
problem 1). Its implementation in SimplePrivilegeCache is also not memory
efficient (each map key contains the whole resource hierarchy, and many keys
share a large portion of the same content), nor operationally efficient (for
each authorization check, SimplePrivilegeCache.listPrivileges() has to
construct a large number of keys in order to find all related privileges in
the map).

4) Use TreePrivilegeCache instead of SimplePrivilegeCache for caching. Note
that this solution is built on top of SENTRY-1291 and uses the changes
SENTRY-1291 made, such as providing the resource hierarchy when getting
privileges for an authorization check.
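
The idea behind points 2) and 3) can be sketched roughly as follows. This is a
minimal illustration, not the TreePrivilegeCache from the attached patches:
the class names Priv and TreeCacheSketch, the method signature, and the rule
"return the privileges stored on each node along the path to the resource" are
all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parsed privilege; Sentry's real Privilege type differs.
final class Priv {
    final List<String> resourcePath = new ArrayList<>(); // e.g. [server=server1, database=db_1]
    String action = "";

    Priv(String text) {
        for (String part : text.split("->")) {
            if (part.startsWith("action=")) {
                action = part.substring("action=".length());
            } else {
                resourcePath.add(part);
            }
        }
    }
}

// Sketch of a tree-shaped cache keyed by resource hierarchy.
final class TreeCacheSketch {
    private static final class Node {
        final List<Priv> own = new ArrayList<>();
        final Map<String, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    // The String-to-object conversion happens once, when the cache is built.
    TreeCacheSketch(List<String> privilegeStrings) {
        for (String text : privilegeStrings) {
            Priv p = new Priv(text);
            Node node = root;
            for (String part : p.resourcePath) {
                node = node.children.computeIfAbsent(part, k -> new Node());
            }
            node.own.add(p);
        }
    }

    // Walks down the path of the resource and collects the privileges stored
    // on each node it passes. Other branches are never visited or parsed.
    List<Priv> listPrivilegeObjects(String... resourcePath) {
        List<Priv> related = new ArrayList<>();
        Node node = root;
        for (String part : resourcePath) {
            node = node.children.get(part);
            if (node == null) {
                break;
            }
            related.addAll(node.own);
        }
        return related;
    }

    public static void main(String[] args) {
        TreeCacheSketch cache = new TreeCacheSketch(Arrays.asList(
                "server=server1->database=db_1->action=SELECT",
                "server=server1->database=db_1->table=table_1->action=INSERT",
                "server=server1->database=db_2->action=ALL"));
        List<Priv> related = cache.listPrivilegeObjects(
                "server=server1", "database=db_1", "table=table_1");
        System.out.println(related.size()); // 2: the db_2 privilege is not returned
    }
}
```

With this layout, the work per check depends on the depth of the resource and
the privileges on its path, not on the total number of privileges the user
holds.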

 

*Major Behavior Change*

1) Create a new interface, FilteredPrivilegeCache, which extends
PrivilegeCache.

2) Move the function that SENTRY-1291 added to PrivilegeCache into
FilteredPrivilegeCache, and add the additional functions from this solution to
FilteredPrivilegeCache. This way, PrivilegeCache is unchanged, and we remain
backward compatible with implementations that predate SENTRY-1291.

3) Move all changes that SENTRY-1291 made to SimplePrivilegeCache (which
implements PrivilegeCache) into a new class, SimpleFilteredPrivilegeCache,
which implements FilteredPrivilegeCache.

4) Instead of hard-coding the privilege cache class, use the configuration
AuthzConfVars.AUTHZ_PRIVILEGE_CACHE ("sentry.hive.privilege.cache") to specify
the privilege cache class name. The default value is
"org.apache.sentry.provider.cache.TreePrivilegeCache". Users can switch to
another cache implementation in sentry-site.xml at a service (such as Hive
server or HMS). The options are:

  4.1) org.apache.sentry.provider.cache.SimplePrivilegeCache (the original 
cache implementation before SENTRY-1291)

  4.2) org.apache.sentry.provider.cache.SimpleFilteredPrivilegeCache (the cache 
implemented in SENTRY-1291)

  4.3) org.apache.sentry.provider.cache.TreePrivilegeCache (the cache 
implemented in this Jira SENTRY-2539)
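
The interface split and the configurable class name could fit together roughly
as in the sketch below. Only the property name "sentry.hive.privilege.cache"
comes from this description. The interfaces, their method names, the stand-in
classes and the loader are hypothetical, and java.util.Properties is used in
place of Sentry's own configuration classes.

```java
import java.util.Collections;
import java.util.Properties;
import java.util.Set;

// Simplified stand-in for the unchanged base interface (signature assumed).
interface PrivilegeCacheSketch {
    Set<String> listPrivileges();
}

// Stand-in for the new sub-interface that carries the filtered lookup.
interface FilteredPrivilegeCacheSketch extends PrivilegeCacheSketch {
    Set<String> listFilteredPrivileges(String... resourcePath);
}

// Plays the role of an old cache that only knows the base interface.
final class LegacyCacheStandIn implements PrivilegeCacheSketch {
    public Set<String> listPrivileges() {
        return Collections.singleton("server=server1->database=db_1->action=ALL");
    }
}

// Plays the role of a cache that supports filtered lookups.
final class FilteredCacheStandIn implements FilteredPrivilegeCacheSketch {
    public Set<String> listPrivileges() {
        return Collections.singleton("server=server1->database=db_1->action=ALL");
    }

    public Set<String> listFilteredPrivileges(String... resourcePath) {
        // A real implementation would filter by resourcePath; omitted here.
        return listPrivileges();
    }
}

final class CacheLoaderSketch {
    static final String KEY = "sentry.hive.privilege.cache"; // from the description

    // Reads the class name from configuration and instantiates it reflectively.
    static PrivilegeCacheSketch load(Properties conf, String defaultClassName) {
        String className = conf.getProperty(KEY, defaultClassName);
        try {
            Class<?> clazz = Class.forName(className);
            return (PrivilegeCacheSketch) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create cache " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(KEY, LegacyCacheStandIn.class.getName());
        PrivilegeCacheSketch cache = load(conf, FilteredCacheStandIn.class.getName());
        // Callers can fall back to the unfiltered path for old implementations.
        System.out.println(cache instanceof FilteredPrivilegeCacheSketch);
    }
}
```

Because callers test for the sub-interface before using the filtered lookup,
an implementation that only knows the base interface keeps working, which is
the backward compatibility that point 2) aims for.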

 

 


> PolicyEngine should be able to return privilege directly
> --------------------------------------------------------
>
>                 Key: SENTRY-2539
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2539
>             Project: Sentry
>          Issue Type: Improvement
>          Components: Sentry
>    Affects Versions: 2.1
>            Reporter: Na Li
>            Assignee: Na Li
>            Priority: Major
>         Attachments: SENTRY-2539.002.patch, SENTRY-2539.003.patch, 
> SENTRY-2539.005.patch, SENTRY-2539.006.patch, SENTRY-2539.007.patch, 
> SENTRY-2539.008.patch, SENTRY-2539.008.patch, SENTRY-2539.008.patch, 
> SENTRY-2539.009.patch, SENTRY-2539.010.patch, SENTRY-2539.010.patch
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
