[ 
https://issues.apache.org/jira/browse/SENTRY-565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated SENTRY-565:
----------------------------
    Attachment: SENTRY-565.002.patch

The patch has been rebased. The following are the results of the performance 
test with HiveMetaStoreClient.getTables(dbName, "*"):
||Number of tables||Execution time (with patch)||Execution time (without patch)||
|1000|85ms|8463ms|
|5000|214ms|42221ms|
|10000|332ms|79462ms|
The test is based on the project's e2e test. If Hive and Sentry are installed 
on different servers, I expect the execution times to be higher than the 
results above. 
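
To make the table easier to read, here is a minimal sketch that derives the per-table cost and the speedup from the figures above. The class and method names are only illustrative; the input numbers are copied from the table. Without the patch the cost stays at roughly 8 ms per table, which is consistent with a fixed cost for each table being filtered.

```java
// Illustrative helper, not part of the patch: derives per-table cost and
// speedup from the measured totals reported in the table above.
class PerTableCost {
    // Average time spent per table, in milliseconds.
    static double perTableMs(long totalMs, int tables) {
        return (double) totalMs / tables;
    }

    // How many times faster the patched run is than the unpatched run.
    static double speedup(long withoutPatchMs, long withPatchMs) {
        return (double) withoutPatchMs / withPatchMs;
    }

    public static void main(String[] args) {
        int[] tables = {1000, 5000, 10000};
        long[] withPatch = {85, 214, 332};
        long[] withoutPatch = {8463, 42221, 79462};
        for (int i = 0; i < tables.length; i++) {
            System.out.printf("%d tables: %.3f ms/table without patch, "
                    + "%.3f ms/table with patch, speedup %.1fx%n",
                    tables[i],
                    perTableMs(withoutPatch[i], tables[i]),
                    perTableMs(withPatch[i], tables[i]),
                    speedup(withoutPatch[i], withPatch[i]));
        }
    }
}
```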

> Improvement the performance when Sentry filter the entity
> ---------------------------------------------------------
>
>                 Key: SENTRY-565
>                 URL: https://issues.apache.org/jira/browse/SENTRY-565
>             Project: Sentry
>          Issue Type: Improvement
>            Reporter: Colin Ma
>            Assignee: Colin Ma
>         Attachments: SENTRY-565.001.patch, SENTRY-565.002.patch
>
>
> Currently, when metadata is fetched from Hive (e.g. "show tables" or "show 
> databases"), Sentry filters the result and returns only the authorized 
> entities. Filtering the result requires many RPC calls. The related code is 
> in HiveAuthzBinding, for example, in filterShowTables:
> {code}
> ......
> for (String tableName : queryResult) {
>   ......
>   hiveAuthzBinding.authorize(operation, tableMetaDataPrivilege, subject,
>             inputHierarchy, outputHierarchy, providedPrivileges);
>   ......
> }
> ......
> {code}
> hiveAuthzBinding.authorize fetches the privileges from the Sentry service, 
> so if Hive contains many tables, the filtering process takes a long 
> time. Since Sentry also needs to filter columns, HiveAuthzBinding 
> should be improved to reduce the number of RPC calls made while filtering.
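
The difference between one remote call per table and one remote call per filter operation can be sketched as follows. This is only an illustration of the general idea, not the approach taken in the attached patch: `PrivilegeService`, `CountingService`, and `FilterSketch` are hypothetical names, and none of them are part of the Sentry or Hive APIs. The fake service counts calls so the two patterns can be compared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the Sentry service; NOT the real Sentry API.
interface PrivilegeService {
    // One remote call that checks a single table.
    boolean isAuthorized(String user, String table);

    // One remote call that returns every table the user may see.
    Set<String> listAuthorizedTables(String user);
}

// Fake in-memory service that counts how many "remote" calls it receives.
class CountingService implements PrivilegeService {
    private final Set<String> allowed;
    int rpcCalls = 0;

    CountingService(Set<String> allowed) {
        this.allowed = allowed;
    }

    @Override
    public boolean isAuthorized(String user, String table) {
        rpcCalls++;
        return allowed.contains(table);
    }

    @Override
    public Set<String> listAuthorizedTables(String user) {
        rpcCalls++;
        return new HashSet<>(allowed);
    }
}

class FilterSketch {
    // Per-table pattern: one service call for each entry in the result.
    static List<String> filterPerTable(PrivilegeService svc, String user, List<String> tables) {
        List<String> visible = new ArrayList<>();
        for (String table : tables) {
            if (svc.isAuthorized(user, table)) {
                visible.add(table);
            }
        }
        return visible;
    }

    // Batched pattern: fetch the privileges once, then filter locally.
    static List<String> filterBatched(PrivilegeService svc, String user, List<String> tables) {
        Set<String> allowed = svc.listAuthorizedTables(user);
        List<String> visible = new ArrayList<>();
        for (String table : tables) {
            if (allowed.contains(table)) {
                visible.add(table);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("t1", "t2", "t3", "t4");
        Set<String> allowed = new HashSet<>(Arrays.asList("t1", "t3"));

        CountingService perTable = new CountingService(allowed);
        CountingService batched = new CountingService(allowed);

        System.out.println(filterPerTable(perTable, "user1", tables)
                + " calls=" + perTable.rpcCalls);
        System.out.println(filterBatched(batched, "user1", tables)
                + " calls=" + batched.rpcCalls);
    }
}
```

In the per-table version the number of calls grows with the number of tables, while the batched version makes a single call regardless of the size of the result.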



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
