[
https://issues.apache.org/jira/browse/SENTRY-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dapeng Sun reassigned SENTRY-1007:
----------------------------------
Assignee: Dapeng Sun
> Sentry column-level performance for wide tables
> -----------------------------------------------
>
> Key: SENTRY-1007
> URL: https://issues.apache.org/jira/browse/SENTRY-1007
> Project: Sentry
> Issue Type: Bug
> Affects Versions: 1.5.1
> Reporter: Anne Yu
> Assignee: Dapeng Sun
> Priority: Critical
> Attachments: SENTRY-1007.001.patch
>
>
> It appears a query is taking a long time on a wide table due many Sentry
> column-level auth checks.
> here are some investigation results:
> 1) create a table with 4000 columns, grant select on [table|db] to test user,
> however {noformat}select * from table{noformat} still validates column level
> privilege on each column. So this select command issues 4000 queries to
> validate column level permissions. It takes 40 seconds on my test cluster (2x
> large) to return results, for a moment, query seems to freeze:
> {noformat}
> ...
> 2016-01-11 11:54:14,816 INFO DataNucleus.Query: Reading in results for query
> "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege
> WHERE roles.contains(role) && (role.roleName == "test_role") && serverName ==
> "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI ==
> "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) &&
> (URI == "__NULL__") && ((columnName == "test997") || (columnName ==
> "__NULL__")) && (URI == "__NULL__") VARIABLES
> org.apache.sentry.provider.db.service.model.MSentryRole role" since the
> connection used is closing
> 2016-01-11 11:54:14,822 INFO DataNucleus.Query: Reading in results for query
> "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege
> WHERE roles.contains(role) && (role.roleName == "test_role") && serverName ==
> "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI ==
> "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) &&
> (URI == "__NULL__") && ((columnName == "test998") || (columnName ==
> "__NULL__")) && (URI == "__NULL__") VARIABLES
> org.apache.sentry.provider.db.service.model.MSentryRole role" since the
> connection used is closing
> 2016-01-11 11:54:14,828 INFO DataNucleus.Query: Reading in results for query
> "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege
> WHERE roles.contains(role) && (role.roleName == "test_role") && serverName ==
> "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI ==
> "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) &&
> (URI == "__NULL__") && ((columnName == "test999") || (columnName ==
> "__NULL__")) && (URI == "__NULL__") VARIABLES
> org.apache.sentry.provider.db.service.model.MSentryRole role" since the
> connection used is closing
> {noformat}
> Here is the debug log from sentry service:
> {noformat}
> org.apache.sentry.binding.hive.authz.HiveAuthzBinding.authorize(HiveAuthzBinding.java:304)]
> requiredInputPrivileges = {Table=[SELECT], Column=[SELECT], URI=[ALL]}
> {noformat}
> 2) the same issue for {noformat}show columns in table{noformat}
> 3) for {noformat}show count(*) in table{noformat}, it requires table level
> privilege, so this issue doesn't exist.
> 4) I found out there are more commands have the same issues are:
> {code}
> SHOW COLUMNS FROM test_tb1;
> create table test_tb2 as select * from test_tb1
> show partitions in test_tb1
> select * from test_tb1
> {code}
> Even for {code}select col1,col2,col3 from test_tb1{code} will issue 3 queries
> for each column, instead of one query for all columns in one table;
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)