Hi Lina,

Thanks a lot for your help on this! I was able to get the test to work by adding the following config option:

conf.set(HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS.varname, "true");

Colm.

On Thu, Jan 4, 2018 at 10:06 PM, Na Li <lina...@cloudera.com> wrote:

> Colm,
>
> The following code shows where Hive sets the column info. You can debug
> into hive code and see why AccessedColumns is not set.
>
> The related code is in org.apache.hadoop.hive.ql.parse.SemanticAnalyzer
>
> boolean isColumnInfoNeedForAuth =
>     SessionState.get().isAuthorizationModeV2()
>     && HiveConf.getBoolVar(this.conf, ConfVars.HIVE_AUTHORIZATION_ENABLED);
> if (isColumnInfoNeedForAuth
>     || HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
>   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
>   this.setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess(this.getColumnAccessInfo()));
> }
>
> this.LOG.info("Completed plan generation");
> if (HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
>   this.putAccessedColumnsToReadEntity(this.inputs, this.columnAccessInfo);
> }
>
>
> On Wed, Jan 3, 2018 at 11:28 PM, Na Li <lina...@cloudera.com> wrote:
>
>> Colm,
>>
>> I tried to reproduce your issue using sentry 2.0 (master branch) with
>> Hive 2.3.2.
>>
>> The test code is
>>
>> @Test
>> public void testPositiveOnAll() throws Exception {
>>   Connection connection = context.createConnection(ADMIN1);
>>   Statement statement = context.createStatement(connection);
>>   statement.execute("CREATE database " + DB1);
>>   statement.execute("use " + DB1);
>>   statement.execute("CREATE TABLE t1 (c1 string, c2 string)");
>>   statement.execute("CREATE ROLE user_role1");
>>   statement.execute("GRANT SELECT ON TABLE t1 TO ROLE user_role1");
>>   statement.execute("GRANT ROLE user_role1 TO GROUP " + USERGROUP1);
>>   statement.close();
>>   connection.close();
>>
>>   connection = context.createConnection(USER1_1);
>>   statement = context.createStatement(connection);
>>   statement.execute("use " + DB1);
>>   statement.execute("SELECT * FROM t1");
>>
>>   statement.close();
>>   connection.close();
>> }
>>
>>
>> required privileges:
>>
>>    - Server=server1->Db=db_1->Table=t1->Column=c1->action=select
>>    - Server=server1->Db=db_1->Table=t1->Column=c2->action=select
>>
>>
>> cached privilege:
>>
>>    - server=server1->db=db_1->table=t1->action=select
>>
>> So the authorization works.
>>
>> Note
>>
>>    - For me, the "SELECT * FROM t1" causes the required privileges to
>>    contain each column explicitly. However, for you, the "privilege" to check
>>    looks like:
>>    Server=server1->Db=authz->Table=words->action=select; The columns are
>>    not explicitly listed. Hive controls if the column is included in
>>    required privilege. At
>>    org.apache.sentry.binding.hive.authz.HiveAuthzBindingHookBase.authorizeWithHiveBindings ->
>>    getInputHierarchyFromInputs -> addColumnHierarchy, Sentry uses
>>    accessedColumns from Hive input to add colHierarchy for each column.
>>    You can check if accessedColumns is empty or null for the hive
>>    version you are using.
>>    - For me, the cached privilege does not include column part. For you,
>>    the cached privilege is
>>    "Server=server1->Db=authz->Table=words->Column=*->action=select".
>>    *Can you share your test code*, so I can
>>    see how you grant the privilege and therefore the cached privilege contains
>>    column?
>>    - I tried to use "GRANT SELECT(*) ON TABLE t1 TO ROLE
>>    user_role1", and got the following error:
>>
>>      2018-01-03 23:23:50,459 (HiveServer2-Handler-Pool: Thread-212)
>>      [WARN - org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:539)]
>>      Error executing statement:
>>      org.apache.hive.service.cli.HiveSQLException: Error while
>>      compiling statement: FAILED: ParseException line 1:6 cannot recognize input
>>      near 'GRANT' 'SELECT' '(' in ddl statement
>>      at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>>      at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>>      at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>>      at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>>      at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>>
>> Thanks,
>>
>> Lina
>>
>> On Mon, Dec 18, 2017 at 10:14 AM, Colm O hEigeartaigh <cohei...@apache.org> wrote:
>>
>>> Thanks Kalyan! I was thinking that if the cached privilege part does not
>>> appear in the requested "part", and if it is "all", then we should skip that
>>> part and continue on to the next one. But maybe there is a better
>>> solution.
>>>
>>> Colm.
>>>
>>> On Mon, Dec 18, 2017 at 4:06 PM, Kalyan Kumar Kalvagadda <kkal...@cloudera.com> wrote:
>>>
>>> > Colm,
>>> >
>>> > I will look closer into this today and see if I can help you out.
>>> >
>>> > -Kalyan
>>> >
>>> > On Mon, Dec 18, 2017 at 4:52 AM, Colm O hEigeartaigh <cohei...@apache.org>
>>> > wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> I've done some further analysis of the problem, and I think it is not
>>> >> directly related to SENTRY-1291.
>>> >> The problem manifests in
>>> >> CommonPrivilege.implies(privilege, model). My (cached) privilege looks
>>> >> like:
>>> >>
>>> >> Server=server1->Db=authz->Table=words->Column=*->action=select
>>> >>
>>> >> The "privilege" I want to check looks like:
>>> >>
>>> >> Server=server1->Db=authz->Table=words->action=select;
>>> >>
>>> >> The problem is in the "for" loop in CommonPrivilege.implies. It loops on
>>> >> the parts of the second privilege, and matches up to "action=select". Here
>>> >> it tries to compare to "Column=*" of the cached privilege and fails on
>>> >> this line:
>>> >>
>>> >> https://github.com/apache/sentry/blob/a4924edc79b26f937e3e5ea3584f0b4307dd4135/sentry-policy/sentry-policy-common/src/main/java/org/apache/sentry/policy/common/CommonPrivilege.java#L86
>>> >>
>>> >> It's clear there's a bug here somewhere, but I'm not sure where - can
>>> >> someone please advise?
>>> >>
>>> >> Thanks,
>>> >>
>>> >> Colm.
>>> >>
>>> >> On Wed, Dec 13, 2017 at 8:28 PM, Na Li <lina...@cloudera.com> wrote:
>>> >>
>>> >> > Sasha,
>>> >> >
>>> >> > sentry-1291 is helpful for the problem that sentry privilege checks takes
>>> >> > too long with many explicit grants, which is useful for big customers.
>>> >> > Another approach that can improve the performance is to organize the
>>> >> > privileges according to the authorization hierarchy in a tree structure, so
>>> >> > finding match in ResourceAuthorizationProvider.doHasAccess() is in the
>>> >> > order of log(N), not linear of N, where N is the number of privileges.
>>> >> >
>>> >> > We can wait for Colm to confirm his issue is caused by sentry-1291. If so,
>>> >> > it may be fixed by selecting privileges by finding if the requesting
>>> >> > authorization object is prefix of cached privileges instead of exact match.
>>> >> >
>>> >> > in SimplePrivilegeCache
>>> >> >
>>> >> > public Set<String> listPrivileges(Set<String> groups, Set<String> users,
>>> >> >     ActiveRoleSet roleSet, Authorizable... authorizationHierarchy) {
>>> >> >   Set<String> privileges = new HashSet<>();
>>> >> >   Set<StringBuilder> authzKeys = getAuthzKeys(authorizationHierarchy);
>>> >> >   for (StringBuilder authzKey : authzKeys) {
>>> >> >     // instead of exact matching, add extension function to check if
>>> >> >     // authzKey.toString is the prefix of the key of the entries
>>> >> >     // in cachedAuthzPrivileges.
>>> >> >     if (cachedAuthzPrivileges.get(authzKey.toString()) != null) {
>>> >> >       privileges.addAll(cachedAuthzPrivileges.get(authzKey.toString()));
>>> >> >     }
>>> >> >   }
>>> >> >
>>> >> >   return privileges;
>>> >> > }
>>> >> >
>>> >> > Thanks,
>>> >> >
>>> >> > Lina
>>> >> >
>>> >> > On Wed, Dec 13, 2017 at 1:08 PM, Alexander Kolbasov <ak...@cloudera.com>
>>> >> > wrote:
>>> >> >
>>> >> > > I think that SENTRY-1291 should be just reverted - there are multiple
>>> >> > > issues with it and no one is actually using the fix. Does anyone want to
>>> >> > > do it?
>>> >> > >
>>> >> > > - Alex
>>> >> > >
>>> >> > > On Wed, Dec 13, 2017 at 4:44 AM, Na Li <lina...@cloudera.com> wrote:
>>> >> > >
>>> >> > > > Colm,
>>> >> > > >
>>> >> > > > Glad you found the cause!
>>> >> > > >
>>> >> > > > You can revert Sentry-1291, and see if it works. If so, it is an issue
>>> >> > > > with finding cached privileges.
>>> >> > > >
>>> >> > > > Cheers,
>>> >> > > >
>>> >> > > > Lina
>>> >> > > >
>>> >> > > > Sent from my iPhone
>>> >> > > >
>>> >> > > > > On Dec 13, 2017, at 4:58 AM, Colm O hEigeartaigh <cohei...@apache.org>
>>> >> > > > > wrote:
>>> >> > > > >
>>> >> > > > > Hi,
>>> >> > > > >
>>> >> > > > > I can see what the problem is (that the authorization hierarchy does not
>>> >> > > > > contain the column, and hence doesn't match against the cached privilege),
>>> >> > > > > but I'm not sure about the best way to solve it. Either the way we are
>>> >> > > > > creating the authorization hierarchy is incorrect (e.g. in
>>> >> > > > > HiveAuthzBindingHookBase) or else the way we are parsing the cached
>>> >> > > > > privilege is incorrect (e.g. in SimplePrivilegeCache/CommonPrivilege).
>>> >> > > > >
>>> >> > > > > Colm.
>>> >> > > > >
>>> >> > > > >> On Wed, Dec 13, 2017 at 5:57 AM, Na Li <lina...@cloudera.com> wrote:
>>> >> > > > >>
>>> >> > > > >> Colm,
>>> >> > > > >>
>>> >> > > > >> I did not get a chance to look into this issue today. Sorry about that.
>>> >> > > > >>
>>> >> > > > >> You can add an e2e test case and set a break point where the authorization
>>> >> > > > >> object hierarchy is converted to a list of authorization objects, which is
>>> >> > > > >> used to do exact match with cache
>>> >> > > > >>
>>> >> > > > >> Sent from my iPhone
>>> >> > > > >>
>>> >> > > > >>> On Dec 12, 2017, at 11:27 AM, Colm O hEigeartaigh <cohei...@apache.org>
>>> >> > > > >>> wrote:
>>> >> > > > >>>
>>> >> > > > >>> That would be great, thanks!
>>> >> > > > >>>
>>> >> > > > >>> Colm.
>>> >> > > > >>>
>>> >> > > > >>>> On Tue, Dec 12, 2017 at 4:36 PM, Na Li <lina...@cloudera.com> wrote:
>>> >> > > > >>>>
>>> >> > > > >>>> Colm,
>>> >> > > > >>>>
>>> >> > > > >>>> I suspect it is a bug in SENTRY-1291.
>>> >> > > > >>>> I can take a look later today.
>>> >> > > > >>>>
>>> >> > > > >>>> Thanks,
>>> >> > > > >>>>
>>> >> > > > >>>> Lina
>>> >> > > > >>>>
>>> >> > > > >>>> On Tue, Dec 12, 2017 at 4:32 AM, Colm O hEigeartaigh <cohei...@apache.org>
>>> >> > > > >>>> wrote:
>>> >> > > > >>>>
>>> >> > > > >>>>> Hi all,
>>> >> > > > >>>>>
>>> >> > > > >>>>> I've updated some local testcases to work with Sentry 2.0.0 and the "v1"
>>> >> > > > >>>>> Hive binding (previously working fine using 1.8.0 and the "v2" binding).
>>> >> > > > >>>>>
>>> >> > > > >>>>> I have a simple table called "words" (word STRING, count INT). I am making
>>> >> > > > >>>>> an SQL call as the user "bob", e.g. "SELECT * FROM words where count ==
>>> >> > > > >>>>> '100'".
>>> >> > > > >>>>>
>>> >> > > > >>>>> "bob" is in the "manager" group, which has the following role:
>>> >> > > > >>>>>
>>> >> > > > >>>>> select_all_role =
>>> >> > > > >>>>> Server=server1->Db=authz->Table=words->Column=*->action=select
>>> >> > > > >>>>>
>>> >> > > > >>>>> Essentially, authorization is denied even though the policy is correct. If
>>> >> > > > >>>>> I look at the SimplePrivilegeCache, the cached privilege is:
>>> >> > > > >>>>>
>>> >> > > > >>>>> server=server1->db=authz->table=words->column=*=[Server=server1->Db=authz->Table=words->Column=*->action=select]
>>> >> > > > >>>>>
>>> >> > > > >>>>> However, when "listPrivileges" is called, the authorizable hierarchy looks
>>> >> > > > >>>>> like:
>>> >> > > > >>>>>
>>> >> > > > >>>>> Server [name=server1]
>>> >> > > > >>>>> Database [name=authz]
>>> >> > > > >>>>> Table [name=words]
>>> >> > > > >>>>>
>>> >> > > > >>>>> There is no "column" here, and a match is not made against the cached
>>> >> > > > >>>>> privilege as a result. Is this a bug or am I missing some configuration
>>> >> > > > >>>>> switch?
>>> >> > > > >>>>>
>>> >> > > > >>>>> Colm.
>>> >> > > > >>>>>
>>> >> > > > >>>>>
>>> >> > > > >>>>> --
>>> >> > > > >>>>> Colm O hEigeartaigh
>>> >> > > > >>>>>
>>> >> > > > >>>>> Talend Community Coder
>>> >> > > > >>>>> http://coders.talend.com

--
Colm O hEigeartaigh

Talend Community Coder
http://coders.talend.com
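
For anyone reading this thread later: the mismatch described in the Dec 18 messages can be reproduced in isolation. The sketch below is only a simplified model, not Sentry's CommonPrivilege code. The class and method names (PrivilegeMatchSketch, impliesPositional, impliesSkippingWildcards) are hypothetical, and treating "*" and "all" as wildcards is an assumption. It compares the two privilege strings from the thread part by part, then tries the idea of skipping a cached wildcard part that is absent from the request. Note that the thread was actually resolved by enabling HIVE_STATS_COLLECT_SCANCOLS so that Hive supplies the accessed columns, not by changing the matching.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the matching discussed in this thread.
// NOT Sentry's implementation; every name here is hypothetical.
class PrivilegeMatchSketch {

    // Split "Server=server1->Db=authz" into lower-cased {key, value} pairs.
    static List<String[]> parse(String privilege) {
        List<String[]> parts = new ArrayList<>();
        for (String token : privilege.split("->")) {
            String[] kv = token.trim().toLowerCase().split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("Expected key=value, got: " + token);
            }
            parts.add(kv);
        }
        return parts;
    }

    // Assumption: "*" and "all" both mean "any value".
    static boolean isWildcard(String value) {
        return value.equals("*") || value.equals("all");
    }

    // Naive positional check: part i of the cached privilege is compared with
    // part i of the request, so an extra "column=*" part misaligns the keys.
    // Inheritance rules of the real code are deliberately not modelled.
    static boolean impliesPositional(String cached, String requested) {
        List<String[]> c = parse(cached);
        List<String[]> r = parse(requested);
        for (int i = 0; i < r.size(); i++) {
            if (i >= c.size()) {
                return false;
            }
            boolean sameKey = c.get(i)[0].equals(r.get(i)[0]);
            boolean valueOk = isWildcard(c.get(i)[1]) || c.get(i)[1].equals(r.get(i)[1]);
            if (!sameKey || !valueOk) {
                return false;
            }
        }
        for (int i = r.size(); i < c.size(); i++) {
            if (!isWildcard(c.get(i)[1])) {
                return false;
            }
        }
        return true;
    }

    // Variant: a cached part missing from the request is skipped,
    // but only when its value is a wildcard.
    static boolean impliesSkippingWildcards(String cached, String requested) {
        List<String[]> c = parse(cached);
        List<String[]> r = parse(requested);
        int ci = 0;
        for (String[] req : r) {
            while (ci < c.size() && !c.get(ci)[0].equals(req[0])) {
                if (!isWildcard(c.get(ci)[1])) {
                    return false;
                }
                ci++;
            }
            if (ci == c.size()) {
                return false;
            }
            String cachedValue = c.get(ci)[1];
            if (!isWildcard(cachedValue) && !cachedValue.equals(req[1])) {
                return false;
            }
            ci++;
        }
        for (; ci < c.size(); ci++) {
            if (!isWildcard(c.get(ci)[1])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String cached = "Server=server1->Db=authz->Table=words->Column=*->action=select";
        String requested = "Server=server1->Db=authz->Table=words->action=select";
        System.out.println(impliesPositional(cached, requested));        // false
        System.out.println(impliesSkippingWildcards(cached, requested)); // true
    }
}
```

With the strings from the thread, the positional check fails at the fourth part (column versus action), while the wildcard-skipping variant matches. A cached privilege restricted to a specific column (for example Column=word) is still rejected for a table-level request, which seems like the safer behaviour for this idea.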