Colm,

Glad I could help. Do you know which configuration caused Hive not to parse the columns? Was it because SessionState.get().isAuthorizationModeV2() == false?

Thanks,

Lina

On Fri, Jan 5, 2018 at 6:12 AM, Colm O hEigeartaigh <cohei...@apache.org> wrote:

> Hi Lina,
>
> Thanks a lot for your help on this! I was able to get the test to work by
> adding the following config option:
>
> conf.set(HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS.varname, "true");
>
> Colm.
>
> On Thu, Jan 4, 2018 at 10:06 PM, Na Li <lina...@cloudera.com> wrote:
>
> > Colm,
> >
> > The following code shows where Hive sets the column info. You can debug
> > into hive code and see why AccessedColumns is not set.
> >
> > The related code is in org.apache.hadoop.hive.ql.parse.SemanticAnalyzer
> >
> > boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
> >     && HiveConf.getBoolVar(this.conf, ConfVars.HIVE_AUTHORIZATION_ENABLED);
> > if (isColumnInfoNeedForAuth || HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
> >   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
> >   this.setColumnAccessInfo(columnAccessAnalyzer.analyzeColumnAccess(this.getColumnAccessInfo()));
> > }
> >
> > this.LOG.info("Completed plan generation");
> > if (HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
> >   this.putAccessedColumnsToReadEntity(this.inputs, this.columnAccessInfo);
> > }
> >
> > On Wed, Jan 3, 2018 at 11:28 PM, Na Li <lina...@cloudera.com> wrote:
> >
> >> Colm,
> >>
> >> I tried to reproduce your issue using sentry 2.0 (master branch) with
> >> Hive 2.3.2.
> >>
> >> The test code is
> >>
> >> @Test
> >> public void testPositiveOnAll() throws Exception {
> >>   Connection connection = context.createConnection(ADMIN1);
> >>   Statement statement = context.createStatement(connection);
> >>   statement.execute("CREATE database " + DB1);
> >>   statement.execute("use " + DB1);
> >>   statement.execute("CREATE TABLE t1 (c1 string, c2 string)");
> >>   statement.execute("CREATE ROLE user_role1");
> >>   statement.execute("GRANT SELECT ON TABLE t1 TO ROLE user_role1");
> >>   statement.execute("GRANT ROLE user_role1 TO GROUP " + USERGROUP1);
> >>   statement.close();
> >>   connection.close();
> >>
> >>   connection = context.createConnection(USER1_1);
> >>   statement = context.createStatement(connection);
> >>   statement.execute("use " + DB1);
> >>   statement.execute("SELECT * FROM t1");
> >>
> >>   statement.close();
> >>   connection.close();
> >> }
> >>
> >> required privileges:
> >>
> >> - Server=server1->Db=db_1->Table=t1->Column=c1->action=select
> >> - Server=server1->Db=db_1->Table=t1->Column=c2->action=select
> >>
> >> cached privilege:
> >>
> >> - server=server1->db=db_1->table=t1->action=select
> >>
> >> So the authorization works.
> >>
> >> Note
> >>
> >> - For me, the "SELECT * FROM t1" causes the required privileges to
> >>   contain each column explicitly. However, for you, The "privilege" to check
> >>   looks like:
> >>   Server=server1->Db=authz->Table=words->action=select; The columns are
> >>   not explicitly listed. Hive controls if the column is included in
> >>   required privilege. At
> >>   org.apache.sentry.binding.hive.authz.HiveAuthzBindingHookBase.authorizeWithHiveBindings ->
> >>   getInputHierarchyFromInputs -> addColumnHierarchy, Sentry uses
> >>   accessedColumns from Hive input to add colHierarchy for each column.
> >>   You can check if accessedColumns is empty or null for the hive
> >>   version you are using.
> >> - For me, the cached privilege does not include column part.
> >>   For you, the cached privilege is
> >>   "Server=server1->Db=authz->Table=words->Column=*->action=select".
> >>   Can you share your test code, so I can see how you grant the privilege
> >>   and therefore the cached privilege contains column?
> >> - I tried to use "GRANT SELECT(*) ON TABLE t1 TO ROLE user_role1", and
> >>   got following error
> >>
> >>   2018-01-03 23:23:50,459 (HiveServer2-Handler-Pool: Thread-212) [WARN - org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:539)] Error executing statement:
> >>   org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 1:6 cannot recognize input near 'GRANT' 'SELECT' '(' in ddl statement
> >>     at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
> >>     at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
> >>     at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
> >>     at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
> >>     at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
> >>
> >> Thanks,
> >>
> >> Lina
> >>
> >> On Mon, Dec 18, 2017 at 10:14 AM, Colm O hEigeartaigh <cohei...@apache.org> wrote:
> >>
> >>> Thanks Kalyan! I was thinking that if the cached privilege part does not
> >>> appear in the requested "part", and if is "all", then we should skip that
> >>> part and continue on to the next one. But maybe there is a better
> >>> solution.
> >>>
> >>> Colm.
> >>>
> >>> On Mon, Dec 18, 2017 at 4:06 PM, Kalyan Kumar Kalvagadda <kkal...@cloudera.com> wrote:
> >>>
> >>> > Colm,
> >>> >
> >>> > I will look closer into this today and see If i can help you out.
> >>> >
> >>> > -Kalyan
> >>> >
> >>> > On Mon, Dec 18, 2017 at 4:52 AM, Colm O hEigeartaigh <cohei...@apache.org> wrote:
> >>> >
> >>> >> Hi,
> >>> >>
> >>> >> I've done some further analysis of the problem, and I think it is not
> >>> >> directly related to SENTRY-1291. The problem manifests in
> >>> >> CommonPrivilege.implies(privilege, model). My (cached) privilege looks
> >>> >> like:
> >>> >>
> >>> >> Server=server1->Db=authz->Table=words->Column=*->action=select
> >>> >>
> >>> >> The "privilege" I want to check looks like:
> >>> >>
> >>> >> Server=server1->Db=authz->Table=words->action=select;
> >>> >>
> >>> >> The problem is in the "for" loop in CommonPrivilege.implies. It loops on
> >>> >> the parts of the second privilege, and matches up to "action=select". Here
> >>> >> it tries to compare to "Column=*" of the cached privilege and fails on
> >>> >> this line:
> >>> >>
> >>> >> https://github.com/apache/sentry/blob/a4924edc79b26f937e3e5ea3584f0b4307dd4135/sentry-policy/sentry-policy-common/src/main/java/org/apache/sentry/policy/common/CommonPrivilege.java#L86
> >>> >>
> >>> >> It's clear there's a bug here somewhere, but I'm not sure where - can
> >>> >> someone please advise?
> >>> >>
> >>> >> Thanks,
> >>> >>
> >>> >> Colm.
> >>> >>
> >>> >> On Wed, Dec 13, 2017 at 8:28 PM, Na Li <lina...@cloudera.com> wrote:
> >>> >>
> >>> >> > Sasha,
> >>> >> >
> >>> >> > sentry-1291 is helpful for the problem that sentry privilege checks takes
> >>> >> > too long with many explicit grants, which is useful for big customers.
> >>> >> > Another approach that can improve the performance is to organize the
> >>> >> > privileges according to the authorization hierarchy in a tree structure, so
> >>> >> > finding match in ResourceAuthorizationProvider.doHasAccess() is in the
> >>> >> > order of log(N), not linear of N, where N is the number of privileges.
> >>> >> >
> >>> >> > We can wait for Colm to confirm his issue is caused by sentry-1291. If so,
> >>> >> > it may be fixed by selecting privileges by finding if the requesting
> >>> >> > authorization object is prefix of cached privileges instead of exact match.
> >>> >> >
> >>> >> > in SimplePrivilegeCache
> >>> >> >
> >>> >> > public Set<String> listPrivileges(Set<String> groups, Set<String> users,
> >>> >> >     ActiveRoleSet roleSet, Authorizable... authorizationHierarchy) {
> >>> >> >   Set<String> privileges = new HashSet<>();
> >>> >> >   Set<StringBuilder> authzKeys = getAuthzKeys(authorizationHierarchy);
> >>> >> >   for (StringBuilder authzKey : authzKeys) {
> >>> >> >     if (cachedAuthzPrivileges.get(authzKey.toString()) != null) {
> >>> >> >       // <- instead of exact matching, add extension function to check if
> >>> >> >       // authzKey.toString is the prefix of the key of the entries
> >>> >> >       // in cachedAuthzPrivileges.
> >>> >> >       privileges.addAll(cachedAuthzPrivileges.get(authzKey.toString()));
> >>> >> >     }
> >>> >> >   }
> >>> >> >
> >>> >> >   return privileges;
> >>> >> > }
> >>> >> >
> >>> >> > Thanks,
> >>> >> >
> >>> >> > Lina
> >>> >> >
> >>> >> > On Wed, Dec 13, 2017 at 1:08 PM, Alexander Kolbasov <ak...@cloudera.com> wrote:
> >>> >> >
> >>> >> > > I think that SENTRY-1291 should be just reverted - there are multiple
> >>> >> > > issues with it and no one is actually using the fix. Anyone wants to do it?
> >>> >> > >
> >>> >> > > - Alex
> >>> >> > >
> >>> >> > > On Wed, Dec 13, 2017 at 4:44 AM, Na Li <lina...@cloudera.com> wrote:
> >>> >> > >
> >>> >> > > > Colm,
> >>> >> > > >
> >>> >> > > > Glad you find the cause!
> >>> >> > > >
> >>> >> > > > You can revert Sentry-1291, and see if it works. If so, it is issue at
> >>> >> > > > finding cached privileges.
> >>> >> > > >
> >>> >> > > > Cheers,
> >>> >> > > >
> >>> >> > > > Lina
> >>> >> > > >
> >>> >> > > > Sent from my iPhone
> >>> >> > > >
> >>> >> > > > > On Dec 13, 2017, at 4:58 AM, Colm O hEigeartaigh <cohei...@apache.org> wrote:
> >>> >> > > > >
> >>> >> > > > > Hi,
> >>> >> > > > >
> >>> >> > > > > I can see what the problem is (that the authorization hierarchy does not
> >>> >> > > > > contain the column, and hence doesn't match against the cached privilege),
> >>> >> > > > > but I'm not sure about the best way to solve it. Either the way we are
> >>> >> > > > > creating the authorization hierarchy is incorrect (e.g. in
> >>> >> > > > > HiveAuthzBindingHookBase) or else the way we are parsing the cached
> >>> >> > > > > privilege is incorrect (e.g. in SimplePrivilegeCache/CommonPrivilege).
> >>> >> > > > >
> >>> >> > > > > Colm.
> >>> >> > > > >
> >>> >> > > > >> On Wed, Dec 13, 2017 at 5:57 AM, Na Li <lina...@cloudera.com> wrote:
> >>> >> > > > >>
> >>> >> > > > >> Colm,
> >>> >> > > > >>
> >>> >> > > > >> I did not get chance to look into this issue today. Sorry about that.
> >>> >> > > > >>
> >>> >> > > > >> You can add a e2e test case and set break point at where the authorization
> >>> >> > > > >> object hierarchy to a list of authorization objects, which is used to do
> >>> >> > > > >> exact match with cache
> >>> >> > > > >>
> >>> >> > > > >> Sent from my iPhone
> >>> >> > > > >>
> >>> >> > > > >>> On Dec 12, 2017, at 11:27 AM, Colm O hEigeartaigh <cohei...@apache.org> wrote:
> >>> >> > > > >>>
> >>> >> > > > >>> That would be great, thanks!
> >>> >> > > > >>>
> >>> >> > > > >>> Colm.
> >>> >> > > > >>>
> >>> >> > > > >>>> On Tue, Dec 12, 2017 at 4:36 PM, Na Li <lina...@cloudera.com> wrote:
> >>> >> > > > >>>>
> >>> >> > > > >>>> Colm,
> >>> >> > > > >>>>
> >>> >> > > > >>>> I suspect it is a bug in SENTRY-1291. I can take a look later today.
> >>> >> > > > >>>>
> >>> >> > > > >>>> Thanks,
> >>> >> > > > >>>>
> >>> >> > > > >>>> Lina
> >>> >> > > > >>>>
> >>> >> > > > >>>> On Tue, Dec 12, 2017 at 4:32 AM, Colm O hEigeartaigh <cohei...@apache.org> wrote:
> >>> >> > > > >>>>
> >>> >> > > > >>>>> Hi all,
> >>> >> > > > >>>>>
> >>> >> > > > >>>>> I've updated some local testcases to work with Sentry 2.0.0 and the "v1"
> >>> >> > > > >>>>> Hive binding (previously working fine using 1.8.0 and the "v2" binding).
> >>> >> > > > >>>>>
> >>> >> > > > >>>>> I have a simple table called "words" (word STRING, count INT). I am making
> >>> >> > > > >>>>> an SQL call as the user "bob", e.g. "SELECT * FROM words where count ==
> >>> >> > > > >>>>> '100'".
> >>> >> > > > >>>>>
> >>> >> > > > >>>>> "bob" is in the "manager" group, which has the following role:
> >>> >> > > > >>>>>
> >>> >> > > > >>>>> select_all_role =
> >>> >> > > > >>>>> Server=server1->Db=authz->Table=words->Column=*->action=select
> >>> >> > > > >>>>>
> >>> >> > > > >>>>> Essentially, authorization is denied even though the policy is correct. If
> >>> >> > > > >>>>> I look at the SimplePrivilegeCache, the cached privilege is:
> >>> >> > > > >>>>>
> >>> >> > > > >>>>> server=server1->db=authz->table=words->column=*=[Server=server1->Db=authz->Table=words->Column=*->action=select]
> >>> >> > > > >>>>>
> >>> >> > > > >>>>> However, when "listPrivileges" is called, the authorizable hierarchy looks
> >>> >> > > > >>>>> like:
> >>> >> > > > >>>>>
> >>> >> > > > >>>>> Server [name=server1]
> >>> >> > > > >>>>> Database [name=authz]
> >>> >> > > > >>>>> Table [name=words]
> >>> >> > > > >>>>>
> >>> >> > > > >>>>> There is no "column" here, and a match is not made against the cached
> >>> >> > > > >>>>> privilege as a result. Is this a bug or am I missing some configuration
> >>> >> > > > >>>>> switch?
> >>> >> > > > >>>>>
> >>> >> > > > >>>>> Colm.
> >>> >> > > > >>>>>
> >>> >> > > > >>>>> --
> >>> >> > > > >>>>> Colm O hEigeartaigh
> >>> >> > > > >>>>>
> >>> >> > > > >>>>> Talend Community Coder
> >>> >> > > > >>>>> http://coders.talend.com
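
For reference, the gating discussed at the top of this thread can be modelled without any Hive classes. The sketch below is only an illustration built from the SemanticAnalyzer snippet quoted in the Jan 4 message: the class ColumnInfoGate and its method names are hypothetical, and the three booleans stand in for SessionState.get().isAuthorizationModeV2(), HIVE_AUTHORIZATION_ENABLED and HIVE_STATS_COLLECT_SCANCOLS. Assuming the quoted snippet is the whole story, it suggests that accessed columns are copied onto the read entities only when HIVE_STATS_COLLECT_SCANCOLS is true, whichever authorization mode is active.

```java
// Hypothetical model of the two checks quoted from SemanticAnalyzer.
// It uses plain booleans and does not call any Hive or Sentry API.
final class ColumnInfoGate {

    // First check: column access is analyzed when authorization V2 is active
    // and enabled, or when scan-column collection is switched on.
    static boolean analyzesColumnAccess(boolean authV2, boolean authEnabled,
                                        boolean collectScanCols) {
        boolean isColumnInfoNeededForAuth = authV2 && authEnabled;
        return isColumnInfoNeededForAuth || collectScanCols;
    }

    // Second check: the analyzed columns are put on the read entities only
    // when scan-column collection is switched on.
    static boolean columnsReachReadEntities(boolean authV2, boolean authEnabled,
                                            boolean collectScanCols) {
        return analyzesColumnAccess(authV2, authEnabled, collectScanCols)
                && collectScanCols;
    }

    public static void main(String[] args) {
        // "v1" binding, authorization enabled, scancols left unset.
        System.out.println(columnsReachReadEntities(false, true, false));
        // Same setup after setting HIVE_STATS_COLLECT_SCANCOLS to "true".
        System.out.println(columnsReachReadEntities(false, true, true));
    }
}
```

In this model the first call yields false and the second yields true, which is consistent with the test starting to pass once the config option was added.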
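
The prefix-matching idea floated for SimplePrivilegeCache in the Dec 13 message can be sketched in isolation. This is a minimal illustration, not the Sentry implementation: the class PrefixPrivilegeLookup, the plain Map standing in for cachedAuthzPrivileges, and the single String key are all assumptions made for the example. The key and privilege strings follow the format shown earlier in the thread.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the cache lookup; not a Sentry class.
final class PrefixPrivilegeLookup {
    private static final String SEPARATOR = "->";

    // Returns privileges whose cache key equals the requested key or extends
    // it by at least one more "->"-separated part (for example a column part).
    static Set<String> listPrivileges(Map<String, Set<String>> cachedAuthzPrivileges,
                                      String authzKey) {
        Set<String> privileges = new HashSet<>();
        String prefix = authzKey + SEPARATOR;
        for (Map.Entry<String, Set<String>> entry : cachedAuthzPrivileges.entrySet()) {
            String cachedKey = entry.getKey();
            if (cachedKey.equals(authzKey) || cachedKey.startsWith(prefix)) {
                privileges.addAll(entry.getValue());
            }
        }
        return privileges;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> cache = new LinkedHashMap<>();
        cache.put("server=server1->db=authz->table=words->column=*",
                Collections.singleton(
                        "Server=server1->Db=authz->Table=words->Column=*->action=select"));
        // The request carries no column part, as in the original report.
        System.out.println(listPrivileges(cache, "server=server1->db=authz->table=words"));
    }
}
```

Appending the separator before comparing keeps a request for table "words" from matching a key for a different table whose name merely starts with "words". The linear scan is for clarity only, and the sketch only widens which cached entries are returned; whether each returned privilege actually grants access would still be decided later, for example in CommonPrivilege.implies.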