Colm, I have created SENTRY-2118 to document this setting.

It is strange that without this setting, you have V2 working. From the following code, the column info is not set in ReadEntity if HIVE_STATS_COLLECT_SCANCOLS is false.

    if (HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
      this.putAccessedColumnsToReadEntity(this.inputs, this.columnAccessInfo);
    }

Thanks,

Lina

On Fri, Jan 5, 2018 at 10:23 AM, Colm O hEigeartaigh <cohei...@apache.org> wrote:

> Hi Lina,
>
>> Glad I can help. Do you know what configuration caused the columns not
>> parsed by Hive? If it is due to SessionState.get().isAuthorizationModeV2()
>> == false?
>
> Yes exactly - I'm using the V1 binding.
>
> Colm.
>
>> Thanks,
>>
>> Lina
>>
>> On Fri, Jan 5, 2018 at 6:12 AM, Colm O hEigeartaigh <cohei...@apache.org>
>> wrote:
>>
>>> Hi Lina,
>>>
>>> Thanks a lot for your help on this! I was able to get the test to work by
>>> adding the following config option:
>>>
>>> conf.set(HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS.varname, "true");
>>>
>>> Colm.
>>>
>>> On Thu, Jan 4, 2018 at 10:06 PM, Na Li <lina...@cloudera.com> wrote:
>>>
>>> > Colm,
>>> >
>>> > The following code shows where Hive sets the column info. You can debug
>>> > into hive code and see why AccessedColumns is not set.
>>> >
>>> > The related code is in org.apache.hadoop.hive.ql.parse.SemanticAnalyzer
>>> >
>>> > boolean isColumnInfoNeedForAuth = SessionState.get().isAuthorizationModeV2()
>>> >     && HiveConf.getBoolVar(this.conf, ConfVars.HIVE_AUTHORIZATION_ENABLED);
>>> > if (isColumnInfoNeedForAuth
>>> >     || HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
>>> >   ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
>>> >   this.setColumnAccessInfo(
>>> >       columnAccessAnalyzer.analyzeColumnAccess(this.getColumnAccessInfo()));
>>> > }
>>> >
>>> > this.LOG.info("Completed plan generation");
>>> > if (HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
>>> >   this.putAccessedColumnsToReadEntity(this.inputs, this.columnAccessInfo);
>>> > }
>>> >
>>> > On Wed, Jan 3, 2018 at 11:28 PM, Na Li <lina...@cloudera.com> wrote:
>>> >
>>> >> Colm,
>>> >>
>>> >> I tried to reproduce your issue using sentry 2.0 (master branch) with
>>> >> Hive 2.3.2.
>>> >>
>>> >> The test code is
>>> >>
>>> >> @Test
>>> >> public void testPositiveOnAll() throws Exception {
>>> >>   Connection connection = context.createConnection(ADMIN1);
>>> >>   Statement statement = context.createStatement(connection);
>>> >>   statement.execute("CREATE database " + DB1);
>>> >>   statement.execute("use " + DB1);
>>> >>   statement.execute("CREATE TABLE t1 (c1 string, c2 string)");
>>> >>   statement.execute("CREATE ROLE user_role1");
>>> >>   statement.execute("GRANT SELECT ON TABLE t1 TO ROLE user_role1");
>>> >>   statement.execute("GRANT ROLE user_role1 TO GROUP " + USERGROUP1);
>>> >>   statement.close();
>>> >>   connection.close();
>>> >>
>>> >>   connection = context.createConnection(USER1_1);
>>> >>   statement = context.createStatement(connection);
>>> >>   statement.execute("use " + DB1);
>>> >>   statement.execute("SELECT * FROM t1");
>>> >>
>>> >>   statement.close();
>>> >>   connection.close();
>>> >> }
>>> >>
>>> >> required privileges:
>>> >>
>>> >>    - Server=server1->Db=db_1->Table=t1->Column=c1->action=select
>>> >>    - Server=server1->Db=db_1->Table=t1->Column=c2->action=select
>>> >>
>>> >> cached privilege:
>>> >>
>>> >>    - server=server1->db=db_1->table=t1->action=select
>>> >>
>>> >> So the authorization works.
>>> >>
>>> >> Note
>>> >>
>>> >>    - For me, the "SELECT * FROM t1" causes the required privileges to
>>> >>    contain each column explicitly. However, for you, the "privilege" to
>>> >>    check looks like:
>>> >>    Server=server1->Db=authz->Table=words->action=select; The columns are
>>> >>    not explicitly listed. Hive controls if the column is included in
>>> >>    required privilege. At
>>> >>    org.apache.sentry.binding.hive.authz.HiveAuthzBindingHookBase.authorizeWithHiveBindings ->
>>> >>    getInputHierarchyFromInputs -> addColumnHierarchy, Sentry uses
>>> >>    accessedColumns from Hive input to add colHierarchy for each column.
>>> >>    You can check if accessedColumns is empty or null for the hive
>>> >>    version you are using.
>>> >>    - For me, the cached privilege does not include column part. For you,
>>> >>    the cached privilege is
>>> >>    "Server=server1->Db=authz->Table=words->Column=*->action=select".
>>> >>    *Can you share your test code*, so I can see how you grant the
>>> >>    privilege and therefore the cached privilege contains column?
>>> >>    - I tried to use "GRANT SELECT(*) ON TABLE t1 TO ROLE user_role1",
>>> >>    and got the following error:
>>> >>
>>> >>    2018-01-03 23:23:50,459 (HiveServer2-Handler-Pool: Thread-212)
>>> >>    [WARN - org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:539)]
>>> >>    Error executing statement:
>>> >>    org.apache.hive.service.cli.HiveSQLException: Error while compiling
>>> >>    statement: FAILED: ParseException line 1:6 cannot recognize input
>>> >>    near 'GRANT' 'SELECT' '(' in ddl statement
>>> >>      at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>>> >>      at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>>> >>      at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>>> >>      at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>>> >>      at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>>> >>
>>> >> Thanks,
>>> >>
>>> >> Lina
>>> >>
>>> >> On Mon, Dec 18, 2017 at 10:14 AM, Colm O hEigeartaigh <cohei...@apache.org> wrote:
>>> >>
>>> >>> Thanks Kalyan! I was thinking that if the cached privilege part does not
>>> >>> appear in the requested "part", and if it is "all", then we should skip that
>>> >>> part and continue on to the next one. But maybe there is a better
>>> >>> solution.
>>> >>>
>>> >>> Colm.
>>> >>>
>>> >>> On Mon, Dec 18, 2017 at 4:06 PM, Kalyan Kumar Kalvagadda <kkal...@cloudera.com> wrote:
>>> >>>
>>> >>> > Colm,
>>> >>> >
>>> >>> > I will look closer into this today and see if I can help you out.
>>> >>> >
>>> >>> > -Kalyan
>>> >>> >
>>> >>> > On Mon, Dec 18, 2017 at 4:52 AM, Colm O hEigeartaigh <cohei...@apache.org>
>>> >>> > wrote:
>>> >>> >
>>> >>> >> Hi,
>>> >>> >>
>>> >>> >> I've done some further analysis of the problem, and I think it is not
>>> >>> >> directly related to SENTRY-1291. The problem manifests in
>>> >>> >> CommonPrivilege.implies(privilege, model). My (cached) privilege looks
>>> >>> >> like:
>>> >>> >>
>>> >>> >> Server=server1->Db=authz->Table=words->Column=*->action=select
>>> >>> >>
>>> >>> >> The "privilege" I want to check looks like:
>>> >>> >>
>>> >>> >> Server=server1->Db=authz->Table=words->action=select;
>>> >>> >>
>>> >>> >> The problem is in the "for" loop in CommonPrivilege.implies. It loops on
>>> >>> >> the parts of the second privilege, and matches up to "action=select". Here
>>> >>> >> it tries to compare to "Column=*" of the cached privilege and fails on
>>> >>> >> this line:
>>> >>> >>
>>> >>> >> https://github.com/apache/sentry/blob/a4924edc79b26f937e3e5ea3584f0b4307dd4135/sentry-policy/sentry-policy-common/src/main/java/org/apache/sentry/policy/common/CommonPrivilege.java#L86
>>> >>> >>
>>> >>> >> It's clear there's a bug here somewhere, but I'm not sure where - can
>>> >>> >> someone please advise?
>>> >>> >>
>>> >>> >> Thanks,
>>> >>> >>
>>> >>> >> Colm.
>>> >>> >>
>>> >>> >> On Wed, Dec 13, 2017 at 8:28 PM, Na Li <lina...@cloudera.com> wrote:
>>> >>> >>
>>> >>> >> > Sasha,
>>> >>> >> >
>>> >>> >> > sentry-1291 is helpful for the problem that sentry privilege checks takes
>>> >>> >> > too long with many explicit grants, which is useful for big customers.
>>> >>> >> > Another approach that can improve the performance is to organize the
>>> >>> >> > privileges according to the authorization hierarchy in a tree structure, so
>>> >>> >> > finding match in ResourceAuthorizationProvider.doHasAccess() is in the
>>> >>> >> > order of log(N), not linear of N, where N is the number of privileges.
>>> >>> >> >
>>> >>> >> > We can wait for Colm to confirm his issue is caused by sentry-1291. If so,
>>> >>> >> > it may be fixed by selecting privileges by finding if the requesting
>>> >>> >> > authorization object is prefix of cached privileges instead of exact match.
>>> >>> >> >
>>> >>> >> > in SimplePrivilegeCache
>>> >>> >> >
>>> >>> >> > public Set<String> listPrivileges(Set<String> groups, Set<String> users,
>>> >>> >> >     ActiveRoleSet roleSet, Authorizable... authorizationHierarchy) {
>>> >>> >> >   Set<String> privileges = new HashSet<>();
>>> >>> >> >   Set<StringBuilder> authzKeys = getAuthzKeys(authorizationHierarchy);
>>> >>> >> >   for (StringBuilder authzKey : authzKeys) {
>>> >>> >> >     if (cachedAuthzPrivileges.get(authzKey.toString()) != null) {
>>> >>> >> >       // <- instead of exact matching, add extension function to check if
>>> >>> >> >       //    authzKey.toString is the prefix of the key of the entries
>>> >>> >> >       //    in cachedAuthzPrivileges.
>>> >>> >> >       privileges.addAll(cachedAuthzPrivileges.get(authzKey.toString()));
>>> >>> >> >     }
>>> >>> >> >   }
>>> >>> >> >
>>> >>> >> >   return privileges;
>>> >>> >> > }
>>> >>> >> >
>>> >>> >> > Thanks,
>>> >>> >> >
>>> >>> >> > Lina
>>> >>> >> >
>>> >>> >> > On Wed, Dec 13, 2017 at 1:08 PM, Alexander Kolbasov <ak...@cloudera.com>
>>> >>> >> > wrote:
>>> >>> >> >
>>> >>> >> > > I think that SENTRY-1291 should be just reverted - there are multiple
>>> >>> >> > > issues with it and no one is actually using the fix.
>>> >>> >> > > Anyone wants to do it?
>>> >>> >> > >
>>> >>> >> > > - Alex
>>> >>> >> > >
>>> >>> >> > > On Wed, Dec 13, 2017 at 4:44 AM, Na Li <lina...@cloudera.com> wrote:
>>> >>> >> > >
>>> >>> >> > > > Colm,
>>> >>> >> > > >
>>> >>> >> > > > Glad you find the cause!
>>> >>> >> > > >
>>> >>> >> > > > You can revert Sentry-1291, and see if it works. If so, it is issue at
>>> >>> >> > > > finding cached privileges.
>>> >>> >> > > >
>>> >>> >> > > > Cheers,
>>> >>> >> > > >
>>> >>> >> > > > Lina
>>> >>> >> > > >
>>> >>> >> > > > Sent from my iPhone
>>> >>> >> > > >
>>> >>> >> > > > > On Dec 13, 2017, at 4:58 AM, Colm O hEigeartaigh <cohei...@apache.org>
>>> >>> >> > > > > wrote:
>>> >>> >> > > > >
>>> >>> >> > > > > Hi,
>>> >>> >> > > > >
>>> >>> >> > > > > I can see what the problem is (that the authorization hierarchy does not
>>> >>> >> > > > > contain the column, and hence doesn't match against the cached privilege),
>>> >>> >> > > > > but I'm not sure about the best way to solve it. Either the way we are
>>> >>> >> > > > > creating the authorization hierarchy is incorrect (e.g. in
>>> >>> >> > > > > HiveAuthzBindingHookBase) or else the way we are parsing the cached
>>> >>> >> > > > > privilege is incorrect (e.g. in SimplePrivilegeCache/CommonPrivilege).
>>> >>> >> > > > >
>>> >>> >> > > > > Colm.
>>> >>> >> > > > >
>>> >>> >> > > > >> On Wed, Dec 13, 2017 at 5:57 AM, Na Li <lina...@cloudera.com> wrote:
>>> >>> >> > > > >>
>>> >>> >> > > > >> Colm,
>>> >>> >> > > > >>
>>> >>> >> > > > >> I did not get chance to look into this issue today. Sorry about that.
>>> >>> >> > > > >>
>>> >>> >> > > > >> You can add a e2e test case and set break point at where the authorization
>>> >>> >> > > > >> object hierarchy to a list of authorization objects, which is used to do
>>> >>> >> > > > >> exact match with cache
>>> >>> >> > > > >>
>>> >>> >> > > > >> Sent from my iPhone
>>> >>> >> > > > >>
>>> >>> >> > > > >>> On Dec 12, 2017, at 11:27 AM, Colm O hEigeartaigh <cohei...@apache.org>
>>> >>> >> > > > >>> wrote:
>>> >>> >> > > > >>>
>>> >>> >> > > > >>> That would be great, thanks!
>>> >>> >> > > > >>>
>>> >>> >> > > > >>> Colm.
>>> >>> >> > > > >>>
>>> >>> >> > > > >>>> On Tue, Dec 12, 2017 at 4:36 PM, Na Li <lina...@cloudera.com> wrote:
>>> >>> >> > > > >>>>
>>> >>> >> > > > >>>> Colm,
>>> >>> >> > > > >>>>
>>> >>> >> > > > >>>> I suspect it is a bug in SENTRY-1291. I can take a look later today.
>>> >>> >> > > > >>>>
>>> >>> >> > > > >>>> Thanks,
>>> >>> >> > > > >>>>
>>> >>> >> > > > >>>> Lina
>>> >>> >> > > > >>>>
>>> >>> >> > > > >>>> On Tue, Dec 12, 2017 at 4:32 AM, Colm O hEigeartaigh <cohei...@apache.org>
>>> >>> >> > > > >>>> wrote:
>>> >>> >> > > > >>>>
>>> >>> >> > > > >>>>> Hi all,
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> I've updated some local testcases to work with Sentry 2.0.0 and the "v1"
>>> >>> >> > > > >>>>> Hive binding (previously working fine using 1.8.0 and the "v2" binding).
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> I have a simple table called "words" (word STRING, count INT). I am making
>>> >>> >> > > > >>>>> an SQL call as the user "bob", e.g. "SELECT * FROM words where count ==
>>> >>> >> > > > >>>>> '100'".
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> "bob" is in the "manager" group, which has the following role:
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> select_all_role =
>>> >>> >> > > > >>>>> Server=server1->Db=authz->Table=words->Column=*->action=select
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> Essentially, authorization is denied even though the policy is correct. If
>>> >>> >> > > > >>>>> I look at the SimplePrivilegeCache, the cached privilege is:
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> server=server1->db=authz->table=words->column=*=[Server=server1->Db=authz->Table=words->Column=*->action=select]
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> However, when "listPrivileges" is called, the authorizable hierarchy looks
>>> >>> >> > > > >>>>> like:
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> Server [name=server1]
>>> >>> >> > > > >>>>> Database [name=authz]
>>> >>> >> > > > >>>>> Table [name=words]
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> There is no "column" here, and a match is not made against the cached
>>> >>> >> > > > >>>>> privilege as a result. Is this a bug or am I missing some configuration
>>> >>> >> > > > >>>>> switch?
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> Colm.
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> --
>>> >>> >> > > > >>>>> Colm O hEigeartaigh
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> Talend Community Coder
>>> >>> >> > > > >>>>> http://coders.talend.com
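
Note for readers of this thread: the prefix-matching idea suggested earlier for SimplePrivilegeCache.listPrivileges can be sketched roughly as below. This is only a minimal, self-contained illustration and not the Sentry implementation. The class name PrefixPrivilegeLookup, its two-argument listPrivileges signature, and the plain Map standing in for cachedAuthzPrivileges are all hypothetical, and the sketch assumes cache keys are "->"-separated strings like the ones quoted in the thread.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the cache lookup; not the Sentry code. */
final class PrefixPrivilegeLookup {
  // Assumed separator between hierarchy parts in a cache key.
  private static final String SEP = "->";

  /**
   * Collects the privileges of every cache entry whose key equals authzKey
   * or extends it with further "->"-separated parts.
   */
  static Set<String> listPrivileges(Map<String, Set<String>> cache, String authzKey) {
    Set<String> privileges = new HashSet<>();
    for (Map.Entry<String, Set<String>> entry : cache.entrySet()) {
      String cachedKey = entry.getKey();
      if (cachedKey.equals(authzKey) || cachedKey.startsWith(authzKey + SEP)) {
        privileges.addAll(entry.getValue());
      }
    }
    return privileges;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> cache = new HashMap<>();
    Set<String> granted = new HashSet<>();
    granted.add("Server=server1->Db=authz->Table=words->Column=*->action=select");
    cache.put("server=server1->db=authz->table=words->column=*", granted);

    // The requested hierarchy stops at the table, as reported in the thread.
    String requestKey = "server=server1->db=authz->table=words";
    System.out.println(listPrivileges(cache, requestKey));
  }
}
```

Matching on authzKey + "->" rather than on the bare prefix avoids treating table=words as a prefix of a different table such as table=words2. A scan like this is linear in the number of cache entries, and it only widens which cached privileges are selected; the CommonPrivilege.implies comparison discussed in the thread is not addressed by it.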