Colm,

I have created SENTRY-2118 to document this setting.

It is strange that V2 worked for you without this setting. Judging from
the following code, the column info is not set in ReadEntity when
HIVE_STATS_COLLECT_SCANCOLS is false.

if (HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
  this.putAccessedColumnsToReadEntity(this.inputs, this.columnAccessInfo);
}
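
To make the interaction of the flags easier to see, here is a rough,
self-contained sketch of how I read the two checks in the SemanticAnalyzer
snippet quoted further down in this thread. It is only a model under that
reading: the class and method names (ColumnInfoFlow, columnsAnalyzed,
columnsOnReadEntity) are hypothetical and are not part of Hive or Sentry.

```java
/**
 * Simplified model of the two checks in the quoted SemanticAnalyzer code.
 * All names are hypothetical stand-ins; nothing here calls Hive itself.
 */
public class ColumnInfoFlow {

    /** Models: isColumnInfoNeedForAuth || HIVE_STATS_COLLECT_SCANCOLS. */
    static boolean columnsAnalyzed(boolean authV2, boolean authEnabled, boolean scanCols) {
        boolean isColumnInfoNeededForAuth = authV2 && authEnabled;
        return isColumnInfoNeededForAuth || scanCols;
    }

    /** Models the second check: only the stats flag copies columns into ReadEntity. */
    static boolean columnsOnReadEntity(boolean authV2, boolean authEnabled, boolean scanCols) {
        return columnsAnalyzed(authV2, authEnabled, scanCols) && scanCols;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        // Authorization is assumed enabled; vary the binding mode and the stats flag.
        for (boolean authV2 : flags) {
            for (boolean scanCols : flags) {
                System.out.printf("authV2=%b scanCols=%b -> analyzed=%b, onReadEntity=%b%n",
                        authV2, scanCols,
                        columnsAnalyzed(authV2, true, scanCols),
                        columnsOnReadEntity(authV2, true, scanCols));
            }
        }
    }
}
```

In this model, the columns reach ReadEntity only when the stats flag is
true, regardless of the authorization mode, which is why the V2 result
looks surprising to me.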


Thanks,

Lina

On Fri, Jan 5, 2018 at 10:23 AM, Colm O hEigeartaigh <cohei...@apache.org>
wrote:

> Hi Lina,
>
>
>> Glad I could help. Do you know what configuration caused the columns not
>> to be parsed by Hive? Is it because SessionState.get().isAuthorizationModeV2()
>> == false?
>>
>
> Yes exactly - I'm using the V1 binding.
>
> Colm.
>
>
>>
>> Thanks,
>>
>> Lina
>>
>> On Fri, Jan 5, 2018 at 6:12 AM, Colm O hEigeartaigh <cohei...@apache.org>
>> wrote:
>>
>>> Hi Lina,
>>>
>>> Thanks a lot for your help on this! I was able to get the test to work by
>>> adding the following config option:
>>>
>>> conf.set(HiveConf.ConfVars.HIVE_STATS_COLLECT_SCANCOLS.varname, "true");
>>>
>>> Colm.
>>>
>>> On Thu, Jan 4, 2018 at 10:06 PM, Na Li <lina...@cloudera.com> wrote:
>>>
>>> > Colm,
>>> >
>>> > The following code shows where Hive sets the column info. You can debug
>>> > into hive code and see why AccessedColumns is not set.
>>> >
>>> > The related code is in org.apache.hadoop.hive.ql.parse.SemanticAnalyzer
>>> >
>>> >         boolean isColumnInfoNeedForAuth =
>>> >             SessionState.get().isAuthorizationModeV2()
>>> >             && HiveConf.getBoolVar(this.conf, ConfVars.HIVE_AUTHORIZATION_ENABLED);
>>> >         if (isColumnInfoNeedForAuth
>>> >             || HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
>>> >           ColumnAccessAnalyzer columnAccessAnalyzer = new ColumnAccessAnalyzer(pCtx);
>>> >           this.setColumnAccessInfo(
>>> >               columnAccessAnalyzer.analyzeColumnAccess(this.getColumnAccessInfo()));
>>> >         }
>>> >
>>> >         this.LOG.info("Completed plan generation");
>>> >         if (HiveConf.getBoolVar(this.conf, ConfVars.HIVE_STATS_COLLECT_SCANCOLS)) {
>>> >           this.putAccessedColumnsToReadEntity(this.inputs, this.columnAccessInfo);
>>> >         }
>>> >
>>> >
>>> > On Wed, Jan 3, 2018 at 11:28 PM, Na Li <lina...@cloudera.com> wrote:
>>> >
>>> >> Colm,
>>> >>
>>> >> I tried to reproduce your issue using sentry 2.0 (master branch) with
>>> >> Hive 2.3.2.
>>> >>
>>> >> The test code is
>>> >>
>>> >>   @Test
>>> >>   public void testPositiveOnAll() throws Exception {
>>> >>     Connection connection = context.createConnection(ADMIN1);
>>> >>     Statement statement = context.createStatement(connection);
>>> >>     statement.execute("CREATE database " + DB1);
>>> >>     statement.execute("use " + DB1);
>>> >>     statement.execute("CREATE TABLE t1 (c1 string, c2 string)");
>>> >>     statement.execute("CREATE ROLE user_role1");
>>> >>     statement.execute("GRANT SELECT ON TABLE t1 TO ROLE user_role1");
>>> >>     statement.execute("GRANT ROLE user_role1 TO GROUP " + USERGROUP1);
>>> >>     statement.close();
>>> >>     connection.close();
>>> >>
>>> >>     connection = context.createConnection(USER1_1);
>>> >>     statement = context.createStatement(connection);
>>> >>     statement.execute("use " + DB1);
>>> >>     statement.execute("SELECT * FROM t1");
>>> >>
>>> >>     statement.close();
>>> >>     connection.close();
>>> >>   }
>>> >>
>>> >>
>>> >> required privileges:
>>> >>
>>> >>    - Server=server1->Db=db_1->Table=t1->*Column=c1*->action=select
>>> >>    - Server=server1->Db=db_1->Table=t1->*Column=c2*->action=select
>>> >>
>>> >>
>>> >> cached privilege:
>>> >>
>>> >>    - server=server1->db=db_1->table=t1->action=select
>>> >>
>>> >> So the authorization works.
>>> >>
>>> >> Note
>>> >>
>>> >>    - For me, the "*SELECT * FROM t1*" causes the required privileges to
>>> >>    contain each column explicitly. However, for you, the "privilege" to
>>> >>    check looks like:
>>> >>    Server=server1->Db=authz->Table=words->action=select; the columns are
>>> >>    not explicitly listed. Hive controls whether the column is included in
>>> >>    the required privilege. At
>>> >>    org.apache.sentry.binding.hive.authz.HiveAuthzBindingHookBase.authorizeWithHiveBindings
>>> >>    -> getInputHierarchyFromInputs -> addColumnHierarchy, Sentry uses
>>> >>    accessedColumns from the Hive input to add a colHierarchy for each
>>> >>    column. You can check whether accessedColumns is empty or null for the
>>> >>    Hive version you are using.
>>> >>    - For me, the cached privilege does not include the column part. For
>>> >>    you, the cached privilege is
>>> >>    "Server=server1->Db=authz->Table=words->Column=*->action=select".
>>> >>    *Can you share your test code*, so I can see how you grant the
>>> >>    privilege and why the cached privilege therefore contains the column?
>>> >>       - I tried to use "GRANT SELECT(*) ON TABLE t1 TO ROLE
>>> >>       user_role1", and got the following error:
>>> >>
>>> >>       2018-01-03 23:23:50,459 (HiveServer2-Handler-Pool: Thread-212)
>>> >>       [WARN - org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:539)]
>>> >>       Error executing statement:
>>> >>       org.apache.hive.service.cli.HiveSQLException: Error while
>>> >>       compiling statement: FAILED: ParseException line 1:6 cannot
>>> >>       recognize input near 'GRANT' 'SELECT' '(' in ddl statement
>>> >>       at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>>> >>       at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>>> >>       at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>>> >>       at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
>>> >>       at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>>> >>
>>> >> Thanks,
>>> >>
>>> >> Lina
>>> >>
>>> >> On Mon, Dec 18, 2017 at 10:14 AM, Colm O hEigeartaigh <
>>> >> cohei...@apache.org> wrote:
>>> >>
>>> >>> Thanks Kalyan! I was thinking that if the cached privilege part does not
>>> >>> appear in the requested "part", and if it is "all", then we should skip
>>> >>> that part and continue on to the next one. But maybe there is a better
>>> >>> solution.
>>> >>>
>>> >>> Colm.
>>> >>>
>>> >>> On Mon, Dec 18, 2017 at 4:06 PM, Kalyan Kumar Kalvagadda <
>>> >>> kkal...@cloudera.com> wrote:
>>> >>>
>>> >>> > Colm,
>>> >>> >
>>> >>> > I will look closer into this today and see if I can help you out.
>>> >>> >
>>> >>> > -Kalyan
>>> >>> >
>>> >>> > On Mon, Dec 18, 2017 at 4:52 AM, Colm O hEigeartaigh <
>>> >>> cohei...@apache.org>
>>> >>> > wrote:
>>> >>> >
>>> >>> >> Hi,
>>> >>> >>
>>> >>> >> I've done some further analysis of the problem, and I think it is not
>>> >>> >> directly related to SENTRY-1291. The problem manifests in
>>> >>> >> CommonPrivilege.implies(privilege, model). My (cached) privilege looks
>>> >>> >> like:
>>> >>> >>
>>> >>> >> Server=server1->Db=authz->Table=words->Column=*->action=select
>>> >>> >>
>>> >>> >> The "privilege" I want to check looks like:
>>> >>> >>
>>> >>> >> Server=server1->Db=authz->Table=words->action=select;
>>> >>> >>
>>> >>> >> The problem is in the "for" loop in CommonPrivilege.implies. It loops on
>>> >>> >> the parts of the second privilege, and matches up to "action=select".
>>> >>> >> Here it tries to compare to "Column=*" of the cached privilege and fails
>>> >>> >> on this line:
>>> >>> >>
>>> >>> >> https://github.com/apache/sentry/blob/a4924edc79b26f937e3e5ea3584f0b4307dd4135/sentry-policy/sentry-policy-common/src/main/java/org/apache/sentry/policy/common/CommonPrivilege.java#L86
>>> >>> >>
>>> >>> >> It's clear there's a bug here somewhere, but I'm not sure where -
>>> can
>>> >>> >> someone please advise?
>>> >>> >>
>>> >>> >> Thanks,
>>> >>> >>
>>> >>> >> Colm.
>>> >>> >>
>>> >>> >> On Wed, Dec 13, 2017 at 8:28 PM, Na Li <lina...@cloudera.com>
>>> wrote:
>>> >>> >>
>>> >>> >> > Sasha,
>>> >>> >> >
>>> >>> >> > SENTRY-1291 addresses the problem that Sentry privilege checks take
>>> >>> >> > too long with many explicit grants, which is useful for big customers.
>>> >>> >> > Another approach that can improve performance is to organize the
>>> >>> >> > privileges according to the authorization hierarchy in a tree
>>> >>> >> > structure, so that finding a match in
>>> >>> >> > ResourceAuthorizationProvider.doHasAccess() is on the order of log(N)
>>> >>> >> > rather than linear in N, where N is the number of privileges.
>>> >>> >> >
>>> >>> >> > We can wait for Colm to confirm that his issue is caused by
>>> >>> >> > SENTRY-1291. If so, it may be fixed by selecting privileges by checking
>>> >>> >> > whether the requested authorization object is a prefix of the cached
>>> >>> >> > privileges, instead of requiring an exact match.
>>> >>> >> >
>>> >>> >> > in SimplePrivilegeCache
>>> >>> >> >
>>> >>> >> > public Set<String> listPrivileges(Set<String> groups, Set<String> users,
>>> >>> >> >       ActiveRoleSet roleSet, Authorizable... authorizationHierarchy) {
>>> >>> >> >     Set<String> privileges = new HashSet<>();
>>> >>> >> >     Set<StringBuilder> authzKeys = getAuthzKeys(authorizationHierarchy);
>>> >>> >> >     for (StringBuilder authzKey : authzKeys) {
>>> >>> >> >       // Instead of exact matching, add an extension function to check if
>>> >>> >> >       // authzKey.toString() is the prefix of the key of the entries
>>> >>> >> >       // in cachedAuthzPrivileges.
>>> >>> >> >       if (cachedAuthzPrivileges.get(authzKey.toString()) != null) {
>>> >>> >> >         privileges.addAll(cachedAuthzPrivileges.get(authzKey.toString()));
>>> >>> >> >       }
>>> >>> >> >     }
>>> >>> >> >
>>> >>> >> >     return privileges;
>>> >>> >> >   }
>>> >>> >> >
>>> >>> >> > Thanks,
>>> >>> >> >
>>> >>> >> > Lina
>>> >>> >> >
>>> >>> >> > On Wed, Dec 13, 2017 at 1:08 PM, Alexander Kolbasov <
>>> >>> ak...@cloudera.com
>>> >>> >> >
>>> >>> >> > wrote:
>>> >>> >> >
>>> >>> >> > > I think that SENTRY-1291 should just be reverted - there are multiple
>>> >>> >> > > issues with it and no one is actually using the fix. Does anyone want
>>> >>> >> > > to do it?
>>> >>> >> > >
>>> >>> >> > > - Alex
>>> >>> >> > >
>>> >>> >> > > On Wed, Dec 13, 2017 at 4:44 AM, Na Li <lina...@cloudera.com>
>>> >>> wrote:
>>> >>> >> > >
>>> >>> >> > > > Colm,
>>> >>> >> > > >
>>> >>> >> > > > Glad you find the cause!
>>> >>> >> > > >
>>> >>> >> > > > You can revert SENTRY-1291 and see if it works. If so, the issue
>>> >>> >> > > > is in finding the cached privileges.
>>> >>> >> > > >
>>> >>> >> > > > Cheers,
>>> >>> >> > > >
>>> >>> >> > > > Lina
>>> >>> >> > > >
>>> >>> >> > > > Sent from my iPhone
>>> >>> >> > > >
>>> >>> >> > > > > On Dec 13, 2017, at 4:58 AM, Colm O hEigeartaigh <
>>> >>> >> > cohei...@apache.org>
>>> >>> >> > > > wrote:
>>> >>> >> > > > >
>>> >>> >> > > > > Hi,
>>> >>> >> > > > >
>>> >>> >> > > > > I can see what the problem is (that the authorization hierarchy
>>> >>> >> > > > > does not contain the column, and hence doesn't match against the
>>> >>> >> > > > > cached privilege), but I'm not sure about the best way to solve it.
>>> >>> >> > > > > Either the way we are creating the authorization hierarchy is
>>> >>> >> > > > > incorrect (e.g. in HiveAuthzBindingHookBase) or else the way we are
>>> >>> >> > > > > parsing the cached privilege is incorrect (e.g. in
>>> >>> >> > > > > SimplePrivilegeCache/CommonPrivilege).
>>> >>> >> > > > >
>>> >>> >> > > > > Colm.
>>> >>> >> > > > >
>>> >>> >> > > > >> On Wed, Dec 13, 2017 at 5:57 AM, Na Li <
>>> lina...@cloudera.com
>>> >>> >
>>> >>> >> > wrote:
>>> >>> >> > > > >>
>>> >>> >> > > > >> Colm,
>>> >>> >> > > > >>
>>> >>> >> > > > >> I did not get chance to look into this issue today. Sorry
>>> >>> about
>>> >>> >> > that.
>>> >>> >> > > > >>
>>> >>> >> > > > >> You can add an e2e test case and set a breakpoint where the
>>> >>> >> > > > >> authorization object hierarchy is converted to a list of
>>> >>> >> > > > >> authorization objects, which is used to do an exact match against
>>> >>> >> > > > >> the cache.
>>> >>> >> > > > >>
>>> >>> >> > > > >> Sent from my iPhone
>>> >>> >> > > > >>
>>> >>> >> > > > >>> On Dec 12, 2017, at 11:27 AM, Colm O hEigeartaigh <
>>> >>> >> > > cohei...@apache.org
>>> >>> >> > > > >
>>> >>> >> > > > >> wrote:
>>> >>> >> > > > >>>
>>> >>> >> > > > >>> That would be great, thanks!
>>> >>> >> > > > >>>
>>> >>> >> > > > >>> Colm.
>>> >>> >> > > > >>>
>>> >>> >> > > > >>>> On Tue, Dec 12, 2017 at 4:36 PM, Na Li <
>>> >>> lina...@cloudera.com>
>>> >>> >> > > wrote:
>>> >>> >> > > > >>>>
>>> >>> >> > > > >>>> Colm,
>>> >>> >> > > > >>>>
>>> >>> >> > > > >>>> I suspect it is a bug in SENTRY-1291. I can take a look later
>>> >>> >> > > > >>>> today.
>>> >>> >> > > > >>>>
>>> >>> >> > > > >>>> Thanks,
>>> >>> >> > > > >>>>
>>> >>> >> > > > >>>> Lina
>>> >>> >> > > > >>>>
>>> >>> >> > > > >>>> On Tue, Dec 12, 2017 at 4:32 AM, Colm O hEigeartaigh <
>>> >>> >> > > > >> cohei...@apache.org>
>>> >>> >> > > > >>>> wrote:
>>> >>> >> > > > >>>>
>>> >>> >> > > > >>>>> Hi all,
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> I've updated some local testcases to work with Sentry 2.0.0 and
>>> >>> >> > > > >>>>> the "v1" Hive binding (previously working fine using 1.8.0 and
>>> >>> >> > > > >>>>> the "v2" binding).
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> I have a simple table called "words" (word STRING, count INT).
>>> >>> >> > > > >>>>> I am making an SQL call as the user "bob", e.g. "SELECT * FROM
>>> >>> >> > > > >>>>> words where count == '100'".
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> "bob" is in the "manager" group, which has the following role:
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> select_all_role =
>>> >>> >> > > > >>>>> Server=server1->Db=authz->Table=words->Column=*->action=select
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> Essentially, authorization is denied even though the policy is
>>> >>> >> > > > >>>>> correct. If I look at the SimplePrivilegeCache, the cached
>>> >>> >> > > > >>>>> privilege is:
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> server=server1->db=authz->table=words->column=*=[Server=server1->Db=authz->Table=words->Column=*->action=select]
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> However, when "listPrivileges" is called, the authorizable
>>> >>> >> > > > >>>>> hierarchy looks like:
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> Server [name=server1]
>>> >>> >> > > > >>>>> Database [name=authz]
>>> >>> >> > > > >>>>> Table [name=words]
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> There is no "column" here, and a match is not made against the
>>> >>> >> > > > >>>>> cached privilege as a result. Is this a bug or am I missing some
>>> >>> >> > > > >>>>> configuration switch?
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> Colm.
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> --
>>> >>> >> > > > >>>>> Colm O hEigeartaigh
>>> >>> >> > > > >>>>>
>>> >>> >> > > > >>>>> Talend Community Coder
>>> >>> >> > > > >>>>> http://coders.talend.com
