Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12632 )

Change subject: IMPALA-8100: Add initial support for Ranger
......................................................................


Patch Set 9:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/12632/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12632/9//COMMIT_MSG@30
PS9, Line 30: - Ran all FE tests
            : - Ran all E2E authorization tests
did you run these with ranger configured? I'm surprised they would pass since 
best I can tell this patch doesn't attempt to support GRANT/REVOKE statements 
etc, right?


http://gerrit.cloudera.org:8080/#/c/12632/9/be/src/service/frontend.cc
File be/src/service/frontend.cc:

http://gerrit.cloudera.org:8080/#/c/12632/9/be/src/service/frontend.cc@41
PS9, Line 41:     
"org.apache.impala.authorization.sentry.SentryAuthorizationFactory",
I wonder if we should switch this to be a more human-friendly flag instead of a 
class name, before we ship it?


http://gerrit.cloudera.org:8080/#/c/12632/9/be/src/service/frontend.cc@46
PS9, Line 46:     "identify the application that communicates with Ranger.");
Worth noting whether this is required, considering it has an empty default


http://gerrit.cloudera.org:8080/#/c/12632/9/be/src/service/frontend.cc@47
PS9, Line 47: DEFINE_string(server_name, "", "The name to use for securing this 
impalad "
we probably can't rename the existing sentry-specific flags (for compat 
reasons) but perhaps we can amend the help text for each of these flags to say 
something like "only relevant if sentry is enabled" or something?

It might be worth hard-coding some checks to make sure 
GetCommandLineFlagInfoOrDie("server_name").is_default for these flags if ranger 
is set, and vice-versa for ranger-specific flags, so that people don't 
accidentally cross-pollinate their configs


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/pom.xml
File fe/pom.xml:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/pom.xml@112
PS9, Line 112:       <exclusions>
it seems that ranger-plugins-common has a pom dependency on 
mysql-connector-java. Is that really necessary? I thought ranger used a 
"service" model rather than the plugin directly talking to any database?


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java@126
PS9, Line 126:         Privilege.ANY);
I'm not 100% sure about this part of the change. If I don't have the 
'VIEW_METADATA' privilege but I do have permissions on one or more columns, 
should I still be able to use 'DESCRIBE' in all cases? For example, 'DESCRIBE 
EXTENDED' on a view will show the view text and other metadata, but maybe I'm 
not supposed to have that capability if I don't have VIEW_METADATA privileges?

Do we have some good documentation on the intended semantics here?


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java
File 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@77
PS9, Line 77:         // Any column access is special in Ranger. For example if 
we want to check if
            :         // we have access to any column on a particular table, we 
need to build a resource
            :         // without the column resource.
            :         // For example:
            :         // access type: RangerPolicyEngine.ANY_ACCESS
            :         // resources: [server=server1, database=foo, table=bar]
I'm not understanding this. When you are saying "Any column access" do you mean 
"authorization checks for wildcard columns" or something? I think expanding 
upon the example would be useful.


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@106
PS9, Line 106:     if (request.getPrivilege() == Privilege.VIEW_METADATA) {
does this need to be special cased? It seems all the other privileges already 
have an "implied" set which is equal to just the privilege itself. (ANY 
probably does need to be special cased still)


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@117
PS9, Line 117:       if (authorize(plugin, resource, user, privilege)) {
does this result in multiple audit events, one per implied privilege?


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java@131
PS9, Line 131: new HashSet
can you use Collections.emptySet?

Also is this a TODO? Should we be doing group resolution here? Or will Ranger 
fill in its own groups?


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationConfig.java
File 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationConfig.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationConfig.java@38
PS9, Line 38: serviceType
nit: Preconditions.checkNotNull(foo) returns 'foo', so you can combine these 
assignments with the checks


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java
File 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java@55
PS9, Line 55:     
Preconditions.checkArgument(!Strings.isNullOrEmpty(backendConfig.getRangerAppId()));
is this the only validation that happens on this config? If so, we should make 
sure the exception has a clear message pointing back to the flag so the user 
doesn't need to go pull up the source to track down which config is throwing 
IllegalArgumentException


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpalaResourceBuilder.java
File 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpalaResourceBuilder.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpalaResourceBuilder.java@37
PS9, Line 37: serverName
can we add Preconditions.checkNotNull() for all parameters in this file? Is it 
valid to ever pass null?


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/main/java/org/apache/impala/authorization/ranger/RangerImpalaResourceBuilder.java@52
PS9, Line 52:     rangerAccessResource.setValue(COLUMN, columnName);
general question about ranger semantics -- are resources case sensitive? should 
we be canonicalizing to lower case anywhere? Or should we be Preconditioning 
that its' already canonicalized? Or does that get handled on the ranger side?


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java
File fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java@90
PS9, Line 90:   private static final String RANGER_SERVICE_TYPE = "impala";
            :   private static final String RANGER_APP_ID = "impala";
can these be pulled from the config instead to avoid duplication?


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java@110
PS9, Line 110: System.getenv("IMPALA_HOME") + 
"/fe/src/test/resources/sentry-site.
this is a little sketchy. Shouldn't this resource be on the classpath and thus 
just loadable by its name?


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
File fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java@178
PS9, Line 178:     SentryAuthorizationFactory authzFactory = new 
SentryAuthorizationFactory(
should we be parametrizing this test to be able to run against ranger as well? 
Or is it too specific to Sentry?


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
File fe/src/test/java/org/apache/impala/common/FrontendTestBase.java:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java@308
PS9, Line 308:           @Override
             :           public boolean isEnabled() { return true; }
             :           @Override
             :           public AuthorizationProvider getProvider() {
             :             return AuthorizationProvider.NONE;
             :           }
I'm not sure what this means, semantically -- authorization enabled but NONE?


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml
File fe/src/test/resources/ranger-impala-security.xml:

http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@21
PS9, Line 21:     <value>test_impala</value>
how does this interact with the gflag for "service id"?


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@37
PS9, Line 37:     <value>/tmp</value>
can we auto-configure this based on other existing scratch dir configs, at 
least as a default?


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@48
PS9, Line 48:       How often to poll for changes in policies?
are these strings copied from somewhere? It's weird to see a question mark in a 
description.


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@56
PS9, Line 56:       RangerRestClient Connection Timeout in Milli Seconds.
milliseconds


http://gerrit.cloudera.org:8080/#/c/12632/9/fe/src/test/resources/ranger-impala-security.xml@64
PS9, Line 64:       RangerRestClient read Timeout in Milli Seconds.
milliseconds


http://gerrit.cloudera.org:8080/#/c/12632/9/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

http://gerrit.cloudera.org:8080/#/c/12632/9/testdata/bin/create-load-data.sh@308
PS9, Line 308:   perl -wpl -e 's/\$\{([^}]+)\}/defined $ENV{$1} ? $ENV{$1} : 
$&/eg' \
you don't want to die() if $ENV{$1} is not defined here? leaving it 
unsubstituted seems like it'll just make a harder-to-troubleshoot error later


http://gerrit.cloudera.org:8080/#/c/12632/9/testdata/cluster/ranger/setup/impala_servicedef.json
File testdata/cluster/ranger/setup/impala_servicedef.json:

http://gerrit.cloudera.org:8080/#/c/12632/9/testdata/cluster/ranger/setup/impala_servicedef.json@1
PS9, Line 1: {
I'm a little curious, at a high level, how this interacts with any ranger 
polciies defined for hive. Shouldn't we have a single service definition which 
applies to both hive and impala, rather than our own service def?

Also, what's the eventual deployment model? This is in our test data, but in 
real life we'd expect these files to be deployed onto the ranger server? Maybe 
a README file in this directory would be useful.



--
To view, visit http://gerrit.cloudera.org:8080/12632
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8cad9e609d20aae1ff645c84fd58a02afee70276
Gerrit-Change-Number: 12632
Gerrit-PatchSet: 9
Gerrit-Owner: Fredy Wijaya <fwij...@cloudera.com>
Gerrit-Reviewer: Austin Nobis <ano...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fwij...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <prog...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Wed, 13 Mar 2019 22:23:10 +0000
Gerrit-HasComments: Yes

Reply via email to