[ https://issues.apache.org/jira/browse/IMPALA-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang updated IMPALA-14703:
------------------------------------
Description:
When Ranger authorization is enabled, we block updates on masked data by checking
the column masking policies of every column in the table:
{code:java}
for (String column : columns) {
  RangerAccessResult columnMaskResult = evalColumnMask(user,
      authorizable.getDbName(), authorizable.getTableName(), column,
      /*auditHandler*/null);
  if (columnMaskResult != null && columnMaskResult.isMaskEnabled()) {
    LOG.trace("Deny {} on {} due to column masking policy {}",
        privilege, authorizable.getName(), columnMaskResult.getPolicyId());
    accessResult.setIsAllowed(false);
    accessResult.setPolicyId(columnMaskResult.getPolicyId());
    accessResult.setReason("User does not have access to unmasked column values");
    break;
  }
}{code}
[https://github.com/apache/impala/blob/00c233cc4fc25d23fc8a7e2f1efdf2d85c29f653/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java#L737-L747]
This is inefficient for wide tables, since the masking policies are evaluated
once per column. It also requires the table metadata to be loaded to get the
correct column list (IMPALA-11281), which introduces a performance regression
for INVALIDATE and REFRESH on unloaded tables. See IMPALA-11501.
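To make the cost concrete, here is a minimal, self-contained sketch that models the loop above. It is not Impala code: PerColumnCheckSketch, columnLoader and isMasked are hypothetical names, the lazily loaded column list stands in for loading table metadata, and an in-memory set of masked columns stands in for real Ranger policies.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/**
 * Hypothetical model of the per-column check. These names are illustrative,
 * not Impala or Ranger APIs.
 */
public class PerColumnCheckSketch {
  public static int metadataLoads = 0;
  public static int maskEvaluations = 0;

  /** Stands in for loading table metadata only to learn the column list. */
  public static Supplier<List<String>> columnLoader(List<String> columns) {
    return () -> {
      metadataLoads++;
      return columns;
    };
  }

  /** Stands in for evalColumnMask(): true if the column has a mask policy. */
  public static boolean isMasked(Set<String> maskedColumns, String column) {
    maskEvaluations++;
    return maskedColumns.contains(column);
  }

  /** Denies the update as soon as one column is masked, like the loop above. */
  public static boolean allowUpdate(Supplier<List<String>> loader,
      Set<String> maskedColumns) {
    for (String column : loader.get()) {  // needs the loaded column list
      if (isMasked(maskedColumns, column)) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> wideTable = new ArrayList<>();
    for (int i = 0; i < 1000; i++) wideTable.add("c" + i);
    boolean allowed = allowUpdate(columnLoader(wideTable), Set.of());
    // prints "true 1 1000": one metadata load and one evaluation per column
    System.out.println(allowed + " " + metadataLoads + " " + maskEvaluations);
  }
}
{code}

For an unmasked table the number of evaluations grows with the column count, and the metadata load happens even though only a yes/no answer is needed.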
We can consider using a table-level request with the scope
RangerAccessRequest.ResourceMatchingScope.SELF_OR_DESCENDANTS to find any
column masking policies on the table. This is how Hive implements it:
{code:java}
RangerHiveResource tblResource = new RangerHiveResource(HiveObjectType.TABLE,
    resource.getDatabase(), resource.getTable());
// filtering/masking policies are defined only for SELECT
request.setHiveAccessType(HiveAccessType.SELECT);
request.setResource(tblResource);
...
// check if masking is enabled for any column in the table/view
request.setResourceMatchingScope(RangerAccessRequest.ResourceMatchingScope.SELF_OR_DESCENDANTS);{code}
[https://github.com/apache/ranger/blob/d48e3528eb0d5dca965e53bb4a75f18f3b2d24a2/hive-agent/src/main/java/org/apache/ranger/authorization/hive/authorizer/RangerHiveAuthorizer.java#L1028]
With this, INVALIDATE and REFRESH don't need to trigger metadata loading on
unloaded tables to get the column list.
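A rough sketch of the matching rule this relies on, using a toy in-memory policy model instead of the real Ranger classes (Scope, MaskPolicy, Request and findMaskPolicy are hypothetical). In this model a column masking policy is a descendant of its table, so a table-level request only sees it under SELF_OR_DESCENDANTS; the SELF branch is a simplification, not a statement of Ranger's exact matcher behavior.

{code:java}
import java.util.List;

/**
 * Hypothetical model of resource matching scopes for masking policies.
 * These types are illustrative, NOT Ranger's API.
 */
public class MaskScopeSketch {
  public enum Scope { SELF, SELF_OR_DESCENDANTS }

  /** A column masking policy for one user on db.table.column. */
  public record MaskPolicy(long id, String user, String db, String table,
      String column) {}

  /** A request on a whole table (column == null) or on a single column. */
  public record Request(String user, String db, String table, String column,
      Scope scope) {}

  static boolean matches(MaskPolicy p, Request r) {
    if (!p.user().equals(r.user()) || !p.db().equals(r.db())
        || !p.table().equals(r.table())) {
      return false;
    }
    if (r.column() == null) {
      // Table-level request: a column policy sits below the table, so it only
      // matches when the scope includes descendants.
      return r.scope() == Scope.SELF_OR_DESCENDANTS;
    }
    return p.column().equals(r.column());
  }

  /** Returns the id of the first matching masking policy, or -1 if none. */
  public static long findMaskPolicy(List<MaskPolicy> policies, Request r) {
    for (MaskPolicy p : policies) {
      if (matches(p, r)) return p.id();
    }
    return -1;
  }

  public static void main(String[] args) {
    List<MaskPolicy> policies =
        List.of(new MaskPolicy(7, "alice", "db1", "t1", "ssn"));
    // One table-level request finds the column policy without a column list.
    System.out.println(findMaskPolicy(policies,
        new Request("alice", "db1", "t1", null, Scope.SELF_OR_DESCENDANTS)));  // prints 7
    // With SELF scope, the same table-level request misses column policies.
    System.out.println(findMaskPolicy(policies,
        new Request("alice", "db1", "t1", null, Scope.SELF)));  // prints -1
  }
}
{code}

The point of the design is that the decision needs only the database and table names, which are known before the table's metadata is loaded.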
was:
When Ranger authorization is enabled, we block updates on masked data by checking
the column masking policies of every column in the table:
{code:java}
for (String column : columns) {
  RangerAccessResult columnMaskResult = evalColumnMask(user,
      authorizable.getDbName(), authorizable.getTableName(), column,
      /*auditHandler*/null);
  if (columnMaskResult != null && columnMaskResult.isMaskEnabled()) {
    LOG.trace("Deny {} on {} due to column masking policy {}",
        privilege, authorizable.getName(), columnMaskResult.getPolicyId());
    accessResult.setIsAllowed(false);
    accessResult.setPolicyId(columnMaskResult.getPolicyId());
    accessResult.setReason("User does not have access to unmasked column values");
    break;
  }
}{code}
https://github.com/apache/impala/blob/00c233cc4fc25d23fc8a7e2f1efdf2d85c29f653/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java#L737-L747
This is inefficient for wide tables, since the masking policies are evaluated
once per column. It also requires the table metadata to be loaded to get the
correct column list (IMPALA-11281), which introduces a performance regression
for INVALIDATE and REFRESH on unloaded tables. See IMPALA-11501.
We can consider using a table-level request with the scope
RangerAccessRequest.ResourceMatchingScope.SELF_OR_DESCENDANTS to find any
column masking policies on the table. This is how Hive implements it:
{code:java}
RangerHiveResource tblResource = new RangerHiveResource(HiveObjectType.TABLE,
    resource.getDatabase(), resource.getTable());
// filtering/masking policies are defined only for SELECT
request.setHiveAccessType(HiveAccessType.SELECT);
request.setResource(tblResource);
...
// check if masking is enabled for any column in the table/view
request.setResourceMatchingScope(RangerAccessRequest.ResourceMatchingScope.SELF_OR_DESCENDANTS);{code}
https://github.com/apache/ranger/blob/d48e3528eb0d5dca965e53bb4a75f18f3b2d24a2/hive-agent/src/main/java/org/apache/ranger/authorization/hive/authorizer/RangerHiveAuthorizer.java#L1028
> Improves finding column masking policies of a table for a user
> --------------------------------------------------------------
>
> Key: IMPALA-14703
> URL: https://issues.apache.org/jira/browse/IMPALA-14703
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend, Security
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
--
This message was sent by Atlassian Jira
(v8.20.10#820010)