[ 
https://issues.apache.org/jira/browse/IMPALA-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18054727#comment-18054727
 ] 

ASF subversion and git services commented on IMPALA-14703:
----------------------------------------------------------

Commit 3dac0135fba0717dd977043e7ecc6b52bf55189f in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3dac0135f ]

IMPALA-14703: Improves finding column masking policies of a table

When Ranger authorization is enabled, a user's attempt to update masked
data should be blocked. This is done by checking whether any column
masking or row filtering policies on the table are enabled for the user.
Currently we iterate over all the columns of the table and check whether
a masking policy exists on each column. This is inefficient, especially
for wide tables. It also requires the table's metadata to be loaded to
get the column list, which introduces a performance regression for
INVALIDATE and REFRESH statements that previously didn't trigger
metadata loading.
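The cost of the per-column approach can be illustrated with a small, self-contained Java model. It is a sketch under stated assumptions, not Impala's or Ranger's code: `MaskPolicy`, `findColumnMask` and `findAnyMask` are hypothetical names, and the policy list is assumed to be already filtered to the requesting user. Each column costs one evaluation, and the caller has to supply the full column list, which in Impala means the table's metadata must be loaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical model of the per-column check; NOT Impala or Ranger code.
// MaskPolicy, findColumnMask and findAnyMask are illustrative names, and the
// policy list is assumed to be pre-filtered to the requesting user.
class PerColumnMaskCheck {
  /** A column masking policy on exactly one db.table.column. */
  static final class MaskPolicy {
    final long id;
    final String db;
    final String table;
    final String column;

    MaskPolicy(long id, String db, String table, String column) {
      this.id = id;
      this.db = db;
      this.table = table;
      this.column = column;
    }
  }

  /** Number of per-column evaluations done by the last findAnyMask() call. */
  static int evaluations = 0;

  /** One evaluation: is there a masking policy on this exact column? */
  static Optional<MaskPolicy> findColumnMask(List<MaskPolicy> policies,
      String db, String table, String column) {
    evaluations++;
    for (MaskPolicy p : policies) {
      if (p.db.equals(db) && p.table.equals(table) && p.column.equals(column)) {
        return Optional.of(p);
      }
    }
    return Optional.empty();
  }

  /** Old-style check: needs the full column list, one evaluation per column. */
  static Optional<MaskPolicy> findAnyMask(List<MaskPolicy> policies,
      String db, String table, List<String> columns) {
    evaluations = 0;
    for (String column : columns) {
      Optional<MaskPolicy> mask = findColumnMask(policies, db, table, column);
      if (mask.isPresent()) {
        return mask;  // stop at the first masked column
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // A 1000-column table whose only masked column is the last one.
    List<String> columns = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      columns.add("c" + i);
    }
    List<MaskPolicy> policies = List.of(new MaskPolicy(7, "db", "wide", "c999"));
    Optional<MaskPolicy> mask = findAnyMask(policies, "db", "wide", columns);
    // Prints: policy 7 found after 1000 evaluations
    System.out.println("policy " + mask.get().id + " found after "
        + evaluations + " evaluations");
  }
}
```

In this model an unmasked wide table is the worst case: every column is evaluated before the check can conclude that the update is allowed.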

This patch changes the check to issue a single table-level request with
the resource matching scope SELF_OR_DESCENDANTS. With this scope, the
Ranger plugin returns the first matching column masking policy from
evalDataMaskPolicies().
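A minimal sketch of the table-level check, again as a hypothetical model rather than the actual patch: it does not call Ranger, `MaskPolicy` and `findMaskOnTable` are illustrative names, SELF_OR_DESCENDANTS matching is approximated as "any policy on a column of this table", and the policy list is assumed to be pre-filtered to the requesting user. The point to notice is that the method takes no column list at all.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the table-level check; NOT Impala or Ranger code.
// MaskPolicy and findMaskOnTable are illustrative names, SELF_OR_DESCENDANTS is
// approximated as "any policy on a column of this table", and the policy list
// is assumed to be pre-filtered to the requesting user.
class TableLevelMaskCheck {
  /** A column masking policy on one db.table.column. */
  static final class MaskPolicy {
    final long id;
    final String db;
    final String table;
    final String column;

    MaskPolicy(long id, String db, String table, String column) {
      this.id = id;
      this.db = db;
      this.table = table;
      this.column = column;
    }
  }

  /**
   * One table-level request. A policy on any column of db.table matches, so no
   * column list (and hence no loaded table metadata) is needed. Returns the
   * first matching policy, if any.
   */
  static Optional<MaskPolicy> findMaskOnTable(List<MaskPolicy> policies,
      String db, String table) {
    for (MaskPolicy p : policies) {
      if (p.db.equals(db) && p.table.equals(table)) {
        return Optional.of(p);  // first matching column masking policy
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<MaskPolicy> policies = List.of(
        new MaskPolicy(3, "db", "other", "ssn"),
        new MaskPolicy(7, "db", "wide", "c999"),
        new MaskPolicy(8, "db", "wide", "c5"));
    // Prints: deny: policy 7
    System.out.println(findMaskOnTable(policies, "db", "wide")
        .map(p -> "deny: policy " + p.id)
        .orElse("allow"));
  }
}
```

Returning the first match is enough here because the caller only needs to know whether some masking policy applies, not which columns are masked.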

As the column list is no longer needed, the table loading triggered by
INVALIDATE and REFRESH statements is also removed.

Tests
 - Ran test_block_metadata_update and data masking tests in
   test_ranger.py

Change-Id: Ic8ab88b7cfd4f7e156c4eead53a2ff3086b1daa6
Reviewed-on: http://gerrit.cloudera.org:8080/23908
Reviewed-by: Csaba Ringhofer <[email protected]>
Reviewed-by: Fang-Yu Rao <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Improves finding column masking policies of a table for a user
> --------------------------------------------------------------
>
>                 Key: IMPALA-14703
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14703
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend, Security
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> When Ranger authorization is enabled, to block updates on masked data, we 
> check column masking policies on each column of a table:
> {code:java}
> // One evalColumnMask() call per column; stop at the first masked column.
> for (String column : columns) {
>   RangerAccessResult columnMaskResult = evalColumnMask(user,
>       authorizable.getDbName(), authorizable.getTableName(), column,
>       /*auditHandler*/null);
>   if (columnMaskResult != null && columnMaskResult.isMaskEnabled()) {
>     LOG.trace("Deny {} on {} due to column masking policy {}",
>         privilege, authorizable.getName(), columnMaskResult.getPolicyId());
>     accessResult.setIsAllowed(false);
>     accessResult.setPolicyId(columnMaskResult.getPolicyId());
>     accessResult.setReason(
>         "User does not have access to unmasked column values");
>     break;
>   }
> }{code}
> [https://github.com/apache/impala/blob/00c233cc4fc25d23fc8a7e2f1efdf2d85c29f653/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java#L737-L747]
> This is inefficient for wide tables. It also requires the table metadata to
> be loaded to get the correct column list (IMPALA-11281), which introduces a
> performance regression for INVALIDATE and REFRESH on unloaded tables. See
> IMPALA-11501.
> We can consider using a table-level request with the scope
> RangerAccessRequest.ResourceMatchingScope.SELF_OR_DESCENDANTS to find any
> column masking policy on the table. This is how Hive implements it:
> {code:java}
> RangerHiveResource tblResource = new RangerHiveResource(HiveObjectType.TABLE,
>     resource.getDatabase(), resource.getTable());
> // filtering/masking policies are defined only for SELECT
> request.setHiveAccessType(HiveAccessType.SELECT);
> request.setResource(tblResource);
> ...
> // check if masking is enabled for any column in the table/view
> request.setResourceMatchingScope(RangerAccessRequest.ResourceMatchingScope.SELF_OR_DESCENDANTS);{code}
> [https://github.com/apache/ranger/blob/d48e3528eb0d5dca965e53bb4a75f18f3b2d24a2/hive-agent/src/main/java/org/apache/ranger/authorization/hive/authorizer/RangerHiveAuthorizer.java#L1028]
> With this, INVALIDATE and REFRESH don't need to trigger metadata loading on 
> unloaded tables to get the column list.
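The difference between the two matching scopes can be sketched on a simplified database > table > column hierarchy. This is a hypothetical model, not Ranger's policy resource matcher: it compares exact names only (no wildcards, users or policy items), and `ScopeMatching`, `Scope` and `matches` are illustrative names that merely mirror the SELF and SELF_OR_DESCENDANTS scopes mentioned above.

```java
import java.util.List;

// Simplified, hypothetical model of resource matching scopes on a
// database > table > column hierarchy. Exact names only (no wildcards);
// this is NOT Ranger's actual resource matcher.
class ScopeMatching {
  enum Scope { SELF, SELF_OR_DESCENDANTS }

  /**
   * Does a policy on policyPath match a request on requestPath?
   * Paths are hierarchical, e.g. [db, tbl] or [db, tbl, col].
   */
  static boolean matches(List<String> requestPath, List<String> policyPath,
      Scope scope) {
    // The policy resource must be the requested resource or lie below it.
    if (policyPath.size() < requestPath.size()
        || !policyPath.subList(0, requestPath.size()).equals(requestPath)) {
      return false;
    }
    // SELF: only the requested resource itself.
    // SELF_OR_DESCENDANTS: the resource or anything below it (e.g. its columns).
    return scope == Scope.SELF_OR_DESCENDANTS
        || policyPath.size() == requestPath.size();
  }

  public static void main(String[] args) {
    List<String> tableRequest = List.of("db", "wide");
    List<String> columnPolicy = List.of("db", "wide", "c999");
    System.out.println(matches(tableRequest, columnPolicy, Scope.SELF));                // false
    System.out.println(matches(tableRequest, columnPolicy, Scope.SELF_OR_DESCENDANTS)); // true
  }
}
```

In this model, a table-level request with SELF never reaches a column-level policy, while SELF_OR_DESCENDANTS does, which is why one table-level request can replace the per-column loop.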



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
