[ 
https://issues.apache.org/jira/browse/IMPALA-14703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-14703:
------------------------------------
    Description: 
When Ranger authorization is enabled, to block updates on masked data, we check 
column masking policies on each column of a table:
{code:java}
for (String column : columns) {
  RangerAccessResult columnMaskResult = evalColumnMask(user,
      authorizable.getDbName(), authorizable.getTableName(), column,
      /*auditHandler*/null);
  if (columnMaskResult != null && columnMaskResult.isMaskEnabled()) {
    LOG.trace("Deny {} on {} due to column masking policy {}",
        privilege, authorizable.getName(), columnMaskResult.getPolicyId());
    accessResult.setIsAllowed(false);
    accessResult.setPolicyId(columnMaskResult.getPolicyId());
    accessResult.setReason("User does not have access to unmasked column values");
    break;
  }
}
{code}
[https://github.com/apache/impala/blob/00c233cc4fc25d23fc8a7e2f1efdf2d85c29f653/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java#L737-L747]

This is inefficient for wide tables. It also requires table metadata to be 
loaded to get the correct column list (IMPALA-11281), which introduces a 
performance regression for INVALIDATE and REFRESH on unloaded tables. See 
IMPALA-11501.

We can consider using a table-level request with scope 
RangerAccessRequest.ResourceMatchingScope.SELF_OR_DESCENDANTS to find any 
column masking policies. This is the implementation in Hive:
{code:java}
RangerHiveResource tblResource = new RangerHiveResource(HiveObjectType.TABLE,
    resource.getDatabase(), resource.getTable());
// filtering/masking policies are defined only for SELECT
request.setHiveAccessType(HiveAccessType.SELECT);
request.setResource(tblResource);
...
// check if masking is enabled for any column in the table/view
request.setResourceMatchingScope(RangerAccessRequest.ResourceMatchingScope.SELF_OR_DESCENDANTS);
{code}
[https://github.com/apache/ranger/blob/d48e3528eb0d5dca965e53bb4a75f18f3b2d24a2/hive-agent/src/main/java/org/apache/ranger/authorization/hive/authorizer/RangerHiveAuthorizer.java#L1028]

With this, INVALIDATE and REFRESH don't need to trigger metadata loading on 
unloaded tables to get the column list.
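
To make the difference concrete, here is a minimal, self-contained sketch of the 
two lookup strategies. It does not use the Ranger API: MaskScopeSketch, 
MaskPolicy, Scope, evalMask and the two findMask* methods are hypothetical 
stand-ins invented for illustration, and the matching rule is only a rough 
approximation of what SELF_OR_DESCENDANTS is expected to do for a table 
resource.

```java
import java.util.ArrayList;
import java.util.List;

public class MaskScopeSketch {
  /** Hypothetical stand-in for a column masking policy; not a Ranger class. */
  public static final class MaskPolicy {
    final long id;
    final String user, db, table, column;
    public MaskPolicy(long id, String user, String db, String table, String column) {
      this.id = id; this.user = user; this.db = db; this.table = table; this.column = column;
    }
  }

  /** Hypothetical mirror of the two matching scopes relevant here. */
  public enum Scope { SELF, SELF_OR_DESCENDANTS }

  private final List<MaskPolicy> policies = new ArrayList<>();
  /** Counts how many requests were evaluated, to compare the two strategies. */
  public int evaluations = 0;

  public void addPolicy(MaskPolicy p) { policies.add(p); }

  /**
   * Evaluates one request. A null column means a table-level resource.
   * Returns the id of the first matching masking policy, or -1 if none matches.
   */
  public long evalMask(String user, String db, String table, String column, Scope scope) {
    evaluations++;
    for (MaskPolicy p : policies) {
      if (!p.user.equals(user) || !p.db.equals(db) || !p.table.equals(table)) continue;
      // Column-level request: only a policy on exactly this column matches.
      if (column != null && p.column.equals(column)) return p.id;
      // Table-level request: columns are descendants of the table.
      if (column == null && scope == Scope.SELF_OR_DESCENDANTS) return p.id;
    }
    return -1;
  }

  /** Current approach: one evaluation per column, so the column list is required. */
  public long findMaskPerColumn(String user, String db, String table, List<String> columns) {
    for (String column : columns) {
      long id = evalMask(user, db, table, column, Scope.SELF);
      if (id >= 0) return id;
    }
    return -1;
  }

  /** Proposed approach: a single table-level evaluation, no column list needed. */
  public long findMaskTableLevel(String user, String db, String table) {
    return evalMask(user, db, table, null, Scope.SELF_OR_DESCENDANTS);
  }

  public static void main(String[] args) {
    MaskScopeSketch engine = new MaskScopeSketch();
    // A single masking policy on the last column of a 500-column table.
    engine.addPolicy(new MaskPolicy(7, "alice", "db1", "wide_tbl", "c499"));
    List<String> columns = new ArrayList<>();
    for (int i = 0; i < 500; i++) columns.add("c" + i);

    long perColumn = engine.findMaskPerColumn("alice", "db1", "wide_tbl", columns);
    int perColumnCalls = engine.evaluations;
    engine.evaluations = 0;
    long tableLevel = engine.findMaskTableLevel("alice", "db1", "wide_tbl");

    System.out.println("per-column: policy " + perColumn + ", evaluations " + perColumnCalls);
    System.out.println("table-level: policy " + tableLevel + ", evaluations " + engine.evaluations);
  }
}
```

In this toy model both strategies find the same policy, but the per-column 
loop issues 500 evaluations and needs the column list, while the table-level 
request issues one and needs only the database and table names. Whether the 
real Ranger plugin behaves the same way for Impala's resource definitions 
would still need to be verified.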

  was:
When Ranger authorization is enabled, to block updates on masked data, we check 
column masking policies on each column of a table:
{code:java}
for (String column : columns) {
  RangerAccessResult columnMaskResult = evalColumnMask(user,
      authorizable.getDbName(), authorizable.getTableName(), column,
      /*auditHandler*/null);
  if (columnMaskResult != null && columnMaskResult.isMaskEnabled()) {
    LOG.trace("Deny {} on {} due to column masking policy {}",
        privilege, authorizable.getName(), columnMaskResult.getPolicyId());
    accessResult.setIsAllowed(false);
    accessResult.setPolicyId(columnMaskResult.getPolicyId());
    accessResult.setReason("User does not have access to unmasked column values");
    break;{code}
https://github.com/apache/impala/blob/00c233cc4fc25d23fc8a7e2f1efdf2d85c29f653/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java#L737-L747

This is inefficient for wide tables. It also requires table metadata is loaded 
to get the correct column list (IMPALA-11281), which introduces a performance 
regression for INVALIDATE and REFRESH on unloaded tables. See IMPALA-11501.

We can consider using table level request with scope 
RangerAccessRequest.ResourceMatchingScope.SELF_OR_DESCENDANTS to find any 
column masking policies. This is the implementation in Hive:
{code:java}
RangerHiveResource tblResource = new RangerHiveResource(HiveObjectType.TABLE,
    resource.getDatabase(), resource.getTable());
// filtering/masking policies are defined only for SELECT
request.setHiveAccessType(HiveAccessType.SELECT);
request.setResource(tblResource);
...
// check if masking is enabled for any column in the table/view
request.setResourceMatchingScope(RangerAccessRequest.ResourceMatchingScope.SELF_OR_DESCENDANTS);{code}
https://github.com/apache/ranger/blob/d48e3528eb0d5dca965e53bb4a75f18f3b2d24a2/hive-agent/src/main/java/org/apache/ranger/authorization/hive/authorizer/RangerHiveAuthorizer.java#L1028


> Improves finding column masking policies of a table for a user
> --------------------------------------------------------------
>
>                 Key: IMPALA-14703
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14703
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend, Security
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical


