[ 
https://issues.apache.org/jira/browse/IMPALA-11501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770339#comment-17770339
 ] 

Quanlong Huang edited comment on IMPALA-11501 at 1/27/26 5:43 AM:
------------------------------------------------------------------

Another thing this Jira should take care of: INVALIDATE METADATA <table> should 
not trigger metadata loading when "allow_catalog_cache_op_from_masked_users" is 
set to true.

Before IMPALA-10554 and IMPALA-11281, INVALIDATE METADATA <table> did not 
trigger metadata loading in either the legacy catalog mode or the local catalog 
mode. It is supposed to finish fast and not be blocked by concurrent DDLs.

Since IMPALA-10554, the authorization check on the INVALIDATE/REFRESH request 
fetches the column info, which triggers metadata loading in local catalog mode. 
Code snippet:
{code:java}
private void authorizePrivilegeRequest(AuthorizationContext authzCtx,
    AnalysisResult analysisResult, FeCatalog catalog, PrivilegeRequest request)
    throws AuthorizationException, InternalException {
  Preconditions.checkNotNull(request);
  String dbName = null;
  if (request.getAuthorizable() != null) {
    dbName = request.getAuthorizable().getDbName();
  }
  // If this is a system database, some actions should always be allowed
  // or disabled, regardless of what is in the auth policy.
  if (dbName != null && checkSystemDbAccess(catalog, dbName, request.getPrivilege())) {
    return;
  }
  // Populate column names to check column masking policies in blocking updates.
  if (config_.isEnabled() && request.getAuthorizable() != null
      && request.getAuthorizable().getType() == Type.TABLE) {
    Preconditions.checkNotNull(dbName);
    AuthorizableTable authorizableTable = (AuthorizableTable) request.getAuthorizable();
    FeDb db = catalog.getDb(dbName);
    if (db != null) {
      // 'db', 'table' could be null for an unresolved table ref. 'table' could be
      // null for target table of a CTAS statement. Don't need to populate column
      // names in such cases since no column masking policies will be checked.
      FeTable table = db.getTable(authorizableTable.getTableName());  // <---- This will trigger metadata loading in local catalog mode
      if (table != null && !(table instanceof FeIncompleteTable)) {
        authorizableTable.setColumns(table.getColumnNames());
      }
    }
  }
  checkAccess(authzCtx, analysisResult.getAnalyzer().getUser(), request);
}{code}
[https://github.com/apache/impala/blob/2baed42/fe/src/main/java/org/apache/impala/authorization/BaseAuthorizationChecker.java#L226]

In local catalog mode, if the table metadata is not cached locally, the call to 
db.getTable() sends a getPartialCatalogObject request to catalogd, which could 
be blocked in two places:
 * if the table is also unloaded in catalogd, the request triggers metadata 
loading and has to wait for it to finish.
 * if the table is locked by a concurrent DDL/DML, the request has to wait 
since it requires the table read lock.

Both make INVALIDATE METADATA run slowly on large tables or on tables that have 
frequent DDL/DMLs.
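
To illustrate the second blocking point, here is a minimal sketch of a reader that needs a read lock while another thread holds the write lock. It is a toy model, not Impala code: the class name TableLockSketch, the use of ReentrantReadWriteLock as the table lock, and the 50 ms timeout are all assumptions made for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model, not Impala code: a reader waiting on a table lock held by a DDL. */
final class TableLockSketch {
  /**
   * Returns {acquiredDuringDdl, acquiredAfterDdl} for a reader that needs the
   * table read lock while another thread (the "DDL") holds the write lock.
   */
  static boolean[] simulate() {
    // Hypothetical stand-in for the per-table lock in catalogd.
    final ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();
    final CountDownLatch ddlHoldsLock = new CountDownLatch(1);
    final CountDownLatch ddlMayFinish = new CountDownLatch(1);

    Thread ddl = new Thread(() -> {
      tableLock.writeLock().lock();
      try {
        ddlHoldsLock.countDown();
        ddlMayFinish.await();  // Keep the write lock until the reader has tried.
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        tableLock.writeLock().unlock();
      }
    });
    ddl.start();

    try {
      ddlHoldsLock.await();
      // The reader gives up after a short timeout; a real request would keep waiting.
      boolean duringDdl = tryReadLock(tableLock);
      ddlMayFinish.countDown();
      ddl.join();
      boolean afterDdl = tryReadLock(tableLock);
      return new boolean[] {duringDdl, afterDdl};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  /** Tries to take and immediately release the read lock, waiting at most 50 ms. */
  private static boolean tryReadLock(ReentrantReadWriteLock lock)
      throws InterruptedException {
    if (!lock.readLock().tryLock(50, TimeUnit.MILLISECONDS)) return false;
    lock.readLock().unlock();
    return true;
  }

  public static void main(String[] args) {
    boolean[] r = simulate();
    System.out.println("during DDL: " + r[0] + ", after DDL: " + r[1]);
  }
}
```

In this model the reader fails to get the lock while the DDL thread holds it and succeeds once the DDL finishes. A request that blocks instead of timing out would simply take as long as the DDL does.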

In the legacy catalog mode, db.getTable() just returns what is in the cache. 
For an unloaded table, it returns an IncompleteTable object, which has no 
column info and leads to the bug described in IMPALA-11281. IMPALA-11281 fixes 
the bug by forcing metadata loading for INVALIDATE/REFRESH commands.
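
The legacy-mode case can be sketched with a small model of the guard from the snippet above. The types here (Table, LoadedTable, IncompleteTable) are hypothetical stand-ins, not the Impala classes. They only show that an unloaded placeholder contributes no column names to the authorization check.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model, not Impala code: why an unloaded table yields no column names. */
final class IncompleteTableSketch {
  interface Table {
    List<String> getColumnNames();
  }

  /** Hypothetical stand-in for a fully loaded table. */
  static final class LoadedTable implements Table {
    private final List<String> columns;

    LoadedTable(String... columns) {
      this.columns = Arrays.asList(columns);
    }

    @Override
    public List<String> getColumnNames() {
      return columns;
    }
  }

  /** Hypothetical stand-in for the cached placeholder of an unloaded table. */
  static final class IncompleteTable implements Table {
    @Override
    public List<String> getColumnNames() {
      return Collections.emptyList();
    }
  }

  /** Same shape as the guard in the snippet: only loaded tables contribute columns. */
  static List<String> columnsForAuthz(Table table) {
    if (table != null && !(table instanceof IncompleteTable)) {
      return table.getColumnNames();
    }
    return Collections.emptyList();
  }
}
```

With an empty column list the check has no columns to evaluate, which is the situation the comment ties to IMPALA-11281.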

So for branches that have IMPALA-10554, INVALIDATE METADATA <table> could be 
blocked in the above two places in local catalog mode. For branches that have 
both IMPALA-10554 and IMPALA-11281, INVALIDATE METADATA <table> could be 
blocked in both catalog modes.
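
The summary above can be written as a small decision function. This is only a restatement of the paragraph, not Impala code. The name mayBlock and its parameters are made up for the example, and a branch with IMPALA-11281 but without IMPALA-10554 is assumed not to occur.

```java
/** Toy summary, not Impala code, of when INVALIDATE METADATA on a table may block. */
final class BlockingMatrixSketch {
  /**
   * @param hasImpala10554 whether the branch contains IMPALA-10554
   * @param hasImpala11281 whether the branch contains IMPALA-11281
   * @param localCatalogMode true for local catalog mode, false for legacy mode
   */
  static boolean mayBlock(
      boolean hasImpala10554, boolean hasImpala11281, boolean localCatalogMode) {
    // Without IMPALA-10554 the authorization check does not fetch column info.
    if (!hasImpala10554) return false;
    // Local catalog mode blocks with IMPALA-10554 alone; legacy mode only once
    // IMPALA-11281 forces metadata loading.
    return localCatalogMode || hasImpala11281;
  }
}
```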

The "allow_catalog_cache_op_from_masked_users" flag is introduced to bring back 
the behavior that INVALIDATE/REFRESH had before IMPALA-10554. It should also 
bring back the same performance for INVALIDATE.
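
One way to picture the intended effect is a guard that skips the column lookup for metadata-cache operations when the flag is on. This is a hedged sketch, not the actual patch. The Op enum, the method names, and the Supplier standing in for db.getTable() are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Toy model, not Impala code: skipping the column lookup when the flag is on. */
final class MaskedUserFlagSketch {
  /** Hypothetical statement kinds; the last two are the metadata-cache operations. */
  enum Op { QUERY, REFRESH, INVALIDATE_METADATA }

  /** Decides whether the authorization check needs the table's column names. */
  static boolean shouldPopulateColumns(
      boolean authzEnabled, boolean allowCacheOpFromMaskedUsers, Op op) {
    boolean isCacheOp = op == Op.REFRESH || op == Op.INVALIDATE_METADATA;
    return authzEnabled && !(allowCacheOpFromMaskedUsers && isCacheOp);
  }

  /**
   * Returns the columns to check. 'tableLookup' stands in for the potentially
   * blocking table lookup and is only invoked when columns are needed.
   */
  static List<String> columnsForAuthz(
      boolean authzEnabled,
      boolean allowCacheOpFromMaskedUsers,
      Op op,
      Supplier<List<String>> tableLookup) {
    if (!shouldPopulateColumns(authzEnabled, allowCacheOpFromMaskedUsers, op)) {
      return Collections.emptyList();
    }
    return tableLookup.get();
  }
}
```

In this model, the lookup is never called for INVALIDATE_METADATA or REFRESH when the flag is on, so the statement cannot wait on metadata loading or on the table lock during authorization.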


> Add flag to allow metadata-cache operations on masked tables
> ------------------------------------------------------------
>
>                 Key: IMPALA-11501
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11501
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog, Security
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>             Fix For: Impala 4.4.0
>
>
> "REFRESH <table>" and "INVALIDATE METADATA <table>" are the table level 
> metadata-cache operations that are only used in Impala (not in Hive, SparkSQL, 
> or other engines).
> In Hive-Ranger plugin, when a table is masked (either by column-masking or 
> row-filtering policy) for a user, the user can't perform any modification 
> (insert/delete/update) on the table (RANGER-1087, RANGER-1100). However, Hive 
> doesn't have those metadata-cache operations. It's a grey area whether we 
> should block them or not.
> Currently, Impala blocks metadata-cache operations as well (IMPALA-10554, 
> IMPALA-11281). However, it's possible that, before an upgrade, some 
> data-consumer jobs already have REFRESH in them. It'd be better to have a 
> flag to allow such operations for a smooth upgrade process.
> The flag can be something like "allow_refresh_by_masked_users".
> CC [~fangyurao], [~csringhofer]


