[
https://issues.apache.org/jira/browse/IMPALA-11501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770339#comment-17770339
]
Quanlong Huang edited comment on IMPALA-11501 at 1/27/26 5:43 AM:
------------------------------------------------------------------
Another thing this Jira should take care of: INVALIDATE METADATA <table>
should not trigger metadata loading when "allow_refresh_by_masked_users" is
set to true.
Before IMPALA-10554 and IMPALA-11281, INVALIDATE METADATA <table> did not
trigger metadata loading in either the legacy catalog mode or the local
catalog mode. It was supposed to finish quickly and not be blocked by
concurrent DDLs.
With IMPALA-10554, the authorization check on an INVALIDATE/REFRESH request
fetches the column info, which triggers metadata loading in local catalog
mode. Code snippet:
{code:java}
private void authorizePrivilegeRequest(AuthorizationContext authzCtx,
    AnalysisResult analysisResult, FeCatalog catalog, PrivilegeRequest request)
    throws AuthorizationException, InternalException {
  Preconditions.checkNotNull(request);
  String dbName = null;
  if (request.getAuthorizable() != null) {
    dbName = request.getAuthorizable().getDbName();
  }
  // If this is a system database, some actions should always be allowed
  // or disabled, regardless of what is in the auth policy.
  if (dbName != null && checkSystemDbAccess(catalog, dbName,
      request.getPrivilege())) {
    return;
  }
  // Populate column names to check column masking policies in blocking updates.
  if (config_.isEnabled() && request.getAuthorizable() != null
      && request.getAuthorizable().getType() == Type.TABLE) {
    Preconditions.checkNotNull(dbName);
    AuthorizableTable authorizableTable = (AuthorizableTable)
        request.getAuthorizable();
    FeDb db = catalog.getDb(dbName);
    if (db != null) {
      // 'db', 'table' could be null for an unresolved table ref. 'table' could be
      // null for target table of a CTAS statement. Don't need to populate column
      // names in such cases since no column masking policies will be checked.
      FeTable table = db.getTable(authorizableTable.getTableName()); // <---- This will trigger metadata loading in local catalog mode
      if (table != null && !(table instanceof FeIncompleteTable)) {
        authorizableTable.setColumns(table.getColumnNames());
      }
    }
  }
  checkAccess(authzCtx, analysisResult.getAnalyzer().getUser(), request);
}
{code}
[https://github.com/apache/impala/blob/2baed42/fe/src/main/java/org/apache/impala/authorization/BaseAuthorizationChecker.java#L226]
In local catalog mode, if the table metadata is not cached locally, the call
to db.getTable() sends a getPartialCatalogObject request to catalogd, which
can be blocked in two places:
* if the table is also unloaded in catalogd, the request triggers metadata
loading and has to wait for it to finish.
* if the table is locked by a concurrent DDL/DML, the request has to wait
since it requires the table read lock.
These make INVALIDATE METADATA run slowly on large tables or on tables that
have frequent DDL/DMLs.
In the legacy catalog mode, db.getTable() just returns what is in the cache.
For an unloaded table, it returns an IncompleteTable object, which has no
column info and leads to the bug described in IMPALA-11281. IMPALA-11281 fixes
the bug by forcing metadata loading for INVALIDATE/REFRESH commands.
So for branches that have IMPALA-10554, INVALIDATE METADATA <table> could be
blocked in the above two places in local catalog mode. For branches that have
both IMPALA-10554 and IMPALA-11281, INVALIDATE METADATA <table> could be
blocked in both catalog modes.
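The cases above can be read as a small decision table. The sketch below only models the behavior described in this comment; it is not Impala code. The names CatalogMode and mayBlock are made up for illustration, and it assumes IMPALA-11281 is only ever present on top of IMPALA-10554.

```java
public class InvalidateBlockingModel {
  /** Catalog modes discussed in this comment. */
  enum CatalogMode { LEGACY, LOCAL }

  /**
   * Returns true if INVALIDATE METADATA <table> may wait on metadata loading
   * or on the table lock, following the description above.
   * Assumption: has11281 is only meaningful when has10554 is true.
   */
  static boolean mayBlock(boolean has10554, boolean has11281, CatalogMode mode) {
    if (!has10554) {
      return false;  // original behavior: no metadata loading is triggered
    }
    if (mode == CatalogMode.LOCAL) {
      return true;   // db.getTable() may fetch the table from catalogd
    }
    return has11281; // legacy mode blocks only once loading is forced
  }

  public static void main(String[] args) {
    for (boolean has11281 : new boolean[] {false, true}) {
      for (CatalogMode mode : CatalogMode.values()) {
        System.out.printf("IMPALA-10554=yes IMPALA-11281=%s mode=%s -> mayBlock=%b%n",
            has11281 ? "yes" : "no", mode, mayBlock(true, has11281, mode));
      }
    }
  }
}
```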
The "allow_catalog_cache_op_from_masked_users" flag is introduced to bring
back the behavior that INVALIDATE/REFRESH had before IMPALA-10554. It should
also bring back the same performance for INVALIDATE.
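One way to picture the intent is a guard that skips the column lookup for catalog-cache operations when the flag is on. This is a minimal sketch, not the actual patch: TableSource, CountingTableSource, shouldPopulateColumns and columnsForAuthz are hypothetical stand-ins for the db.getTable() lookup and the check in authorizePrivilegeRequest(), and the exact condition used in Impala may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class MaskedUserFlagSketch {
  /** Hypothetical stand-in for db.getTable(); a real lookup may block on catalogd. */
  interface TableSource {
    List<String> getColumnNames(String tableName);
  }

  /** Counts lookups so the effect of the flag is visible. */
  static class CountingTableSource implements TableSource {
    int lookups = 0;

    @Override
    public List<String> getColumnNames(String tableName) {
      lookups++;
      List<String> cols = new ArrayList<>();
      cols.add("id");
      cols.add("name");
      return cols;
    }
  }

  /**
   * Column names are only needed to check column masking policies. Assumption:
   * when the flag allows catalog-cache operations (INVALIDATE/REFRESH) for
   * masked users, that check is not needed and the lookup can be skipped.
   */
  static boolean shouldPopulateColumns(boolean authzEnabled, boolean isCatalogCacheOp,
      boolean allowCatalogCacheOpFromMaskedUsers) {
    if (!authzEnabled) {
      return false;
    }
    return !(isCatalogCacheOp && allowCatalogCacheOpFromMaskedUsers);
  }

  static List<String> columnsForAuthz(TableSource source, String tableName,
      boolean authzEnabled, boolean isCatalogCacheOp, boolean allowFlag) {
    if (!shouldPopulateColumns(authzEnabled, isCatalogCacheOp, allowFlag)) {
      return new ArrayList<>();  // no lookup, so nothing to block on
    }
    return source.getColumnNames(tableName);
  }

  public static void main(String[] args) {
    CountingTableSource source = new CountingTableSource();
    columnsForAuthz(source, "t1", true, true, true);   // INVALIDATE with the flag on
    System.out.println("lookups with flag on: " + source.lookups);
    columnsForAuthz(source, "t1", true, true, false);  // INVALIDATE with the flag off
    System.out.println("lookups with flag off: " + source.lookups);
  }
}
```

With the flag on, the sketch never reaches the lookup, which is the property that would keep INVALIDATE from waiting on metadata loading or the table lock.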
> Add flag to allow metadata-cache operations on masked tables
> ------------------------------------------------------------
>
> Key: IMPALA-11501
> URL: https://issues.apache.org/jira/browse/IMPALA-11501
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog, Security
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Fix For: Impala 4.4.0
>
>
> "REFRESH <table>" and "INVALIDATE METADATA <table>" are the table-level
> metadata-cache operations that are only used in Impala (not in Hive,
> SparkSQL or other engines).
> In the Hive-Ranger plugin, when a table is masked (either by a column-masking
> or a row-filtering policy) for a user, the user can't perform any
> modification (insert/delete/update) on the table (RANGER-1087, RANGER-1100).
> However, Hive doesn't have those metadata-cache operations. It's a grey area
> whether we should block them or not.
> Currently, Impala blocks metadata-cache operations as well (IMPALA-10554,
> IMPALA-11281). However, it's possible that some data-consumer jobs already
> had REFRESH in them before the upgrade. It'd be better to have a flag that
> allows such operations, for a smooth upgrade process.
> The flag can be something like "allow_refresh_by_masked_users".
> CC [~fangyurao], [~csringhofer]