[ https://issues.apache.org/jira/browse/HDFS-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748729#comment-17748729 ]
ASF GitHub Bot commented on HDFS-17128:
---------------------------------------

hchaverri commented on code in PR #5897:
URL: https://github.com/apache/hadoop/pull/5897#discussion_r1278035871

##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/SQLDelegationTokenSecretManager.java:
##########
@@ -46,6 +50,9 @@ public abstract class SQLDelegationTokenSecretManager<TokenIdent
   private static final String SQL_DTSM_TOKEN_SEQNUM_BATCH_SIZE = SQL_DTSM_CONF_PREFIX + "token.seqnum.batch.size";
   public static final int DEFAULT_SEQ_NUM_BATCH_SIZE = 10;
+  public static final String SQL_DTSM_TOKEN_LOADING_CACHE_EXPIRATION_MS = SQL_DTSM_CONF_PREFIX
+      + "token.loading.cache.expiration.ms";
+  public static final int SQL_DTSM_TOKEN_LOADING_CACHE_EXPIRATION_DEFAULT_MS = 10000;

Review Comment:
   The concern is not only with deleting tokens from SQL that have been renewed, but also with tokens being stale in the cache. If a router has a stale token that has been renewed somewhere else, the router will throw an authentication error since the token in memory is expired. We should keep this value low enough so the renewal is propagated to all routers quickly enough. Same reason for cancellations. I can see us tweaking this value to balance the impact on SQL.

   The removal scan won't actually help much with token cleanup as we only expect 10 seconds of tokens to be candidates for cleanup. We need a separate mechanism to query SQL for all expired tokens and delete them, but I think we should track that separately.
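To illustrate the staleness trade-off described in the review comment, here is a minimal sketch of a time-bounded cache. It is not the code in PR #5897: the class `ExpiringTokenCache`, the method `getRenewDate`, and the `sqlLoader` and `clockMs` parameters are hypothetical names introduced for illustration, and the SQL lookup is stood in for by a plain function. The only value taken from the diff is the 10000 ms default.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a per-router token cache with a bounded entry age. */
final class ExpiringTokenCache {

  /** A cached renew date together with the time it was read from the store. */
  private static final class Entry {
    final long renewDate;
    final long loadedAtMs;

    Entry(long renewDate, long loadedAtMs) {
      this.renewDate = renewDate;
      this.loadedAtMs = loadedAtMs;
    }
  }

  private final Map<String, Entry> cache = new ConcurrentHashMap<>();
  private final Function<String, Long> sqlLoader; // assumption: stands in for a SQL SELECT
  private final LongSupplier clockMs;             // injectable clock, eases testing
  private final long expirationMs;                // e.g. 10000, the default in the diff

  ExpiringTokenCache(Function<String, Long> sqlLoader, LongSupplier clockMs,
      long expirationMs) {
    this.sqlLoader = sqlLoader;
    this.clockMs = clockMs;
    this.expirationMs = expirationMs;
  }

  /**
   * Returns the token's renew date, re-reading the shared store once the
   * cached entry is at least expirationMs old. Returns null if the token is
   * no longer in the store, for example after a cancellation elsewhere.
   */
  Long getRenewDate(String tokenId) {
    long now = clockMs.getAsLong();
    Entry cached = cache.get(tokenId);
    if (cached != null && now - cached.loadedAtMs < expirationMs) {
      return cached.renewDate; // may be stale, but by less than expirationMs
    }
    Long fresh = sqlLoader.apply(tokenId);
    if (fresh == null) {
      cache.remove(tokenId);   // cancelled or cleaned up by another router
      return null;
    }
    cache.put(tokenId, new Entry(fresh, now));
    return fresh;
  }
}
```

In this sketch a renewal or cancellation performed on another router becomes visible after at most `expirationMs`. Lowering the value shortens that window at the cost of more reads against SQL, which is the balance the comment refers to.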
> RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17128
>                 URL: https://issues.apache.org/jira/browse/HDFS-17128
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: rbf
>            Reporter: Hector Sandoval Chaverri
>            Priority: Major
>              Labels: pull-request-available
>
> The SQLDelegationTokenSecretManager keeps tokens that it has interacted with
> in a memory cache. This prevents routers from connecting to the SQL server
> for each token operation, improving performance.
>
> We've noticed issues with some tokens being loaded in one router's cache and
> later renewed on a different one. If clients try to use the token in the
> outdated router, it will throw an "Auth failed" error when the cached token's
> expiration has passed.
>
> This can also affect cancelation scenarios since a token can be removed from
> one router's cache and still exist in another one.
>
> A possible solution is already implemented on the
> ZKDelegationTokenSecretManager, which consists of having an executor
> refreshing each router's cache on a periodic basis. We should evaluate
> whether this will work with the volume of tokens expected to be handled by
> the SQLDelegationTokenSecretManager.