[ https://issues.apache.org/jira/browse/HIVE-28977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17979349#comment-17979349 ]
Zhihua Deng commented on HIVE-28977:
------------------------------------
+1 for running the ExpiredTokenRemover in a single HMS instance.
For ZooKeeperTokenStore, I think we can optimize it in the future if someone
hits the issue.
> Externalize the ExpiredTokenRemover to housekeeping threads
> -----------------------------------------------------------
>
> Key: HIVE-28977
> URL: https://issues.apache.org/jira/browse/HIVE-28977
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2, Standalone Metastore
> Affects Versions: 4.1.0, 4.0.1
> Reporter: Miklos Szurap
> Assignee: Miklos Szurap
> Priority: Major
> Labels: cleanup, delegationtoken, maintenance
>
> In many deployments there are multiple HS2 and HMS instances, and
> "hive.cluster.delegation.token.store.class" is configured to
> "org.apache.hadoop.hive.thrift.DBTokenStore", which stores the DTs in the
> HMS, ultimately in the "DELEGATION_TOKENS" table.
> Currently (master / d6bcdf652d) the implementation of the token cleanup is
> very inefficient:
> - All the HS2 and HMS instances start the DT cleanup thread, see "Starting
> expired delegation token remover thread" in the logs.
> - This "ExpiredTokenRemover" thread actually renews the tokens (if not
> expired), or removes them (if expired). This is fine. However, it first
> fetches ALL the delegation tokens (one
> "tokenStore.getAllDelegationTokenIdentifiers()" call), and then iterates
> through them to get their details (many "tokenStore.getToken(id)" calls).
> Because this is also done from the HS2 side, it creates many "remote" calls
> to the HMS, which is very inefficient.
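To illustrate the call pattern described in the list above, here is a minimal sketch, not Hive's actual code. Only the method names getAllDelegationTokenIdentifiers() and getToken(id) come from the description; CountingTokenStore, removeToken, the string ids, and the expiry timestamps are hypothetical stand-ins, and the renewal branch is left out. Each store call is counted as one round trip to the HMS.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TokenCleanupSketch {
    // One list call, then one lookup per token, plus one removal per expired token.
    static int removeExpired(CountingTokenStore store, long nowMillis) {
        int removed = 0;
        for (String id : store.getAllDelegationTokenIdentifiers()) {
            Long expiry = store.getToken(id);
            if (expiry != null && expiry < nowMillis) {
                store.removeToken(id);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        CountingTokenStore store = new CountingTokenStore();
        // 1000 tokens, every tenth one already expired (illustrative numbers only).
        for (int i = 0; i < 1000; i++) {
            store.put("token-" + i, i % 10 == 0 ? 0L : Long.MAX_VALUE);
        }
        int removed = removeExpired(store, 1L);
        System.out.println(removed + " removed, " + store.remoteCalls + " store calls");
    }
}

// Hypothetical in-memory stand-in for a remote token store.
class CountingTokenStore {
    private final Map<String, Long> expiryById = new LinkedHashMap<>();
    int remoteCalls = 0; // each store call models one round trip to the HMS

    void put(String id, long expiryMillis) {
        expiryById.put(id, expiryMillis);
    }

    List<String> getAllDelegationTokenIdentifiers() {
        remoteCalls++;
        return new ArrayList<>(expiryById.keySet());
    }

    Long getToken(String id) {
        remoteCalls++;
        return expiryById.get(id);
    }

    boolean removeToken(String id) {
        remoteCalls++;
        return expiryById.remove(id) != null;
    }
}
```

Under these assumptions one pass over N tokens costs at least N + 1 store calls, and that cost is paid again by every HS2 and HMS instance that runs the thread.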
> We should optimize this and do it in the Metastore's housekeeping threads.
> Ideally only one HMS is a leader (dynamic leader election) so it could be
> done from a single place.
> Note that there can be many thousands of DTs stored in the DB, depending on
> the token lifetime configurations and usage patterns, so this could save
> many cycles.
> Which DT stores are affected?
> - The MemoryTokenStore should be untouched, as it is indeed a "per instance"
> store and the cleanup should run everywhere.
> - The ZooKeeperTokenStore can be individually configured
> ("hive.cluster.delegation.token.store.zookeeper.znode"), so it is not safe to
> do it from a single place.
> - As such only the DBTokenStore can be optimized like this.
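The per-store rules in the description above could be summarized as a single placement decision. This is a rough sketch under stated assumptions, not the proposed patch: the Store enum, shouldRunRemover, and the isMetastore / isLeader flags are hypothetical names introduced only for illustration.

```java
class RemoverPlacementSketch {
    // Hypothetical labels for the three token stores named in the description.
    enum Store { MEMORY, ZOOKEEPER, DB }

    // Returns true if this process should run the expired-token remover.
    static boolean shouldRunRemover(Store store, boolean isMetastore, boolean isLeader) {
        switch (store) {
            case DB:
                // Shared DB-backed store: only the leader HMS cleans up.
                return isMetastore && isLeader;
            default:
                // MEMORY is per instance and ZOOKEEPER may be configured per
                // instance, so every process keeps its own remover.
                return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldRunRemover(Store.DB, false, false));    // an HS2 instance
        System.out.println(shouldRunRemover(Store.DB, true, true));      // the leader HMS
        System.out.println(shouldRunRemover(Store.MEMORY, false, false)); // any instance
    }
}
```

Keeping the default branch permissive means the MemoryTokenStore and ZooKeeperTokenStore behavior stays as it is today, which matches the scope stated in the description.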
--
This message was sent by Atlassian Jira
(v8.20.10#820010)