[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845178#comment-17845178 ]
ASF GitHub Bot commented on HADOOP-18851: ----------------------------------------- ChenSammi commented on PR #6803: URL: https://github.com/apache/hadoop/pull/6803#issuecomment-2103782158 > @ChenSammi , Please add your input as well in case I missed anything. Thanks. Yes, removing the synchronized directly have potential risks to sub classes of AbstractDelegationTokenSecretManager. Using LOCK is a safe way to replace synchronized and can improve the concurrency too. That's the motivation of this implementation. @Hexiaoqiao . > Performance improvement for DelegationTokenSecretManager. > --------------------------------------------------------- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task > Components: common > Affects Versions: 3.4.0 > Reporter: Vikas Kumar > Assignee: Vikas Kumar > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X00007F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x00000005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x0000000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402) > - locked <0x00000005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:48) > at org.apache.hadoop.security.token.Token.<init>(Token.java:67) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.createToken(DelegationTokenManager.java:183) > {code} > We can say that this thread is slow and has blocked remaining all. But > following is my observation: > > # verifyToken() and createPaswword() has been synchronized because one is > reading the tokenMap and another is updating the map. If it's only to protect > the map, then can't we simply use ConcurrentHashMap and remove the > "synchronized" keyword. Because due to this, all reader threads ( to > verifyToken()) are also blocking each other. > # IN HA env, It is recommended to use ZK to store DTs. We know that > CuratorFramework is thread safe. > ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum() only requires > to be protected from concurrent execution and it should be protected using > some other locks instead of "this". > # With these changes, verifyToken() and createPaswword() will not block each > other. It will be blocked only at the time of updating the map. > # Similarly other methods can also be considered but these two are critical. > I made these changes on my local and got the significant performance > improvement. > I request community to provide their input and if we agree, I can provide the > patch. Please let me know if any other details are required. > Thanks. > -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org