[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765448#comment-17765448 ] ASF GitHub Bot commented on HADOOP-18851: - Hexiaoqiao commented on PR #6001: URL: https://github.com/apache/hadoop/pull/6001#issuecomment-1720598948 Committed to trunk. Thanks @vikaskr22 for your works and @jojochuang for your review! > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402) > - locked <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:48) > at org.apache.hadoop.security.token.Token.(Token.java:67) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.createToken(DelegationTokenManager.java:183) > {code} > We can say that this thread is slow and has blocked remaining all. But > following is my observation: > > # verifyToken() and createPaswword() has been synchronized because one is > reading the tokenMap and another is updating the map. If it's only to protect > the map, then can't we simply use ConcurrentHashMap and remove the > "synchronized" keyword.
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765447#comment-17765447 ] ASF GitHub Bot commented on HADOOP-18851: - Hexiaoqiao merged PR #6001: URL: https://github.com/apache/hadoop/pull/6001 > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402) > - locked <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:48) > at org.apache.hadoop.security.token.Token.(Token.java:67) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.createToken(DelegationTokenManager.java:183) > {code} > We can say that this thread is slow and has blocked remaining all. But > following is my observation: > > # verifyToken() and createPaswword() has been synchronized because one is > reading the tokenMap and another is updating the map. If it's only to protect > the map, then can't we simply use ConcurrentHashMap and remove the > "synchronized" keyword. Because due to this, all reader threads ( to > verifyToken()) are also blocking each other. > # IN HA env, It is recom
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17765173#comment-17765173 ] ASF GitHub Bot commented on HADOOP-18851: - vikaskr22 commented on PR #6001: URL: https://github.com/apache/hadoop/pull/6001#issuecomment-1719441913 @Hexiaoqiao , @jojochuang , are we good to merge this ? I can see one approval. If yes, then can anyone of you please do the needful. Thanks. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402) > - locked <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:48) > at org.apache.hadoop.security.token.Token.(Token.java:67) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.createToken(DelegationTokenManager.java:183) > {code} > We can say that this thread is slow and has blocked remaining all. But > following is my observation: > > # verifyToken() and createPaswword() has been synchronized because one is > reading the tokenMap and another is updating the map. If it's only to protect > the map, then can't we simply use Conc
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764809#comment-17764809 ] ASF GitHub Bot commented on HADOOP-18851: - hadoop-yetus commented on PR #6001: URL: https://github.com/apache/hadoop/pull/6001#issuecomment-1718056474 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 58s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 44m 28s | | trunk passed | | +1 :green_heart: | compile | 17m 7s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 15m 59s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | checkstyle | 1m 23s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 47s | | trunk passed | | +1 :green_heart: | javadoc | 1m 22s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 59s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | spotbugs | 2m 50s | | trunk passed | | +1 :green_heart: | shadedclient | 37m 42s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 0s | | the patch passed | | +1 :green_heart: | compile | 17m 21s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 17m 21s | | the patch passed | | +1 :green_heart: | compile | 16m 16s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | javac | 16m 16s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 13s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 41s | | the patch passed | | +1 :green_heart: | javadoc | 1m 13s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 53s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | spotbugs | 2m 44s | | the patch passed | | +1 :green_heart: | shadedclient | 38m 39s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 19m 38s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 1m 5s | | The patch does not generate ASF License warnings. | | | | 230m 30s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/7/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6001 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux a2e40121feb1 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / d5eded867b72eb6d5018ea63ee20bde882f21b29 | | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/7/testReport/ | | Max. process+thread count | 1278 (vs. ulimit of 5500) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/7/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 http
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764678#comment-17764678 ] ASF GitHub Bot commented on HADOOP-18851: - Hexiaoqiao commented on PR #6001: URL: https://github.com/apache/hadoop/pull/6001#issuecomment-1717644318 It is OK for me. Please fix the checkstyle as Yetus reported before approve and checkin. Thanks. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402) > - locked <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:48) > at org.apache.hadoop.security.token.Token.(Token.java:67) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.createToken(DelegationTokenManager.java:183) > {code} > We can say that this thread is slow and has blocked remaining all. But > following is my observation: > > # verifyToken() and createPaswword() has been synchronized because one is > reading the tokenMap and another is updating the map. If it's only to protect > the map, then can't we simply use ConcurrentHashMap and remove the > "synchronize
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764513#comment-17764513 ] ASF GitHub Bot commented on HADOOP-18851: - vikaskr22 commented on PR #6001: URL: https://github.com/apache/hadoop/pull/6001#issuecomment-1717048265 Hi @Hexiaoqiao , @jojochuang , I have incorporated all the review comments. Now CI pipeline is failing due to lack of test cases. As this changes are logical and not changing any functional part, so I have not added any new test case. But I patched my KMS cluster with the built dev jar and executed our benchmarking test suite and it was working fine. Please let me know if this sufficient , if not, please suggest further. Thanks. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402) > - locked <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:48) > at org.apache.hadoop.security.token.Token.(Token.java:67) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.createToken(DelegationTokenManager.java:183) > {code} >
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764243#comment-17764243 ] ASF GitHub Bot commented on HADOOP-18851: - hadoop-yetus commented on PR #6001: URL: https://github.com/apache/hadoop/pull/6001#issuecomment-1715906168 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 3s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 57m 30s | | trunk passed | | +1 :green_heart: | compile | 18m 40s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | compile | 17m 3s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | checkstyle | 1m 14s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 36s | | trunk passed | | +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 47s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | spotbugs | 2m 34s | | trunk passed | | +1 :green_heart: | shadedclient | 40m 2s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 55s | | the patch passed | | +1 :green_heart: | compile | 17m 39s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javac | 17m 39s | | the patch passed | | +1 :green_heart: | compile | 16m 45s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | javac | 16m 45s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 11s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/6/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 3 new + 27 unchanged - 0 fixed = 30 total (was 27) | | +1 :green_heart: | mvnsite | 1m 34s | | the patch passed | | +1 :green_heart: | javadoc | 1m 3s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 45s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | spotbugs | 2m 39s | | the patch passed | | +1 :green_heart: | shadedclient | 41m 50s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 19m 20s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 56s | | The patch does not generate ASF License warnings. | | | | 249m 12s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/6/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6001 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux b357d2e349cb 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / ab7a3a697b898ac4ec717206bf875cd01d0919f1 | | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/6/testReport/ | | Max. process+thread count | 3144 (vs. ulimit of 5500) | | modules
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764133#comment-17764133 ] ASF GitHub Bot commented on HADOOP-18851: - vikaskr22 commented on code in PR #6001: URL: https://github.com/apache/hadoop/pull/6001#discussion_r1322866544 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java: ## @@ -521,7 +521,6 @@ protected synchronized byte[] createPassword(TokenIdent identifier) { */ Review Comment: that's a good catch @jojochuang . Thanks ! It's incorporated. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402) > - locked <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:48) > at org.apache.hadoop.security.token.Token.(Token.java:67) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.createToken(DelegationTokenManager.java:183) > {code} > We can say that this thread is slow and has blocked remaining all. But > following is my observation: >
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764023#comment-17764023 ] ASF GitHub Bot commented on HADOOP-18851: - vikaskr22 commented on code in PR #6001: URL: https://github.com/apache/hadoop/pull/6001#discussion_r1322475250 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java: ## @@ -520,24 +525,32 @@ protected int incrementDelegationTokenSeqNum() { // The secret manager will keep a local range of seq num which won't be // seen by peers, so only when the range is exhausted it will ask zk for // another range again -if (currentSeqNum >= currentMaxSeqNum) { - try { -// after a successful batch request, we can get the range starting point -currentSeqNum = incrSharedCount(delTokSeqCounter, seqNumBatchSize); -currentMaxSeqNum = currentSeqNum + seqNumBatchSize; -LOG.info("Fetched new range of seq num, from {} to {} ", -currentSeqNum+1, currentMaxSeqNum); - } catch (InterruptedException e) { -// The ExpirationThread is just finishing.. so dont do anything.. -LOG.debug( -"Thread interrupted while performing token counter increment", e); -Thread.currentThread().interrupt(); - } catch (Exception e) { -throw new RuntimeException("Could not increment shared counter !!", e); +try{ + this.currentSeqNumLock.lock(); Review Comment: @jojochuang , Yes. We did benchmarking for KMS and there We got at least 3X better throughput with `zk-dt-secret-manager.token.seqnum.batch.size=1000` . I discovered this while going through the code. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763911#comment-17763911 ] ASF GitHub Bot commented on HADOOP-18851: - jojochuang commented on code in PR #6001: URL: https://github.com/apache/hadoop/pull/6001#discussion_r1322040705 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java: ## @@ -520,24 +525,32 @@ protected int incrementDelegationTokenSeqNum() { // The secret manager will keep a local range of seq num which won't be // seen by peers, so only when the range is exhausted it will ask zk for // another range again -if (currentSeqNum >= currentMaxSeqNum) { - try { -// after a successful batch request, we can get the range starting point -currentSeqNum = incrSharedCount(delTokSeqCounter, seqNumBatchSize); -currentMaxSeqNum = currentSeqNum + seqNumBatchSize; -LOG.info("Fetched new range of seq num, from {} to {} ", -currentSeqNum+1, currentMaxSeqNum); - } catch (InterruptedException e) { -// The ExpirationThread is just finishing.. so dont do anything.. -LOG.debug( -"Thread interrupted while performing token counter increment", e); -Thread.currentThread().interrupt(); - } catch (Exception e) { -throw new RuntimeException("Could not increment shared counter !!", e); +try{ + this.currentSeqNumLock.lock(); Review Comment: Yeah a simple ReentrantLock looks like the way to go. But also > however we give batch size is 1 by default. oh i regret this. (zk-dt-secret-manager.token.seqnum.batch.size) is a configuration key not documented anywhere. How would folks discover this. @vikaskr22 did you need to update this configuration key in order to achieve the performance improvement? ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java: ## @@ -521,7 +521,6 @@ protected synchronized byte[] createPassword(TokenIdent identifier) { */ Review Comment: There is a slight chance the currentKey in line 495 and 499 may be inconsistent now that createPassword is no longer synchronized. Can we avoid that by making a local reference to currentKey to ensure within this method we refer to the same object. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apac
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763660#comment-17763660 ] ASF GitHub Bot commented on HADOOP-18851: - vikaskr22 commented on code in PR #6001: URL: https://github.com/apache/hadoop/pull/6001#discussion_r1321395105 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java: ## @@ -520,24 +525,32 @@ protected int incrementDelegationTokenSeqNum() { // The secret manager will keep a local range of seq num which won't be // seen by peers, so only when the range is exhausted it will ask zk for // another range again -if (currentSeqNum >= currentMaxSeqNum) { - try { -// after a successful batch request, we can get the range starting point -currentSeqNum = incrSharedCount(delTokSeqCounter, seqNumBatchSize); -currentMaxSeqNum = currentSeqNum + seqNumBatchSize; -LOG.info("Fetched new range of seq num, from {} to {} ", -currentSeqNum+1, currentMaxSeqNum); - } catch (InterruptedException e) { -// The ExpirationThread is just finishing.. so dont do anything.. -LOG.debug( -"Thread interrupted while performing token counter increment", e); -Thread.currentThread().interrupt(); - } catch (Exception e) { -throw new RuntimeException("Could not increment shared counter !!", e); +try{ + this.currentSeqNumLock.lock(); Review Comment: @Hexiaoqiao , Actually in earlier code, most of the API level methods like verifyToken, createToken etc were marked synchronized and that was blocking the threads that are not even related. This particular method was not protected by "synchronized" keyword, it's super-class method has synchronization defined. But we were not getting any race situation because the method from where it is being called was marked synchronized. And yes, I agree that earlier approach where we had taken simple ReentrantLock at start of method execution was a simple & clean approach. I can revert back to that. Thanks. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(R
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763526#comment-17763526 ] ASF GitHub Bot commented on HADOOP-18851: - Hexiaoqiao commented on code in PR #6001: URL: https://github.com/apache/hadoop/pull/6001#discussion_r1320937858 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java: ## @@ -520,24 +525,32 @@ protected int incrementDelegationTokenSeqNum() { // The secret manager will keep a local range of seq num which won't be // seen by peers, so only when the range is exhausted it will ask zk for // another range again -if (currentSeqNum >= currentMaxSeqNum) { - try { -// after a successful batch request, we can get the range starting point -currentSeqNum = incrSharedCount(delTokSeqCounter, seqNumBatchSize); -currentMaxSeqNum = currentSeqNum + seqNumBatchSize; -LOG.info("Fetched new range of seq num, from {} to {} ", -currentSeqNum+1, currentMaxSeqNum); - } catch (InterruptedException e) { -// The ExpirationThread is just finishing.. so dont do anything.. -LOG.debug( -"Thread interrupted while performing token counter increment", e); -Thread.currentThread().interrupt(); - } catch (Exception e) { -throw new RuntimeException("Could not increment shared counter !!", e); +try{ + this.currentSeqNumLock.lock(); Review Comment: @vikaskr22 Thanks for your works and sorry for the late response. After carefully review. I think it is possible of the race condition you mentioned above. And before changes, there is one `synchronized` to protect this critical section. Back to this PR, I prefer to the first version, because the performance could not improve obviously but with lock escalations. What do you think about? Let's wait if other guys would like to have some other opinions. cc @jojochuang > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curato
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763149#comment-17763149 ] ASF GitHub Bot commented on HADOOP-18851: - hadoop-yetus commented on PR #6001: URL: https://github.com/apache/hadoop/pull/6001#issuecomment-1711877383 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 59s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 47m 29s | | trunk passed | | +1 :green_heart: | compile | 18m 37s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | compile | 16m 57s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | checkstyle | 1m 15s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 37s | | trunk passed | | +1 :green_heart: | javadoc | 1m 10s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 45s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | spotbugs | 2m 35s | | trunk passed | | +1 :green_heart: | shadedclient | 39m 34s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 54s | | the patch passed | | +1 :green_heart: | compile | 17m 20s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javac | 17m 20s | | the patch passed | | +1 :green_heart: | compile | 20m 27s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | javac | 20m 27s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 25s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 44s | | the patch passed | | +1 :green_heart: | javadoc | 1m 20s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 4s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | spotbugs | 3m 9s | | the patch passed | | +1 :green_heart: | shadedclient | 40m 57s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 18m 52s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 55s | | The patch does not generate ASF License warnings. | | | | 242m 15s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/4/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6001 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 4f676426ecde 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 827b881ee969d1c90413980c168377b8257f70b8 | | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/4/testReport/ | | Max. process+thread count | 3144 (vs. ulimit of 5500) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/4/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763143#comment-17763143 ] ASF GitHub Bot commented on HADOOP-18851: - hadoop-yetus commented on PR #6001: URL: https://github.com/apache/hadoop/pull/6001#issuecomment-1711866466 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 57s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 43m 45s | | trunk passed | | +1 :green_heart: | compile | 17m 23s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 16m 25s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | checkstyle | 1m 22s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 47s | | trunk passed | | +1 :green_heart: | javadoc | 1m 21s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 0s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | spotbugs | 2m 35s | | trunk passed | | +1 :green_heart: | shadedclient | 36m 18s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 59s | | the patch passed | | +1 :green_heart: | compile | 16m 30s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 16m 30s | | the patch passed | | +1 :green_heart: | compile | 16m 34s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | javac | 16m 34s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 19s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/5/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 3 new + 27 unchanged - 0 fixed = 30 total (was 27) | | +1 :green_heart: | mvnsite | 1m 45s | | the patch passed | | +1 :green_heart: | javadoc | 1m 19s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 58s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | -1 :x: | spotbugs | 2m 49s | [/new-spotbugs-hadoop-common-project_hadoop-common.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/5/artifact/out/new-spotbugs-hadoop-common-project_hadoop-common.html) | hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) | | +1 :green_heart: | shadedclient | 35m 56s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 19m 24s | [/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/5/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 1m 12s | | The patch does not generate ASF License warnings. | | | | 225m 50s | | | | Reason | Tests | |---:|:--| | SpotBugs | module:hadoop-common-project/hadoop-common | | | org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum() does not release lock on all paths At ZKDelegationTokenSecretManager.java:on all paths At ZKDelegationTokenSecretManager.java:[line 527] | | Failed junit tests | hadoop.security.token.delegation.TestZKDelegationTokenSecretManager | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/5/artifact/out/Dockerfile |
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763071#comment-17763071 ] ASF GitHub Bot commented on HADOOP-18851: - vikaskr22 commented on code in PR #6001: URL: https://github.com/apache/hadoop/pull/6001#discussion_r1319781337 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java: ## @@ -520,24 +525,32 @@ protected int incrementDelegationTokenSeqNum() { // The secret manager will keep a local range of seq num which won't be // seen by peers, so only when the range is exhausted it will ask zk for // another range again -if (currentSeqNum >= currentMaxSeqNum) { - try { -// after a successful batch request, we can get the range starting point -currentSeqNum = incrSharedCount(delTokSeqCounter, seqNumBatchSize); -currentMaxSeqNum = currentSeqNum + seqNumBatchSize; -LOG.info("Fetched new range of seq num, from {} to {} ", -currentSeqNum+1, currentMaxSeqNum); - } catch (InterruptedException e) { -// The ExpirationThread is just finishing.. so dont do anything.. -LOG.debug( -"Thread interrupted while performing token counter increment", e); -Thread.currentThread().interrupt(); - } catch (Exception e) { -throw new RuntimeException("Could not increment shared counter !!", e); +try{ + this.currentSeqNumLock.lock(); + if (currentSeqNum >= currentMaxSeqNum) { +try { + // after a successful batch request, we can get the range starting point + currentSeqNum = incrSharedCount(delTokSeqCounter, seqNumBatchSize); + currentMaxSeqNum = currentSeqNum + seqNumBatchSize; + LOG.info("Fetched new range of seq num, from {} to {} ", + currentSeqNum+1, currentMaxSeqNum); +} catch (InterruptedException e) { + // The ExpirationThread is just finishing.. so dont do anything.. + LOG.debug( + "Thread interrupted while performing token counter increment", e); + Thread.currentThread().interrupt(); +} catch (Exception e) { + throw new RuntimeException("Could not increment shared counter !!", e); +} } + + return ++currentSeqNum; + +}finally{ + this.currentSeqNumLock.unlock(); } -return ++currentSeqNum; + Review Comment: Incorporated. Thanks. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnx
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763069#comment-17763069 ] ASF GitHub Bot commented on HADOOP-18851: - vikaskr22 commented on code in PR #6001: URL: https://github.com/apache/hadoop/pull/6001#discussion_r1319780929 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java: ## @@ -520,24 +525,32 @@ protected int incrementDelegationTokenSeqNum() { // The secret manager will keep a local range of seq num which won't be // seen by peers, so only when the range is exhausted it will ask zk for // another range again -if (currentSeqNum >= currentMaxSeqNum) { - try { -// after a successful batch request, we can get the range starting point -currentSeqNum = incrSharedCount(delTokSeqCounter, seqNumBatchSize); -currentMaxSeqNum = currentSeqNum + seqNumBatchSize; -LOG.info("Fetched new range of seq num, from {} to {} ", -currentSeqNum+1, currentMaxSeqNum); - } catch (InterruptedException e) { -// The ExpirationThread is just finishing.. so dont do anything.. -LOG.debug( -"Thread interrupted while performing token counter increment", e); -Thread.currentThread().interrupt(); - } catch (Exception e) { -throw new RuntimeException("Could not increment shared counter !!", e); +try{ + this.currentSeqNumLock.lock(); + if (currentSeqNum >= currentMaxSeqNum) { +try { + // after a successful batch request, we can get the range starting point + currentSeqNum = incrSharedCount(delTokSeqCounter, seqNumBatchSize); + currentMaxSeqNum = currentSeqNum + seqNumBatchSize; + LOG.info("Fetched new range of seq num, from {} to {} ", + currentSeqNum+1, currentMaxSeqNum); +} catch (InterruptedException e) { + // The ExpirationThread is just finishing.. so dont do anything.. + LOG.debug( + "Thread interrupted while performing token counter increment", e); + Thread.currentThread().interrupt(); +} catch (Exception e) { + throw new RuntimeException("Could not increment shared counter !!", e); +} } + + return ++currentSeqNum; + +}finally{ Review Comment: incorporated. Thanks. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setDat
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763068#comment-17763068 ] ASF GitHub Bot commented on HADOOP-18851: - vikaskr22 commented on code in PR #6001: URL: https://github.com/apache/hadoop/pull/6001#discussion_r1319780678 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java: ## @@ -520,24 +525,32 @@ protected int incrementDelegationTokenSeqNum() { // The secret manager will keep a local range of seq num which won't be // seen by peers, so only when the range is exhausted it will ask zk for // another range again -if (currentSeqNum >= currentMaxSeqNum) { - try { -// after a successful batch request, we can get the range starting point -currentSeqNum = incrSharedCount(delTokSeqCounter, seqNumBatchSize); -currentMaxSeqNum = currentSeqNum + seqNumBatchSize; -LOG.info("Fetched new range of seq num, from {} to {} ", -currentSeqNum+1, currentMaxSeqNum); - } catch (InterruptedException e) { -// The ExpirationThread is just finishing.. so dont do anything.. -LOG.debug( -"Thread interrupted while performing token counter increment", e); -Thread.currentThread().interrupt(); - } catch (Exception e) { -throw new RuntimeException("Could not increment shared counter !!", e); +try{ + this.currentSeqNumLock.lock(); Review Comment: Hi @Hexiaoqiao , Can you please provide your input on this ? I have further updated the code. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.had
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763067#comment-17763067 ] ASF GitHub Bot commented on HADOOP-18851: - vikaskr22 commented on code in PR #6001: URL: https://github.com/apache/hadoop/pull/6001#discussion_r1319779707 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java: ## @@ -520,24 +525,32 @@ protected int incrementDelegationTokenSeqNum() { // The secret manager will keep a local range of seq num which won't be // seen by peers, so only when the range is exhausted it will ask zk for // another range again -if (currentSeqNum >= currentMaxSeqNum) { - try { -// after a successful batch request, we can get the range starting point -currentSeqNum = incrSharedCount(delTokSeqCounter, seqNumBatchSize); -currentMaxSeqNum = currentSeqNum + seqNumBatchSize; -LOG.info("Fetched new range of seq num, from {} to {} ", -currentSeqNum+1, currentMaxSeqNum); - } catch (InterruptedException e) { -// The ExpirationThread is just finishing.. so dont do anything.. -LOG.debug( -"Thread interrupted while performing token counter increment", e); -Thread.currentThread().interrupt(); - } catch (Exception e) { -throw new RuntimeException("Could not increment shared counter !!", e); +try{ Review Comment: incorporated. Thanks. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.jav
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763066#comment-17763066 ] ASF GitHub Bot commented on HADOOP-18851: - vikaskr22 commented on code in PR #6001: URL: https://github.com/apache/hadoop/pull/6001#discussion_r1319778926 ## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/ZKDelegationTokenSecretManager.java: ## @@ -149,6 +150,9 @@ protected static CuratorFramework getCurator() { private int currentSeqNum; private int currentMaxSeqNum; + private final ReentrantLock currentSeqNumLock; + + Review Comment: incorporated. thanks. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402) > - locked <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:48) > at org.apache.hadoop.security.token.Token.(Token.java:67) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.createToken(DelegationTokenManager.java:183) > {code} > We can say that this thread is slow and has blocked remainin
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760430#comment-17760430 ] ASF GitHub Bot commented on HADOOP-18851: - hadoop-yetus commented on PR #6001: URL: https://github.com/apache/hadoop/pull/6001#issuecomment-1699278372 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 58s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 44m 15s | | trunk passed | | +1 :green_heart: | compile | 18m 45s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | compile | 17m 0s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | checkstyle | 1m 17s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 37s | | trunk passed | | +1 :green_heart: | javadoc | 1m 11s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 47s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | spotbugs | 2m 35s | | trunk passed | | -1 :x: | shadedclient | 38m 18s | | branch has errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 54s | | the patch passed | | +1 :green_heart: | compile | 17m 27s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javac | 17m 27s | | the patch passed | | +1 :green_heart: | compile | 16m 37s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | javac | 16m 37s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 10s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 34s | | the patch passed | | +1 :green_heart: | javadoc | 1m 4s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 46s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | spotbugs | 2m 40s | | the patch passed | | -1 :x: | shadedclient | 38m 35s | | patch has errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 18m 44s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 57s | | The patch does not generate ASF License warnings. | | | | 230m 13s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6001 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 0fcdd0e8cfdb 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 913d6900b90d5292a41eea2cc6191ea3aef45f59 | | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/1/testReport/ | | Max. process+thread count | 2487 (vs. ulimit of 5500) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6001/1/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | Th
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760340#comment-17760340 ] Vikas Kumar commented on HADOOP-18851: -- *delegationTokenSequenceNumber* : volatile will ensure visibility but if any race conditions occur like one thread is trying to increment and another trying to read, it may create inconsistencies. Same is true for *currentId.* *storeTokenTrackingId:* It can not be made final as it is being initialised in ** DelegationTokenSecretManager class as well. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402) > - locked <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:48) > at org.apache.hadoop.security.token.Token.(Token.java:67) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.createToken(DelegationTokenManager.java:183) > {code} > We can say that this thread is slow and has blocked remaining all. But > following is my observation: > > # verifyToken() and createPaswword() has been synchronized because one is > reading the tokenMap
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760320#comment-17760320 ] ASF GitHub Bot commented on HADOOP-18851: - vikaskr22 commented on PR #6001: URL: https://github.com/apache/hadoop/pull/6001#issuecomment-1698903666 @jojochuang , as per our discussion on JIRA, this is the PR. Please review and provide your inputs. Thanks. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402) > - locked <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:48) > at org.apache.hadoop.security.token.Token.(Token.java:67) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.createToken(DelegationTokenManager.java:183) > {code} > We can say that this thread is slow and has blocked remaining all. But > following is my observation: > > # verifyToken() and createPaswword() has been synchronized because one is > reading the tokenMap and another is updating the map. If it's only to protect > the map, then can't we simply use ConcurrentHashMap and remove the > "s
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760319#comment-17760319 ] ASF GitHub Bot commented on HADOOP-18851: - vikaskr22 opened a new pull request, #6001: URL: https://github.com/apache/hadoop/pull/6001 ### Description of PR AbstractDelegationTokenSecretManager's method all synchronized and are blocking each other, even multiple readers threads are blocking each other. This PR is an efforts towards optimising the synchronization contexts. For detailed description, please go through the discussion on https://issues.apache.org/jira/browse/HADOOP-18851 ### How was this patch tested? Build is working fine. And this is more about a logical change like where to acquire or release a lock. It requires careful review. There is not any functional logic change. ### For code changes: - [ Y] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ N/A] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ N/A] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ N/A] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758594#comment-17758594 ] Vikas Kumar commented on HADOOP-18851: -- Hi [~weichiu] , thanks for your review comments and input. I am going through all the variables but please help me understand following: Regarding {*}currentKey{*}: currentKey is being used to generate password. Next, in AbstractDelegationTokenSecretManager hierarchy, this password gets stored as part of tokenInfo. So during tokenVerification, this currentKey is not even being used. I get one class where they re-generate the password using this key, BlockTokenSecretManager. But here, currentKey id is stored in as part of token info and the key is stored in "allKey" map. Once a new currentkey is generated, it's ref is just being updated and object ref assignment is atomic. Coming to concurrency issues, I do see one problem of {_}visibility{_}. Like, "removerThread" updates the ref and other threads are still using the older ref. It can be solved by declaring currentKey ref as "volatile". Considering I am new to this module, Please correct me if my understanding is wrong. I am working to incorporate other review comments. Thanks. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402) > - locked <0x
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758223#comment-17758223 ] Wei-Chiu Chuang commented on HADOOP-18851: -- I asked to make the concurrency of the extra variables even though they are not part of your patch, because this piece of code is used by YARN RM, MR job history server, YARN federation and HDFS federation. I am not familiar with all these components. But if we can get the concurrency of all member variables within AbstractDelegationTokenSecretManager we should be good and don't need to test these other components. ZKDelegationTokenSecretManager is extended by HDFS federation. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402) > - locked <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:48) > at org.apache.hadoop.security.token.Token.(Token.java:67) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.createToken(DelegationTokenManager.java:183) > {code} > We can say that this thread is slow and has blocked remaining all. But > following is my observation: > > # verif
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758208#comment-17758208 ] Wei-Chiu Chuang commented on HADOOP-18851: -- Looks mostly good. A few comments: I'm more concerned about the usage of currentKey in the method createPassword(). The currency bug will take place when the current key is updated (updateCurrentKey()). It seems to me a prudent approach is to make currentKey an atomic variable such as AtomicReference. and you want to test with very short key update interval and token life time. Nice to have: (1) the following should be made final. private long keyUpdateInterval; private long tokenMaxLifetime; private long tokenRemoverScanInterval; private long tokenRenewInterval; (2) delegationTokenSequenceNumber should be made volatile, and then we can remove synchronized from getDelegationTokenSeqNum(), incrementDelegationTokenSeqNum() and setDelegationTokenSeqNum(). The current usage can be prone to concurrency bug. (3) same for currentId. (4) storeTokenTrackingId should ideally be made a final too. If we can get all these done, I think we'll have a good concurrency implementation. > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:2235) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:398) > at > org.apache.curator.framework.imps.SetDataBuilderImpl$7.call(SetDataBuilderImpl.java:385) > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:382) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:358) > at > org.apache.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:36) > at > org.apache.curator.framework.recipes.shared.SharedValue.trySetValue(SharedValue.java:201) > at > org.apache.curator.framework.recipes.shared.SharedCount.trySetCount(SharedCount.java:116) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrSharedCount(ZKDelegationTokenSecretManager.java:586) > at > org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:601) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.createPassword(AbstractDelegationTokenSecretManager.java:402) > - locked <0x0005f2f545e8> (a > org.apache.hadoop.security.token
[jira] [Commented] (HADOOP-18851) AbstractDelegationTokenSecretManager- Performance improvement by optimising the synchronization context
[ https://issues.apache.org/jira/browse/HADOOP-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755940#comment-17755940 ] Vikas Kumar commented on HADOOP-18851: -- Hello, I have made the changes and uploading the patch file here for initial review and feedback. I parallel, I am trying to do more testing. Requesting experts to please review the change. I have created branch from "trunk". Changes: 1. made {*}AbstractDelegationTokenSecretManager.createPassword() non-synchronized{*}. There is one places where it requires synchronization, incrementDelegationTokenSeqNum() method. This method is still synchronized. But in case of HA, ZKDelegationTokenSecretManagerImpl.incrementDelegationTokenSeqNum() will be invoked and this doesn't needs to be synchronized on "this" object's monitor. As part of fix for HADOOP-16828, this method returns the pre-loaded seq number and if it reaches the limit, it again reserves the range. This part requires to be protected from being executed by concurrent threads, and here I used ReentrantLock to guard this instead of this. 2. *AbstractDelegationTokenSecretManager.checkToken()* : removed the assertion for this object's monitor lock. As now all the DS has been made thread-safe. Read and update operations to the maps should be the issue now. 3. {*}Made AbstractDelegationTokenSecretManager.retrievePassword() non-synchronized{*}. Here getTokenInfo() was expected to be synchronized but since currentTokens map has been changed to thread-safe map , so this method is thared-safe. And for overriding class ZKDelegationTokenSecretManagerImpl, it reads from Zk that is also thread-safe. 4. {*}AbstractDelegationTokenSecretManager.{*}verifyToken(), this method also reads from the token map that is thread-safe. Or in case of HA, reading from ZK using CuratorFramework is thread safe. 5. ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(): protected this method using RentrantLock to avoid being accessed concurrently from multiple threads. Here intention is to make verifyToken and createPassword ( and few other methods as well) to run concurrently without blocking each other much . In current implementation, even multiple verifyToken() method (that only reads ) blocks each other due to "synchronized" nature of the method. Please let me know if I have missed anything, I would be happy to learn more. Please find below the patch: [^0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch] > AbstractDelegationTokenSecretManager- Performance improvement by optimising > the synchronization context > --- > > Key: HADOOP-18851 > URL: https://issues.apache.org/jira/browse/HADOOP-18851 > Project: Hadoop Common > Issue Type: Task >Reporter: Vikas Kumar >Assignee: Vikas Kumar >Priority: Major > Attachments: > 0001-HADOOP-18851-Perfm-improvement-for-ZKDT-management.patch, Screenshot > 2023-08-16 at 5.36.57 PM.png > > > *Context:* > KMS depends on hadoop-common for DT management. Recently we were analysing > one performance issue and following is out findings: > # Around 96% (196 out of 200) KMS container threads were in BLOCKED state at > following: > ## *AbstractDelegationTokenSecretManager.verifyToken()* > ## *AbstractDelegationTokenSecretManager.createPassword()* > # And then process crashed. > > {code:java} > http-nio-9292-exec-200PRIORITY : 5THREAD ID : 0X7F075C157800NATIVE ID : > 0X2C87FNATIVE ID (DECIMAL) : 182399STATE : BLOCKED > stackTrace: > java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.verifyToken(AbstractDelegationTokenSecretManager.java:474) > - waiting to lock <0x0005f2f545e8> (a > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager$ZKSecretManager) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenManager.verifyToken(DelegationTokenManager.java:213) > at > org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:396) > at {code} > All the 199 out of 200 were blocked at above point. > And the lock they are waiting for is acquired by a thread that was trying to > createPassword and publishing the same on ZK. > > {code:java} > stackTrace: > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1598) > - locked <0x000749263ec0> (a org.apache.zookeeper.ClientCnxn$Packet) > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1570) > at org.apache.zookeepe