[
https://issues.apache.org/jira/browse/HDFS-16455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18038725#comment-18038725
]
ASF GitHub Bot commented on HDFS-16455:
---------------------------------------
github-actions[bot] closed pull request #3983: HDFS-16455. RBF: Add
`zk-dt-secret-manager.jute.maxbuffer` property for Router's
ZKDelegationTokenManager
URL: https://github.com/apache/hadoop/pull/3983
> RBF: Router should explicitly specify the value of `jute.maxbuffer` in hadoop
> configuration files like core-site.xml
> --------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-16455
> URL: https://issues.apache.org/jira/browse/HDFS-16455
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: rbf
> Affects Versions: 3.3.0, 3.4.0
> Reporter: Max Xie
> Assignee: Max Xie
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Based on the current design for delegation tokens in a secure Router, all
> tokens are stored and updated in ZooKeeper via ZKDelegationTokenManager.
> But the default value of the system property `jute.maxbuffer` is only 4MB;
> if the Router stores too many tokens in ZooKeeper, it will throw an
> IOException (`Packet lenxx is out of range`) and all Routers will crash.
>
> In our cluster, Routers crashed because of this. The crash logs are below:
> {code:java}
> 2022-02-09 02:15:51,607 INFO
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
> Token renewal for identifier: (token for xxx: HDFS_DELEGATION_TOKEN
> owner=xxx/scheduler, renewer=hadoop, realUser=, issueDate=1644344146305,
> maxDate=1644948946305, sequenceNumber=27136070, masterKeyId=1107); total
> currentTokens 279548
> 2022-02-09 02:16:07,632 WARN org.apache.zookeeper.ClientCnxn: Session
> 0x1000172775a0012 for server zkurl:2181, unexpected error, closing socket
> connection and attempting reconnect
> java.io.IOException: Packet len4194553 is out of range!
> at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:113)
> at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)
> 2022-02-09 02:16:07,733 WARN org.apache.hadoop.ipc.Server: IPC Server handler
> 1254 on default port 9001, call Call#144 Retry#0
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getDelegationToken from
> ip:46534
> java.lang.RuntimeException: Could not increment shared counter !!
> at
> org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager.incrementDelegationTokenSeqNum(ZKDelegationTokenSecretManager.java:582)
> {code}
> When we restarted a Router, it crashed again:
> {code:java}
> 2022-02-09 03:14:17,308 INFO
> org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager:
> Starting to load key cache.
> 2022-02-09 03:14:17,310 INFO
> org.apache.hadoop.security.token.delegation.ZKDelegationTokenSecretManager:
> Loaded key cache.
> 2022-02-09 03:14:32,930 WARN org.apache.zookeeper.ClientCnxn: Session
> 0x205584be35b0001 for server zkurl:2181, unexpected error, closing socket
> connection and attempting reconnect
> java.io.IOException: Packet len4194478 is out of range!
> at
> org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:113)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79)
> at
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
> at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)
> 2022-02-09 03:14:33,030 ERROR
> org.apache.hadoop.hdfs.server.federation.router.security.token.ZKDelegationTokenSecretManagerImpl:
> Error starting threads for zkDelegationTokens
> java.io.IOException: Could not start PathChildrenCache for tokens {code}
> Finally, we configured `-Djute.maxbuffer=10000000` in hadoop-env.sh to fix
> this issue.
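As a sketch, the workaround described above amounts to something like the following in hadoop-env.sh. Using `HADOOP_OPTS` is an assumption here; it raises the limit for every Hadoop daemon started on the host, and a Router-specific opts variable could be used instead if the deployment provides one.

```shell
# hadoop-env.sh -- raise the client-side ZooKeeper packet limit to ~10 MB.
# jute.maxbuffer is a JVM system property read by the ZooKeeper client, so
# it must be passed as a -D flag rather than a Hadoop configuration key.
export HADOOP_OPTS="$HADOOP_OPTS -Djute.maxbuffer=10000000"
```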
> After digging into it, we found that the znode
> `/ZKDTSMRoot/ZKDTSMTokensRoot` had more than 250000 child znodes, whose
> combined data size was over 4MB.
>
> Maybe we should explicitly specify the value of `jute.maxbuffer` in Hadoop
> configuration files like core-site.xml or hdfs-rbf-site.xml so that a
> larger value can be configured.
>
>
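A sketch of what the proposal could look like in a configuration file. The property name below is taken from the title of the linked pull request (`zk-dt-secret-manager.jute.maxbuffer`); it is a proposed key, not a released configuration option, and the value shown mirrors the workaround used above:

```xml
<!-- core-site.xml sketch (proposed key from PR #3983, not yet released) -->
<property>
  <name>zk-dt-secret-manager.jute.maxbuffer</name>
  <value>10000000</value>
  <description>Client-side jute.maxbuffer for the Router's
    ZKDelegationTokenSecretManager ZooKeeper connection.</description>
</property>
```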
--
This message was sent by Atlassian Jira
(v8.20.10#820010)