Rohan Nimmagadda created HDFS-17340:
---------------------------------------
Summary: transaction lag issue between aNN and oNN causing
HDFS_DELEGATION_TOKEN can't be found in cache in oNN
Key: HDFS-17340
URL: https://issues.apache.org/jira/browse/HDFS-17340
Project: Hadoop HDFS
Issue Type: Bug
Components: dfsclient, hdfs, namenode
Affects Versions: 3.3.3
Reporter: Rohan Nimmagadda
We observed a transaction lag between the active NameNode (aNN) and the
Observer NameNode (oNN) that causes problems on busier clusters. When the aNN
creates an HDFS_DELEGATION_TOKEN, the oNN does not catch up immediately, so the
token cannot be found in the oNN's cache.
We followed the Observer NameNode documentation
(https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html)
to enable the oNN.
Here is our setup:
* nn1: aNN
* nn2: sNN
* nn3: sNN
* nn4: oNN
Due to heavier read traffic, we decided to add another oNN (nn5) and set
dfs.client.failover.random.order=true for better read distribution. Otherwise,
all traffic is routed to the first oNN in the list.
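
The routing behavior described above can be illustrated with a toy model. This is only a sketch of the idea (fixed list order versus shuffled order), not Hadoop's actual proxy-provider code; the function names, the client count, and the seed are made up for illustration.

```python
import random
from collections import Counter

# Observer list from the report, in configured order.
OBSERVERS = ["nn4", "nn5"]

def first_observer(observers, random_order, rng):
    """Return the observer a new client tries first in this toy model."""
    order = list(observers)
    if random_order:
        # Models dfs.client.failover.random.order=true: shuffle per client.
        rng.shuffle(order)
    return order[0]

def distribution(random_order, clients=10_000, seed=0):
    """Count which observer each simulated client would contact first."""
    rng = random.Random(seed)
    return Counter(
        first_observer(OBSERVERS, random_order, rng) for _ in range(clients)
    )

if __name__ == "__main__":
    print("fixed order :", dict(distribution(False)))
    print("random order:", dict(distribution(True)))
```

With the fixed order every simulated client starts at nn4; with shuffling the first contact is split roughly evenly between nn4 and nn5.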
With this setup, the HDFS_DELEGATION_TOKEN issue worsened, and simple
MapReduce and Hive jobs started to fail.

Error from the oNN logs:
2024-01-15 11:03:26,152 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth
failed for 10.xx.xx.xx:54014:null (DIGEST-MD5: IO error acquiring password)
with true cause: (token (token for end-user1: HDFS_DELEGATION_TOKEN
owner=end-user1, renewer=end-user1, realUser=, issueDate=1705338205996,
maxDate=1705943005996, sequenceNumber=277018178, masterKeyId=2195) can't be
found in cache)
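
To see how fresh the rejected token was, the timestamps in the log line above can be decoded. The sketch below treats issueDate and maxDate as epoch milliseconds and compares the issue time with the log timestamp. The log does not state its time zone, so the UTC-6 offset used here is an assumption; the helper names are illustrative only.

```python
import re
from datetime import datetime, timedelta, timezone

LOG_LINE = (
    "2024-01-15 11:03:26,152 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth "
    "failed for 10.xx.xx.xx:54014:null (DIGEST-MD5: IO error acquiring password) "
    "with true cause: (token (token for end-user1: HDFS_DELEGATION_TOKEN "
    "owner=end-user1, renewer=end-user1, realUser=, issueDate=1705338205996, "
    "maxDate=1705943005996, sequenceNumber=277018178, masterKeyId=2195) can't be "
    "found in cache)"
)

# ASSUMPTION: the NameNode logs in UTC-6. The report does not say this.
ASSUMED_LOG_TZ = timezone(timedelta(hours=-6))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def token_fields(line):
    """Return the numeric key=value fields of the token as ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}

def epoch_ms_to_utc(ms):
    """Convert epoch milliseconds to an aware UTC datetime without float math."""
    return EPOCH + timedelta(milliseconds=ms)

def log_time(line, tz):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' stamp in the given zone."""
    stamp = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
    return stamp.replace(tzinfo=tz)

fields = token_fields(LOG_LINE)
issued = epoch_ms_to_utc(fields["issueDate"])
lifetime = epoch_ms_to_utc(fields["maxDate"]) - issued
gap = log_time(LOG_LINE, ASSUMED_LOG_TZ) - issued

if __name__ == "__main__":
    print("issued (UTC):   ", issued.isoformat())
    print("max lifetime:   ", lifetime)
    print("issue -> reject:", gap)
```

For this log line the maximum lifetime is 7 days. If the UTC-6 assumption holds, the oNN rejected the token about 156 ms after the aNN issued it, which would be consistent with the oNN not yet having applied the transaction that created the token.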