Rohan Nimmagadda created HDFS-17340:
---------------------------------------

             Summary:  transaction lag issue between aNN and oNN causing 
HDFS_DELEGATION_TOKEN can't be found in cache in oNN
                 Key: HDFS-17340
                 URL: https://issues.apache.org/jira/browse/HDFS-17340
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: dfsclient, hdfs, namenode
    Affects Versions: 3.3.3
            Reporter: Rohan Nimmagadda


We experienced transaction lag between the aNN and the oNN, which causes 
problems on busier clusters. When the aNN creates an HDFS_DELEGATION_TOKEN, the 
oNN's cache does not catch up immediately, so the token cannot be found in the 
cache on the oNN.

We followed the documentation at 
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ObserverNameNode.html
to enable the oNN.

Here is our setup:
 * nn1: aNN
 * nn2: sNN
 * nn3: sNN
 * nn4: oNN

Due to heavier read traffic, we decided to add another oNN (nn5) and set 
dfs.client.failover.random.order=true for better read distribution. Otherwise, 
all traffic is routed to the first oNN in the list.
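
The effect of that setting can be illustrated with a toy model. This is only a 
sketch, not the HDFS client's actual proxy-selection code: the function name 
first_observer_per_client is hypothetical, and it assumes each client simply 
uses the first oNN in its (optionally shuffled) list.

```python
import random
from collections import Counter


def first_observer_per_client(observers, num_clients, random_order, seed=0):
    """Toy model: each client uses the first observer in its own list."""
    rng = random.Random(seed)
    picks = Counter()
    for _ in range(num_clients):
        order = list(observers)
        if random_order:
            # Stands in for dfs.client.failover.random.order=true (assumption:
            # the real client shuffles its NameNode list in a comparable way).
            rng.shuffle(order)
        picks[order[0]] += 1
    return picks


fixed = first_observer_per_client(["nn4", "nn5"], 1000, random_order=False)
shuffled = first_observer_per_client(["nn4", "nn5"], 1000, random_order=True)
print("fixed order: ", dict(fixed))
print("random order:", dict(shuffled))
```

In this model the fixed order sends every client to nn4, while the shuffled 
order spreads clients across nn4 and nn5.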

With the above setup, the HDFS_DELEGATION_TOKEN issue worsened, and simple 
MapReduce/Hive jobs started to fail.

Error from the oNN logs:
2024-01-15 11:03:26,152 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth 
failed for 10.xx.xx.xx:54014:null (DIGEST-MD5: IO error acquiring password) 
with true cause: (token (token for end-user1: HDFS_DELEGATION_TOKEN 
owner=end-user1, renewer=end-user1, realUser=, issueDate=1705338205996, 
maxDate=1705943005996, sequenceNumber=277018178, masterKeyId=2195) can't be 
found in cache)
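
The timestamps in the log line above can be decoded to see how recently the 
token was issued when the oNN rejected it. The sketch below assumes issueDate 
and maxDate are milliseconds since the Unix epoch. It also assumes the log 
timestamp was written in a UTC-6 local time zone, which the log itself does 
not state; under a different offset the computed gap would differ.

```python
import re
from datetime import datetime, timedelta, timezone

# Token fields copied from the oNN log line.
log_fragment = (
    "issueDate=1705338205996, maxDate=1705943005996, "
    "sequenceNumber=277018178, masterKeyId=2195"
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_token_fields(text):
    """Extract name=integer pairs from the token description."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", text)}


fields = parse_token_fields(log_fragment)

# Assumption: both dates are epoch milliseconds.
issued = EPOCH + timedelta(milliseconds=fields["issueDate"])
lifetime = timedelta(milliseconds=fields["maxDate"] - fields["issueDate"])

# Assumption: the log timestamp (2024-01-15 11:03:26,152) is in UTC-6.
assumed_log_tz = timezone(timedelta(hours=-6))
auth_failed = datetime(2024, 1, 15, 11, 3, 26, 152000, tzinfo=assumed_log_tz)
gap = auth_failed - issued

print("token issued (UTC):", issued.isoformat())
print("max lifetime:", lifetime)
print("gap between issue and auth failure:", gap)
```

Under these assumptions the token was issued at 17:03:25.996 UTC with a 
7-day maximum lifetime, and the authentication failure on the oNN came about 
156 ms later, which would fit the oNN not yet having applied the transaction 
that created the token.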



