[ https://issues.apache.org/jira/browse/HDFS-17769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guo Wei updated HDFS-17769:
---------------------------
    Description: 
When we use the Router to forward read requests to the Observer and the cluster experiences heavy write workloads, the Observer nodes may fail to keep pace with edit log synchronization; this can happen even when the dfs.ha.tail-edits.in-progress parameter is configured. This triggers "RetriableException: Observer Node is too far behind" errors. Especially when the client's ipc.client.ping parameter is set to true, the client keeps waiting and retrying, which can leave the business unable to obtain the desired data in a timely manner. In that case we should consider letting the Active NameNode handle the request (a rough sketch of this idea is appended at the end of this message).

Here are some of our errors and the repair verification:

The state id of the Observer is too far behind the Active:
{code:java}
Tue Apr 15 11:22:41 CST 2025, Active latest txId: 5698245512, Observer latest txId:5695118653,Observer far behind: 3126859, time takes0s
Tue Apr 15 11:22:43 CST 2025, Active latest txId: 5698253145, Observer latest txId:5695118653,Observer far behind: 3134492, time takes0s
Tue Apr 15 11:22:45 CST 2025, Active latest txId: 5698260942, Observer latest txId:5695118653,Observer far behind: 3142289, time takes0s
Tue Apr 15 11:22:47 CST 2025, Active latest txId: 5698268614, Observer latest txId:5695123653,Observer far behind: 3144961, time takes0s
Tue Apr 15 11:22:49 CST 2025, Active latest txId: 5698276490, Observer latest txId:5695123653,Observer far behind: 3152837, time takes0s
Tue Apr 15 11:22:51 CST 2025, Active latest txId: 5698284361, Observer latest txId:5695128653,Observer far behind: 3155708, time takes0s
Tue Apr 15 11:22:54 CST 2025, Active latest txId: 5698292641, Observer latest txId:5695128653,Observer far behind: 3163988, time takes0s
{code}

RetriableException:
The client throws a RetriableException and cannot read through the Router:
{code:java}
10:16:53.744 [IPC Client (24555242) connection to routerIp:8888 from hdfs] DEBUG org.apache.hadoop.ipc.Client - IPC Client (24555242) connection to routerIp:8888 from hdfs: stopped, remaining connections 0
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RetriableException): Observer Node is too far behind: serverStateId = 5695128653 clientStateId = 5698292641
    at sun.reflect.GeneratedConstructorAccessor49.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:505)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:972)
    at org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.getFileInfo(RouterClientProtocol.java:981)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:883)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:1044)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1106)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3063)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RetriableException): Observer Node is too far behind: serverStateId = 5632963133 clientStateId = 5635526176
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
    at org.apache.hadoop.ipc.Client.call(Client.java:1513)
    at org.apache.hadoop.ipc.Client.call(Client.java:1410)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
    at com.sun.proxy.$Proxy19.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:966)
    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:637)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:654)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:654)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:654)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:654)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:654)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:654)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:467)
    ... 15 more

    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1584)
    at org.apache.hadoop.ipc.Client.call(Client.java:1529)
    at org.apache.hadoop.ipc.Client.call(Client.java:1426)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
    at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.lambda$getFileInfo$41(ClientNamenodeProtocolTranslatorPB.java:820)
    at org.apache.hadoop.ipc.internal.ShadedProtobufHelper.ipc(ShadedProtobufHelper.java:160)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:820)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hdfs.server.namenode.ha.RouterObserverReadProxyProvider$RouterObserverReadInvocationHandler.invoke(RouterObserverReadProxyProvider.java:216)
    at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:437)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:170)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:162)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:100)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:366)
    at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1770)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1828)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1825)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1840)
    at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:611)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:468)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:432)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2592)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2558)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2520)
    at hadoop.write_then_observer_read2.main(write_then_observer_read2.java:64)
{code}

Repair verification:
{code:java}
(1) Check the status of the cluster NameNodes:
[root@20w ~]# hdfs haadmin -ns hh-rbf-test5 -getAllServiceState
20w:8020 active
21w:8020 standby
22w:8020 observer

(2) Enable the dfs.namenode.observer.too.stale.retry.active.enable parameter and run a read command on the 21w machine:
[root@21w ~]# hdfs dfs -cat /t.sh
/bin/ssh $1

(3) The read RPC request shows up in hdfs-audit.log on the Active NameNode, so the request was forwarded to the Active NameNode:
[root@20w ~]# tail -f /data/disk02/var/log/hadoop/hdfs/hdfs-audit.log|grep t.sh
2025-04-15 11:24:31,148 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/xx cmd=getfileinfo src=/t.sh dst=null perm=null proto=rpc
2025-04-15 11:24:31,461 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/xx cmd=open src=/t.sh dst=null perm=null proto=rpc

(4) There are logs of retrying to the Active NameNode in the Observer's log:
2025-04-15 11:24:30,148 WARN namenode.FSNamesystem (GlobalStateIdContext.java:receiveRequestState(163)) - Retrying to Active NameNode, Observer Node is too far behind: serverStateId = 5695393653 clientStateId = 5699337672
{code}


was:
When we use the Router to forward read requests to the Observer and the cluster experiences heavy write workloads, the Observer nodes may fail to keep pace with edit log synchronization; this can happen even when the dfs.ha.tail-edits.in-progress parameter is configured. This triggers "RetriableException: Observer Node is too far behind" errors. Especially when the client's ipc.client.ping parameter is set to true, the client keeps waiting and retrying, which can leave the business unable to obtain the desired data in a timely manner. In that case we should consider letting the Active NameNode handle the request.

Here are some of our errors and the repair verification:
The state id of the Observer is too far behind the Active: 1.png
RetriableException: 2.png
Repair verification: 3.png


> Allows client to actively retry to Active NameNode when the Observer NameNode
> is too far behind client state id.
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17769
>                 URL: https://issues.apache.org/jira/browse/HDFS-17769
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.4, 3.3.6, 3.4.1
>            Reporter: Guo Wei
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.2
>
>         Attachments: 1.png, 2.png, 3.png
>
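
For clarity, here is a minimal, hypothetical sketch of the idea described above: when the new dfs.namenode.observer.too.stale.retry.active.enable flag is on and the Observer is too far behind the client's state id, the Observer could throw ObserverRetryOnActiveException (which the observer-read proxy providers already treat as "retry this call on the Active NameNode") instead of the plain RetriableException. The class name, constructor, and threshold handling below are simplified assumptions and are not taken from the actual HDFS-17769 patch.
{code:java}
// Hypothetical sketch only; not the real GlobalStateIdContext code.
import org.apache.hadoop.ipc.ObserverRetryOnActiveException;
import org.apache.hadoop.ipc.RetriableException;

public final class ObserverStalenessCheckSketch {

  /** Hypothetical stand-in for dfs.namenode.observer.too.stale.retry.active.enable. */
  private final boolean retryActiveWhenTooStale;
  /** How many transactions the Observer may lag before a read is rejected (assumed threshold). */
  private final long maxServerLagTxns;

  public ObserverStalenessCheckSketch(boolean retryActiveWhenTooStale, long maxServerLagTxns) {
    this.retryActiveWhenTooStale = retryActiveWhenTooStale;
    this.maxServerLagTxns = maxServerLagTxns;
  }

  /**
   * Models the decision an Observer could make when it receives a read whose
   * client state id is far ahead of the Observer's own state id.
   */
  public void checkClientStateId(long serverStateId, long clientStateId)
      throws RetriableException, ObserverRetryOnActiveException {
    if (clientStateId - serverStateId <= maxServerLagTxns) {
      return; // Observer is close enough; serve the read locally.
    }
    String msg = "Observer Node is too far behind: serverStateId = " + serverStateId
        + " clientStateId = " + clientStateId;
    if (retryActiveWhenTooStale) {
      // The observer-read proxy providers interpret this exception as
      // "send this call to the Active NameNode instead".
      throw new ObserverRetryOnActiveException(msg);
    }
    // Default behaviour today: the client keeps retrying the Observer until it catches up.
    throw new RetriableException(msg);
  }
}
{code}
With the flag enabled, this is consistent with the repair verification above: the read no longer loops on the Observer but appears in the Active NameNode's hdfs-audit.log.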
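
As a side note, the "Observer far behind" lines shown earlier were presumably produced by comparing the latest transaction id on the Active and the Observer. One way to obtain such numbers is to poll each NameNode's /jmx endpoint and read LastAppliedOrWrittenTxId from the NameNodeInfo bean's JournalTransactionInfo attribute; the sketch below is only an illustration (it assumes the default HTTP port 9870 and uses placeholder hostnames) and is not the script that generated the logs above.
{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ObserverLagProbe {

  // LastAppliedOrWrittenTxId is embedded (as an escaped JSON string) in the
  // JournalTransactionInfo attribute of Hadoop:service=NameNode,name=NameNodeInfo.
  private static final Pattern TXID =
      Pattern.compile("LastAppliedOrWrittenTxId\":\"(\\d+)");

  static long latestTxId(String host) throws Exception {
    URL url = new URL("http://" + host
        + ":9870/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo");
    StringBuilder body = new StringBuilder();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        body.append(line);
      }
    }
    // Drop the escaping of the nested JSON string before matching.
    Matcher m = TXID.matcher(body.toString().replace("\\", ""));
    if (!m.find()) {
      throw new IllegalStateException("LastAppliedOrWrittenTxId not found on " + host);
    }
    return Long.parseLong(m.group(1));
  }

  public static void main(String[] args) throws Exception {
    String activeHost = args[0];    // e.g. the Active NameNode host
    String observerHost = args[1];  // e.g. the Observer NameNode host
    long activeTxId = latestTxId(activeHost);
    long observerTxId = latestTxId(observerHost);
    System.out.println(new Date() + ", Active latest txId: " + activeTxId
        + ", Observer latest txId: " + observerTxId
        + ", Observer far behind: " + (activeTxId - observerTxId));
  }
}
{code}
Run it as, for example, "java ObserverLagProbe 20w 22w" and loop it to watch how the lag evolves under write load.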