[ https://issues.apache.org/jira/browse/HDFS-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002729#comment-17002729 ]
Ayush Saxena commented on HDFS-15078:
-------------------------------------

{quote}And overwrite is true by default, this would make the file that had been written an empty file. This is a critical problem and we had encountered it
{quote}
This wouldn't be solved by your fix either. If the client crashes after the check, the same scenario occurs again. This doesn't seem to be a problem of the client crashing and the Router still sending the request to the Namenode. The issue is that the first Router sent the request late: the client failed over to another Router and triggered a new call, the second Router completed that call, and the first call arrived after it.

The problem is that RBF can't ensure perfectly sequential behavior. Since there are multiple Routers accepting calls, if any one Router is slow and the others are fast, this type of problem can occur. In a case where one Router is delaying, I think issues like these can still come up even without the client connection crashing.

> RBF: Should check connection channel before sending rpc to namenode
> -------------------------------------------------------------------
>
>                 Key: HDFS-15078
>                 URL: https://issues.apache.org/jira/browse/HDFS-15078
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: rbf
>   Affects Versions: 3.3.0
>            Reporter: Fei Hui
>            Assignee: Fei Hui
>            Priority: Major
>         Attachments: HDFS-15078.001.patch, HDFS-15078.002.patch
>
>
> dfsrouter logs show that
> {quote}
> 2019-12-20 04:11:26,724 WARN org.apache.hadoop.ipc.Server: IPC Server handler 6400 on 8888, call org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 10.83.164.11:56908 Call#2 Retry#0: output error
> 2019-12-20 04:11:26,724 INFO org.apache.hadoop.ipc.Server: IPC Server handler 125 on 8888 caught an exception
> java.nio.channels.ClosedChannelException
>       at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
>       at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
>       at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2731)
>       at org.apache.hadoop.ipc.Server.access$2100(Server.java:134)
>       at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1089)
>       at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1161)
>       at org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2109)
>       at org.apache.hadoop.ipc.Server$Connection.access$400(Server.java:1229)
>       at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:631)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2245)
> {quote}
> Maybe it is better to check the connection between client and router before
> sending the rpc to the namenode
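
The ordering described in the comment (a slow Router delivering its create after a second Router has already completed the client's retried call) can be replayed with a small Java sketch. Everything here is a hypothetical illustration: FakeNamenode, replay, and the path /tmp/f are made-up names, not Hadoop or RBF APIs, and the model ignores leases, retry caches, and everything else a real Namenode does. It only shows why a late create with overwrite=true leaves an empty file, regardless of whether the first Router checked the client channel before it stalled.

```java
import java.util.HashMap;
import java.util.Map;

public class DelayedCreateSketch {

    /** Hypothetical stand-in for a Namenode: maps a path to its contents. */
    static final class FakeNamenode {
        private final Map<String, StringBuilder> files = new HashMap<>();

        /** Creates the file; an existing file is truncated only if overwrite is set. */
        void create(String path, boolean overwrite) {
            if (files.containsKey(path) && !overwrite) {
                throw new IllegalStateException("File already exists: " + path);
            }
            files.put(path, new StringBuilder());
        }

        void write(String path, String data) {
            files.get(path).append(data);
        }

        int length(String path) {
            return files.get(path).length();
        }
    }

    /**
     * Replays the ordering from the comment and returns the final file length.
     * Router 1 is assumed to have accepted the first create (and passed any
     * client-channel check) before stalling, so its call reaches the Namenode last.
     */
    static int replay(boolean overwrite) {
        FakeNamenode nn = new FakeNamenode();
        String path = "/tmp/f"; // hypothetical path

        // The client gives up on router 1, fails over to router 2,
        // and router 2 completes the create and the write.
        nn.create(path, overwrite);
        nn.write(path, "payload");

        // Router 1's delayed create finally reaches the Namenode.
        try {
            nn.create(path, overwrite);
        } catch (IllegalStateException e) {
            // With overwrite=false the late call is rejected and the data survives.
        }
        return nn.length(path);
    }

    public static void main(String[] args) {
        System.out.println("overwrite=true  -> final length " + replay(true));
        System.out.println("overwrite=false -> final length " + replay(false));
    }
}
```

In this toy model the late call empties the file only when overwrite is true, which matches the quoted report. The sketch is not a proposed fix; it just makes the ordering problem concrete.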