[ 
https://issues.apache.org/jira/browse/HDFS-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002729#comment-17002729
 ] 

Ayush Saxena commented on HDFS-15078:
-------------------------------------

{quote}And overwrite is true by default, so this would turn the file that had 
already been written into an empty file. This is a critical problem and we 
have encountered it
{quote}
Your fix wouldn't solve this either. If the client crashes after the check, 
the same scenario occurs again. The problem doesn't seem to be the client 
crashing while the Router still sends the request to the Namenode. The issue 
is that the first Router sent the request that late: the client failed over 
to another Router and triggered a new call, the second Router completed that 
call, and the first call arrived after it.
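
To illustrate the first point, here is a minimal sketch of why a connection 
check before forwarding cannot close the window. It is not the attached patch 
and not Hadoop code: ClientConnection, FakeNamenode and forwardIfConnected are 
hypothetical names, and the Runnable hook only stands in for "the client 
crashes right after the check".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names exist in Hadoop. */
public class CheckThenForwardSketch {

    /** Stand-in for the client-to-Router connection. */
    static class ClientConnection {
        private boolean open = true;
        boolean isOpen() { return open; }
        void close() { open = false; }
    }

    /** Stand-in for the Namenode: records every call it receives. */
    static class FakeNamenode {
        final List<String> received = new ArrayList<>();
        void create(String path) { received.add("create " + path); }
    }

    /**
     * Forwards only if the client looks connected. The hook runs between
     * the check and the forward, which is where a real client could crash.
     */
    static boolean forwardIfConnected(ClientConnection conn, FakeNamenode nn,
                                      String path, Runnable betweenCheckAndSend) {
        if (!conn.isOpen()) {
            return false;            // dropped: client was already gone
        }
        betweenCheckAndSend.run();   // window the check cannot cover
        nn.create(path);
        return true;
    }

    public static void main(String[] args) {
        FakeNamenode nn = new FakeNamenode();
        ClientConnection conn = new ClientConnection();
        // The client disconnects right after the check has passed.
        boolean sent = forwardIfConnected(conn, nn, "/data/file", conn::close);
        System.out.println("forwarded=" + sent + ", clientOpen=" + conn.isOpen()
            + ", namenodeSaw=" + nn.received);
    }
}
```

In this sketch the call still reaches the stand-in Namenode although the 
client is gone by then, so the check narrows the window but does not remove 
it.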

The problem is that RBF can't ensure perfectly sequential behavior. Since 
multiple Routers accept calls, this type of problem can occur whenever one 
Router is slow and the others are fast. If one Router is delaying calls, I 
think issues like these can still come up even without the client connection 
crashing.
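
The ordering described above can be replayed with a small sketch. It assumes a 
toy in-memory FakeNamenode (a hypothetical class, not HDFS code) whose create 
replaces an existing file with an empty one when overwrite is true; the 
overwrite=false run is only there for contrast and is not a proposed fix.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the ordering problem; not Hadoop code. */
public class DelayedCreateSketch {

    /** Stand-in Namenode: maps a path to the file contents. */
    static class FakeNamenode {
        private final Map<String, StringBuilder> files = new HashMap<>();

        /** With overwrite=true an existing file is replaced by an empty one. */
        void create(String path, boolean overwrite) {
            if (files.containsKey(path) && !overwrite) {
                throw new IllegalStateException(path + " already exists");
            }
            files.put(path, new StringBuilder());
        }

        void append(String path, String data) { files.get(path).append(data); }

        int length(String path) { return files.get(path).length(); }
    }

    /** Replays the calls in the order they reach the Namenode. */
    static int replay(boolean overwrite) {
        FakeNamenode nn = new FakeNamenode();
        String path = "/data/file";
        // 1. The client sends create via Router 1, which is slow; the call is
        //    still pending inside that Router.
        // 2. The client fails over to Router 2, which forwards create at once.
        nn.create(path, overwrite);
        nn.append(path, "payload");   // the client writes its data
        // 3. Router 1 finally forwards the original, now stale, create.
        try {
            nn.create(path, overwrite);
        } catch (IllegalStateException e) {
            // overwrite=false: the stale call is rejected and the data survives
        }
        return nn.length(path);
    }

    public static void main(String[] args) {
        System.out.println("overwrite=true  -> length " + replay(true));
        System.out.println("overwrite=false -> length " + replay(false));
    }
}
```

With overwrite=true the replay ends with an empty file, and no client crash is 
involved at any step; the late call from the slow Router is enough.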

> RBF: Should check connection channel before sending rpc to namenode
> -------------------------------------------------------------------
>
>                 Key: HDFS-15078
>                 URL: https://issues.apache.org/jira/browse/HDFS-15078
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: rbf
>    Affects Versions: 3.3.0
>            Reporter: Fei Hui
>            Assignee: Fei Hui
>            Priority: Major
>         Attachments: HDFS-15078.001.patch, HDFS-15078.002.patch
>
>
> dfsrouter logs show that
> {quote}
> 2019-12-20 04:11:26,724 WARN org.apache.hadoop.ipc.Server: IPC Server handler 6400 on 8888, call org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 10.83.164.11:56908 Call#2 Retry#0: output error
> 2019-12-20 04:11:26,724 INFO org.apache.hadoop.ipc.Server: IPC Server handler 125 on 8888 caught an exception
> java.nio.channels.ClosedChannelException
>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2731)
>         at org.apache.hadoop.ipc.Server.access$2100(Server.java:134)
>         at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1089)
>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1161)
>         at org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2109)
>         at org.apache.hadoop.ipc.Server$Connection.access$400(Server.java:1229)
>         at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:631)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2245)
> {quote}
> Maybe it would be better to check the connection between client and router 
> before sending the RPC to the namenode.



