[ 
https://issues.apache.org/jira/browse/HDFS-15078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002739#comment-17002739
 ] 

Fei Hui edited comment on HDFS-15078 at 12/24/19 10:02 AM:
-----------------------------------------------------------

{quote}
The issue is that the first router sent the request late. The client failed 
over to another router and triggered a new call, the second router completed 
that call, and the first call arrived after this. 
{quote}
Getting an EOFException is what makes the client fail over to another router. 
The second router then completes the call, and the first router sends its 
request late. If the first router were merely late in sending the request, the 
client would not get an exception and would not fail over.

{quote}
In a case where one Router is delayed, I think issues like these can still 
come up even without the client connection crashing.
{quote}
Yes. The fix in this issue resolves the problem only in some scenarios. 
HDFS-15079 tracks the high-level problem.

In our scenarios, this fix works.




> RBF: Should check connection channel before sending rpc to namenode
> -------------------------------------------------------------------
>
>                 Key: HDFS-15078
>                 URL: https://issues.apache.org/jira/browse/HDFS-15078
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: rbf
>    Affects Versions: 3.3.0
>            Reporter: Fei Hui
>            Assignee: Fei Hui
>            Priority: Major
>         Attachments: HDFS-15078.001.patch, HDFS-15078.002.patch
>
>
> dfsrouter logs show that
> {quote}
> 2019-12-20 04:11:26,724 WARN org.apache.hadoop.ipc.Server: IPC Server handler 6400 on 8888, call org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 10.83.164.11:56908 Call#2 Retry#0: output error
> 2019-12-20 04:11:26,724 INFO org.apache.hadoop.ipc.Server: IPC Server handler 125 on 8888 caught an exception
> java.nio.channels.ClosedChannelException
>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
>         at org.apache.hadoop.ipc.Server.channelWrite(Server.java:2731)
>         at org.apache.hadoop.ipc.Server.access$2100(Server.java:134)
>         at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1089)
>         at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1161)
>         at org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2109)
>         at org.apache.hadoop.ipc.Server$Connection.access$400(Server.java:1229)
>         at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:631)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2245)
> {quote}
> Maybe it is better to check the connection between the client and the router 
> before sending the RPC to the namenode.
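
The idea in the description can be illustrated with a minimal sketch. This is not the attached patch and does not use Hadoop's IPC classes: the class ChannelCheckSketch, the method forwardIfClientConnected, and the Supplier standing in for the namenode RPC are all hypothetical names chosen for illustration. Only java.nio's Channel.isOpen() and ClosedChannelException are real APIs. Note that isOpen() reports whether the channel has been closed on the local side, so a check like this presumably helps only once the server has already noticed the client went away and closed the connection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channel;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.SocketChannel;
import java.util.function.Supplier;

/** Hypothetical sketch; not the HDFS-15078 patch and not Hadoop's API. */
class ChannelCheckSketch {

  /**
   * Runs the downstream call only if the client's channel is still open.
   * "namenodeCall" is an assumed stand-in for the router's RPC to the namenode.
   */
  static <T> T forwardIfClientConnected(Channel clientChannel,
                                        Supplier<T> namenodeCall)
      throws IOException {
    if (clientChannel == null || !clientChannel.isOpen()) {
      // The client is gone, so it may already have failed over to another
      // router. Skip the call instead of sending a late duplicate.
      throw new ClosedChannelException();
    }
    return namenodeCall.get();
  }

  /** Forwards once on an open channel, then tries again after closing it. */
  static String demo() {
    try (SocketChannel client = SocketChannel.open()) {
      StringBuilder out = new StringBuilder();
      out.append(forwardIfClientConnected(client, () -> "forwarded"));
      client.close(); // simulate the connection having been torn down
      try {
        forwardIfClientConnected(client, () -> "forwarded");
        out.append(",forwarded");
      } catch (ClosedChannelException e) {
        out.append(",skipped");
      }
      return out.toString();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

As the comment thread notes, a check of this kind covers only some scenarios: a router that is merely slow, with its client connection still open, would still forward the late call.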


