[ 
https://issues.apache.org/jira/browse/HDFS-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003288#comment-17003288
 ] 

Ayush Saxena commented on HDFS-15079:
-------------------------------------

Thanks [~ferhui] for the UT.
The NameNode logic that you intend to add already exists in the NameNode in 
the form of the RetryCache. It checks whether a call is a repeat caused by 
failover; if so, it doesn't execute the call again but instead replays the old 
response from the cache. You can check {{ClientProtocol.java}}: there is an 
annotation above the create method, and its description explains this.
That RetryCache logic doesn't seem to be working here. For the cache to 
recognize a retry, the clientId and callId should be the same, but here the 
retried call comes through a different Router rather than from the same 
client, so the NN doesn't consider it a repeated call.
I will try to dig in more, but on a quick look I feel this is the problem. I 
believe we also have a JIRA for passing the client CallerContext to the NN; 
maybe this issue stems from that as well. Not sure.
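As a rough illustration (hypothetical names, not the actual Hadoop RetryCache code), a cache keyed on (clientId, callId) dedupes a retry from the same client but misses when the same logical call is replayed by a different Router, which forwards it under a different clientId:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch only: a cache keyed on (clientId, callId). A retry from the
// *same* client hits the cache and replays the stored response; the same
// logical call forwarded by a different Router carries a different
// clientId, so it misses and is executed a second time.
public class RetryCacheSketch {
    static final class Key {
        final String clientId; final int callId;
        Key(String clientId, int callId) { this.clientId = clientId; this.callId = callId; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return callId == k.callId && clientId.equals(k.clientId);
        }
        @Override public int hashCode() { return Objects.hash(clientId, callId); }
    }

    private final Map<Key, String> cache = new HashMap<>();

    // Returns the cached response for a repeated call; otherwise
    // "executes" the call and remembers its response.
    String handleCreate(String clientId, int callId, String response) {
        Key key = new Key(clientId, callId);
        if (cache.containsKey(key)) {
            return cache.get(key);      // repeated call: replay old response
        }
        cache.put(key, response);       // first time seen: execute and cache
        return response;
    }

    public static void main(String[] args) {
        RetryCacheSketch nn = new RetryCacheSketch();
        // Create forwarded by router r0, under r0's clientId.
        String first = nn.handleCreate("router-r0", 1, "created-v1");
        // Client fails over; r1 forwards the retry under its own clientId,
        // so the NameNode does not see it as a repeat and executes it again.
        String retry = nn.handleCreate("router-r1", 1, "created-v2");
        System.out.println(first.equals(retry) ? "deduplicated" : "executed twice");
    }
}
```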
[~elgoiri] [~hexiaoqiao] Can you also take a look?


> RBF: Client maybe get an unexpected result with network anomaly 
> ----------------------------------------------------------------
>
>                 Key: HDFS-15079
>                 URL: https://issues.apache.org/jira/browse/HDFS-15079
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: rbf
>    Affects Versions: 3.3.0
>            Reporter: Fei Hui
>            Priority: Critical
>         Attachments: UnexpectedOverWriteUT.patch
>
>
>  I found a critical problem in RBF. HDFS-15078 can resolve it in some 
> scenarios, but I have no idea yet about an overall resolution.
> The problem is as follows:
> A client using RBF (r0, r1) creates an HDFS file via r0, gets an exception, 
> and fails over to r1
> r0 has already sent the create RPC to the NameNode (1st create)
> The client creates the HDFS file via r1 (2nd create)
> The client writes the HDFS file and finally closes it (3rd close)
> The NameNode may receive these RPCs in the following order:
> 2nd create
> 3rd close
> 1st create
> Since overwrite is true by default, the late 1st create turns the file that 
> had just been written into an empty file. This is a critical problem.
> We have encountered this problem in practice. There are many Hive and Spark 
> jobs running on our cluster, and it occurs occasionally.
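
The arrival order described above can be sketched with a toy in-memory "namenode" (hypothetical names, not HDFS code), showing how the delayed 1st create, with overwrite defaulting to true, truncates the file that was already written and closed:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the NameNode applies RPCs in the order
// 2nd create -> write/close -> 1st create. Because create defaults to
// overwrite=true, the late 1st create truncates the completed file.
public class CreateRaceSketch {
    private final Map<String, StringBuilder> files = new HashMap<>();

    void create(String path, boolean overwrite) {
        if (overwrite || !files.containsKey(path)) {
            files.put(path, new StringBuilder());  // overwrite truncates to empty
        }
    }
    void write(String path, String data) { files.get(path).append(data); }
    int length(String path) { return files.get(path).length(); }

    public static void main(String[] args) {
        CreateRaceSketch nn = new CreateRaceSketch();
        // Arrival order at the NameNode:
        nn.create("/f", true);       // 2nd create (retry via r1)
        nn.write("/f", "payload");   // client writes the data
        // 3rd close happens here; the file content is complete
        nn.create("/f", true);       // 1st create (delayed RPC from r0) truncates!
        System.out.println("final length = " + nn.length("/f"));
    }
}
```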



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
