[ 
https://issues.apache.org/jira/browse/HDFS-350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-350.
--------------------------------
    Resolution: Not A Problem

I'm resolving this issue.  In current versions, the client is more robust to 
this kind of failure.  The RPC layer implements retry policies.  Retried 
operations are handled gracefully, either because the RPC implementation is 
inherently idempotent or because the retry cache guarantees at-most-once 
execution.  In the event of an extremely long GC, the client would either 
retry and succeed after the GC completes, or, in more extreme cases, the 
pause would trigger an HA failover and the client would successfully issue 
its call to the new active NameNode.
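
To illustrate how a client retry and a server-side retry cache combine into 
at-most-once execution, here is a minimal sketch.  It is not Hadoop's actual 
code: the class RetrySketch, the methods executeOnce and callWithRetries, and 
the "clientId:callId" cache key are all hypothetical names chosen for the 
example, and the sketch assumes a lost response surfaces as an IOException.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical illustration only; this is not Hadoop's RetryCache or RPC client.
final class RetrySketch {
    // Server side: responses already computed, keyed by "clientId:callId".
    private final Map<String, Object> responses = new ConcurrentHashMap<>();

    // Runs the operation the first time a (clientId, callId) pair is seen and
    // returns the cached response when the same call is retried.
    // Assumes the operation returns a non-null value.
    @SuppressWarnings("unchecked")
    <T> T executeOnce(String clientId, int callId, Supplier<T> operation) {
        return (T) responses.computeIfAbsent(clientId + ":" + callId,
                                             key -> operation.get());
    }

    // Client side: re-issues the same call up to maxAttempts times on IOException.
    static <T> T callWithRetries(Callable<T> rpc, int maxAttempts) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return rpc.call();
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last != null ? last : new IOException("maxAttempts must be at least 1");
    }
}
```

Because the retry reuses the same call id, the server returns the cached 
response instead of executing a non-idempotent operation a second time.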

> DFSClient more robust if the namenode is busy doing GC
> ------------------------------------------------------
>
>                 Key: HDFS-350
>                 URL: https://issues.apache.org/jira/browse/HDFS-350
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> In the current code, if the client (writer) encounters an RPC error while 
> fetching a new block id from the namenode, it does not retry. It throws an 
> exception to the application. This becomes especially bad if the namenode is 
> in the middle of a GC and does not respond in time. The client throws an 
> exception because it does not know whether the namenode successfully 
> allocated a block for this file.
> One possible enhancement would be to make the client retry the addBlock RPC 
> if needed. The client can send the block list that it currently has. The 
> namenode can match the block list sent by the client with what it has in its 
> own metadata and then send back a new blockid (or a previously allocated 
> blockid that the client had not yet received because the earlier RPC 
> timed out). This will make the client more robust!
> This works even when we support Appends because the namenode will *always* 
> verify that the client has the lease for the file in question.
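
The retry scheme proposed in the description can be sketched as follows.  
This is only an illustration under stated assumptions, not the NameNode's 
implementation: AddBlockSketch and its fields are hypothetical names, block 
ids are plain longs for a single file, and the lease check mentioned in the 
description is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a retry-safe addBlock for a single file.
final class AddBlockSketch {
    // Blocks the "namenode" has recorded for the file, in order.
    private final List<Long> fileBlocks = new ArrayList<>();
    private long nextBlockId = 1;

    // The client passes the block list it currently holds.
    synchronized long addBlock(List<Long> clientBlocks) {
        int known = clientBlocks.size();

        // Lists match exactly: this is a fresh request, so allocate a new block.
        if (known == fileBlocks.size() && fileBlocks.equals(clientBlocks)) {
            long id = nextBlockId++;
            fileBlocks.add(id);
            return id;
        }

        // The namenode has exactly one extra block: the earlier response was
        // lost, so hand back the block that was already allocated.
        if (known + 1 == fileBlocks.size()
                && fileBlocks.subList(0, known).equals(clientBlocks)) {
            return fileBlocks.get(known);
        }

        throw new IllegalStateException(
            "client block list does not match namenode metadata");
    }
}
```

With this shape, a client that times out can simply resend addBlock with the 
same block list; it receives the same block id rather than leaving an extra 
block allocated for the file.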



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
