[jira] [Commented] (HDFS-9574) Reduce client failures during datanode restart

Daryn Sharp (JIRA) Tue, 05 Jan 2016 09:56:22 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083463#comment-15083463
 ]


Daryn Sharp commented on HDFS-9574:
-----------------------------------

Might consider checking whether the bp is registered inside {{checkAccess}} 
itself, so that callers don't each have to check the bp explicitly before 
calling {{checkAccess}}.

Sleeping for 1s and incrementing a counter until it reaches the configured 
number of seconds is fragile: it assumes each sleep really lasted 1s, which 
may not be true if there was a long GC, etc.  I'd suggest using a 
{{StopWatch}} for correctness.
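
A minimal sketch of the elapsed-time approach, for illustration only.  It is 
not taken from the patch: {{RegistrationWait}}, {{waitFor}} and its parameters 
are hypothetical names, and it uses {{System.nanoTime()}} directly as a 
stand-in for a {{StopWatch}} so that it compiles without Hadoop on the 
classpath.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper; not part of the HDFS-9574 patch. */
final class RegistrationWait {
  private RegistrationWait() {}

  /**
   * Polls isRegistered until it returns true or timeoutMs has elapsed.
   * The timeout is measured against the monotonic System.nanoTime() clock,
   * so a sleep that overruns (e.g. a long GC pause) cannot stretch the wait.
   */
  static boolean waitFor(BooleanSupplier isRegistered, long timeoutMs, long pollMs) {
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!isRegistered.getAsBoolean()) {
      long remainingNs = deadline - System.nanoTime();
      if (remainingNs <= 0) {
        return false; // timed out based on real elapsed time
      }
      // Never sleep past the deadline; sleep at least 1 ms to avoid spinning.
      long remainingMs = Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNs));
      try {
        Thread.sleep(Math.min(pollMs, remainingMs));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return isRegistered.getAsBoolean();
      }
    }
    return true;
  }
}
```

The only point of the sketch is that the loop exits on measured elapsed time 
rather than on a count of sleeps.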

I think something similar needs to be done for the RPC service.  Block tokens 
cannot be authenticated until after registration, when the DN gets the block 
secret.  The dfs client calls {{getReplicaVisibleLength}} for the last block 
if it is not complete, and the rpc client doesn't appear to have any retry 
proxy.  This is likely to affect users that frequently read while writing or 
appending to a file (e.g. logging into hdfs, perhaps hbase?).

Blocking in the RPC layer, unlike in the data xceiver threads, is not 
desirable.  Once the readers jam due to one unregistered bp, admin calls and 
calls for other block pools will be stalled too.  Ideally the DN secret 
manager should throw a {{RetriableException}} if the bp has no secrets.  The 
client can handle the retries.  This appears to be backward compatible.
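
A rough sketch of the fail-fast idea, under stated assumptions.  The 
{{RetriableException}} below is a local stand-in for 
{{org.apache.hadoop.ipc.RetriableException}} so the example compiles on its 
own; {{BlockPoolSecrets}}, {{RetryingCaller}} and their methods are 
hypothetical and do not reflect the actual DN secret manager or client code.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Local stand-in for org.apache.hadoop.ipc.RetriableException. */
class RetriableException extends IOException {
  RetriableException(String msg) { super(msg); }
}

/** Hypothetical, simplified view of per-block-pool secret state on the DN. */
class BlockPoolSecrets {
  private final Set<String> poolsWithKeys = ConcurrentHashMap.newKeySet();

  /** Called once the bp has registered and received its block keys. */
  void addKeys(String bpid) { poolsWithKeys.add(bpid); }

  /** Fails fast instead of blocking the RPC thread when the bp has no secrets. */
  void checkAccess(String bpid) throws RetriableException {
    if (!poolsWithKeys.contains(bpid)) {
      throw new RetriableException("Block pool " + bpid + " is not registered yet");
    }
    // Real block token verification would happen here.
  }
}

/** Hypothetical client-side retry loop. */
final class RetryingCaller {
  private RetryingCaller() {}

  /** Returns the number of attempts used; rethrows the last failure if all attempts fail. */
  static int callWithRetries(BlockPoolSecrets secrets, String bpid,
      int maxAttempts, Runnable betweenAttempts) throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    RetriableException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        secrets.checkAccess(bpid);
        return attempt;
      } catch (RetriableException e) {
        last = e;
        betweenAttempts.run(); // e.g. a backoff sleep in a real client
      }
    }
    throw last;
  }
}
```

In this sketch the server side never waits: an unregistered bp costs one 
quick exception, and the decision of how long to keep trying stays with the 
client.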

> Reduce client failures during datanode restart
> ----------------------------------------------
>
>                 Key: HDFS-9574
>                 URL: https://issues.apache.org/jira/browse/HDFS-9574
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>         Attachments: HDFS-9574.patch, HDFS-9574.v2.patch
>
>
> Since DataXceiverServer is initialized before BP is fully up, client requests 
> will fail until the datanode registers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
