[ https://issues.apache.org/jira/browse/HDFS-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083463#comment-15083463 ]
Daryn Sharp commented on HDFS-9574:
-----------------------------------

Might consider checking whether the bp is registered inside {{checkAccess}}, so that every caller doesn't have to check the bp explicitly before calling it.

Sleeping for 1s and incrementing a counter until it reaches the configured number of seconds is fragile: it assumes each sleep really lasted 1s, which may not be true if there was a long GC, etc. I'd suggest using a {{StopWatch}} for correctness.

I think something similar needs to be done for the RPC service. Block tokens cannot be authenticated until after registration, when the DN gets the block secret. The dfs client calls {{getReplicaVisibleLength}} for the last block if it is not complete, and the rpc client doesn't appear to have any retry proxy. This is likely to affect users that frequently read while writing or appending to a file (e.g. logging into hdfs, perhaps hbase?).

Blocking in the RPC layer, unlike in the data xceiver threads, is not desirable. Once the readers jam due to one unregistered bp, admin calls and calls for other block pools will be stalled too.

Ideally the DN secret manager should throw a {{RetriableException}} if the bp has no secrets. The client can handle the retries. It appears this would be backward compatible.

> Reduce client failures during datanode restart
> ----------------------------------------------
>
>                 Key: HDFS-9574
>                 URL: https://issues.apache.org/jira/browse/HDFS-9574
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>         Attachments: HDFS-9574.patch, HDFS-9574.v2.patch
>
>
> Since DataXceiverServer is initialized before BP is fully up, client requests
> will fail until the datanode registers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
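
To illustrate the {{StopWatch}} suggestion in the comment above, here is a minimal sketch of a wait that is bounded by measured elapsed time rather than by a count of sleeps. It is not the patch's code. The class {{RegistrationWait}}, the method {{waitUntil}} and its parameters are hypothetical names, and {{System.nanoTime()}} stands in for Hadoop's {{StopWatch}} so that the example compiles without Hadoop on the classpath.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of Hadoop: polls a condition until a monotonic deadline. */
final class RegistrationWait {
  private RegistrationWait() {}

  /**
   * Returns true as soon as ready reports true, or false once timeoutMs of
   * real elapsed time has passed (or the waiting thread is interrupted).
   */
  static boolean waitUntil(BooleanSupplier ready, long timeoutMs, long pollMs) {
    // The deadline comes from the monotonic clock, so a sleep that overruns
    // (long GC pause, scheduler delay) still counts in full against the timeout.
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!ready.getAsBoolean()) {
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return false; // timed out on measured time, not on a sleep count
      }
      // Never sleep past the deadline; sleep at least 1 ms to avoid spinning.
      long remainingMs = Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
      try {
        Thread.sleep(Math.min(pollMs, remainingMs));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }
}
```

A caller would pass a supplier that reports whether the bp has registered, together with the configured timeout converted to milliseconds. Because the loop compares against a deadline, a sleep that takes far longer than the poll interval shortens the remaining wait instead of being counted as a single tick.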