[ 
https://issues.apache.org/jira/browse/HDFS-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181007#comment-14181007
 ] 

Aaron T. Myers edited comment on HDFS-7278 at 10/23/14 4:27 AM:
----------------------------------------------------------------

bq. Very interesting. I have not encountered such an issue. If you have details 
it would be good to share.

I don't really have any firm details, but I do have a suspicion that we may 
have a bug which results in a block being considered under-replicated (possibly 
even entirely missing, if all replicas were affected) after a failover, when in 
fact all of the replicas of the block are just fine on the DNs in the cluster. 
I will of course share all the details when I figure them out. :)

The latest patch looks pretty good to me. Just a few small comments:

# Seems like we should restrict this command to require superuser privileges. 
As it stands, I believe any user could connect to the DN and trigger a full BR, 
which, though not especially harmful, doesn't seem right either.
# I think there may be a small race condition in the test case. You create a 
file, then immediately create a spy object to examine calls between the DN and 
NN, and then assert that no calls to blockReceivedAndDeleted were made. It's 
possible that the DN RPC that sends the immediate incremental BR for that file 
creation is delayed until after you've created the spy, which would cause the 
test to fail spuriously. I think it would be more reliable to create the spy 
object before creating the file, and then assert that exactly one IBR was sent.
# I suspect that the most common use of this command will be to trigger full 
block reports, not incremental block reports, given that incremental reports 
are already sent rather frequently in a busy cluster. Perhaps we should change 
the default behavior of the command to send a full BR, and change the optional 
flag to be "-incremental" instead?
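
The reordering suggested in the second comment can be sketched roughly as follows. This is a minimal illustration in Python, not the actual HDFS test: FakeNameNode, FakeDataNode, and their methods are hypothetical stand-ins, and unittest.mock's Mock(wraps=...) plays the role of the spy object. It assumes the DN sends its incremental report from a background thread at some point after the file is created.

```python
import threading
from unittest import mock


class FakeNameNode:
    """Hypothetical stand-in for the NN side of the DN-to-NN protocol."""

    def block_received_and_deleted(self, block_id):
        return None


class FakeDataNode:
    """Hypothetical stand-in for a DN that reports new blocks asynchronously."""

    def __init__(self, namenode):
        self.namenode = namenode
        self._pending = []

    def create_file(self, block_id):
        # The incremental report is sent from a background thread, so it may
        # reach the NN at any point after create_file() returns.
        sender = threading.Thread(
            target=self.namenode.block_received_and_deleted, args=(block_id,))
        sender.start()
        self._pending.append(sender)

    def wait_for_reports(self):
        # Block until every outstanding incremental report has been delivered.
        for sender in self._pending:
            sender.join()
        self._pending.clear()


def count_ibrs_with_spy_installed_first():
    # Install the spy BEFORE the file is created, so the report triggered by
    # the file creation is always observed, however long it is delayed.
    spy = mock.Mock(wraps=FakeNameNode())
    datanode = FakeDataNode(spy)
    datanode.create_file(block_id=1)
    datanode.wait_for_reports()
    return spy.block_received_and_deleted.call_count
```

With this ordering the assertion becomes "exactly one IBR was observed", which does not depend on when the background report happens to be sent.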

+1 once these are addressed. Thanks, Colin.



> Add a command that allows sysadmins to manually trigger full block reports 
> from a DN
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-7278
>                 URL: https://issues.apache.org/jira/browse/HDFS-7278
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.6.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7278.002.patch
>
>
> We should add a command that allows sysadmins to manually trigger full block 
> reports from a DN.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
