[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15578574#comment-15578574
 ] 

Allen Wittenauer commented on HDFS-10999:
-----------------------------------------

bq. That's what I was getting at with the pendingReconstructionBlocksCount. If 
we fix it as I talked about above, it'd actually tell you how much work is 
remaining, and how fast that work is progressing.

That might work, but I just had a thought.  Are we exposing how many blocks are 
EC blocks and how many blocks are normally replicated blocks?  (If not, I 
really hope the explanation is a good one...) It seems that we should have 
symmetry here.  If we have N types of blocks, I'm going to want to know NxM 
counts of information.  It's pretty much the only way that advanced users will 
know if certain types of blocks are actually working to their benefit.  Like 
compression, space savings isn't the only consideration.
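
To make the NxM point concrete, here is a rough sketch of the kind of 
breakdown I mean: one count for every (block type, block state) pair, zeros 
included.  The type and state labels below are made up for illustration and 
are not actual HDFS metric names; the sample data is invented too.

```python
from collections import Counter
from itertools import product

# Hypothetical labels for illustration only; not real HDFS metric names.
BLOCK_TYPES = ("replicated", "erasure_coded")
BLOCK_STATES = ("healthy", "low_redundancy", "pending_reconstruction", "corrupt")


def count_matrix(blocks):
    """Return a count for every (type, state) pair, including zero counts."""
    seen = Counter(blocks)
    # Counter returns 0 for missing keys, so every pair gets an entry.
    return {pair: seen[pair] for pair in product(BLOCK_TYPES, BLOCK_STATES)}


# Invented sample: each entry is (block type, block state).
sample = [
    ("replicated", "healthy"),
    ("replicated", "low_redundancy"),
    ("erasure_coded", "healthy"),
    ("erasure_coded", "pending_reconstruction"),
    ("erasure_coded", "pending_reconstruction"),
]

matrix = count_matrix(sample)
for (block_type, state), n in matrix.items():
    print(f"{block_type:14} {state:24} {n}")
```

With 2 types and 4 states that is 8 counters, and an admin can compare the 
same state across block types side by side.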

bq. I really, really hope that manually copying blocks around is not a normal 
part of operating an HDFS cluster.
...
bq.  I recall seeing some customer issues where we temporarily bumped up these 
values to more quickly recover from failures.

You've sort of answered your own question. ;)

Most of the advanced admins I know do it several times a year, either because 
the NN was too stupid to fix its own replication problems or because it 
was simply faster for us to do it rather than wait for the normal block 
replication process.

For example, as an admin, I might know that there is no YARN running on the 
source node or the destination node, so it's totally OK to do a brute copy from 
one DN to another without busting the network.  HDFS block deletes are 
significantly faster than replication, so just do the copy, run the balancer, 
and let the NN remove the duplicates at its leisure.  All without fumbling 
with the ever-growing and poorly documented HDFS settings.

> Use more generic "low redundancy" blocks instead of "under replicated" blocks
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-10999
>                 URL: https://issues.apache.org/jira/browse/HDFS-10999
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Wei-Chiu Chuang
>            Assignee: Yuanbo Liu
>              Labels: supportability
>
> Per HDFS-9857, it seems in the Hadoop 3 world, people prefer the more generic 
> term "low redundancy" to the old-fashioned "under replicated". But this term 
> is still being used in messages in several places, such as web ui, dfsadmin 
> and fsck. We should probably change them to avoid confusion.
> File this jira to discuss it.



