[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955938#comment-15955938 ]

Wei-Chiu Chuang commented on HDFS-10999:
----------------------------------------

[~manojg] thanks for working on this. The direction of the discussion has deviated from the original purpose, so please update the summary of the jira accordingly when you upload the patch. Thx!

> Use more generic "low redundancy" blocks instead of "under replicated" blocks
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-10999
>                 URL: https://issues.apache.org/jira/browse/HDFS-10999
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Wei-Chiu Chuang
>            Assignee: Manoj Govindassamy
>              Labels: hdfs-ec-3.0-nice-to-have, supportability
>
> Per HDFS-9857, it seems in the Hadoop 3 world, people prefer the more generic
> term "low redundancy" to the old-fashioned "under replicated". But this term
> is still being used in messages in several places, such as web ui, dfsadmin
> and fsck. We should probably change them to avoid confusion.
> File this jira to discuss it.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886693#comment-15886693 ]

Manoj Govindassamy commented on HDFS-10999:
-------------------------------------------

[~tasanuma0829], thanks for sharing your thoughts on the proposal. Will proceed with this proposal unless I hear any alternative suggestions from others.
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882803#comment-15882803 ]

Takanobu Asanuma commented on HDFS-10999:
-----------------------------------------

Thanks for the good summary, [~manojg]! I agree with you for the most part. I want to share my thoughts.

1. +1 for not changing {{fsck}}.

2, 3. I think changing {{dfsadmin -report}} and {{NN-WebUI}} is almost the same work because they refer to the same metrics of {{FSNamesystemMBean}}. So the key point is how to extend {{FSNamesystemMBean}}.

{quote}
-- For backward compatibility reasons, let the current FSNameSystem#getStats() be as is, and it will continue to return cumulative stats for all Blocks combined.
-- Introduce FSNameSystem#getReplicatedBlockStats() and FSNameSystem#getECBlockStats() to capture Replicated and EC Block stats separately.
{quote}

I agree with that. And I think this fits my suggestion of adding two new MBeans for replicated-blocks and ec-block-groups to {{FSNamesystem}}.

*My proposal based on your proposal*:
-- Since {{FSNameSystem#getStats}} refers to {{FSNamesystemMBean}}, let them be as they are. It would be good if we use the new generic terms here.
-- Add new MBeans, {{ReplicatedBlockMBean}} and {{ECBlockGroupMBean}}, to {{FSNamesystem}}.
-- {{FSNameSystem#getReplicatedBlockStats}} refers to {{ReplicatedBlockMBean}}.
-- {{FSNameSystem#getECBlockGroupStats}} refers to {{ECBlockGroupMBean}}.

Let's be careful with terminology to avoid confusion. Referring to fsck would be better:

|| replicated || erasure coded ||
| block(s) | block group(s) |
| replica(s) | internal block(s) |

So like this:

{noformat}
# hdfs dfsadmin -report
Configured Capacity: 1498775814144 (1.36 TB)
Present Capacity: 931852427264 (867.86 GB)
DFS Remaining: 931805765632 (867.81 GB)
DFS Used: 46661632 (44.50 MB)
DFS Used%: 0.01%

Replicated Blocks:
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    Pending deletion blocks: 0

Erasure Coded Block Groups:
    Under ec block groups: 0
    EC block groups with corrupt internal blocks: 0
    Missing ec block groups: 0
    Pending deletion ec block groups: 0
{noformat}
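A side benefit of the two-section {{dfsadmin -report}} layout proposed above is that it is easy to consume mechanically. As a rough sketch (hypothetical parser, not part of any patch on this jira; the sample text mirrors the proposed output), a monitoring script could group the counters under their section headers and alert on replicated and EC numbers independently:

```python
# Sketch: split a two-section report (as proposed above) into per-section
# counter dicts. The report text here is a hand-written sample.
report = """\
Replicated Blocks:
    Under replicated blocks: 0
    Missing blocks: 0
Erasure Coded Block Groups:
    Under ec block groups: 0
    Missing ec block groups: 0
"""

def split_sections(text):
    """Group indented 'key: value' lines under their unindented headers."""
    sections, current = {}, None
    for line in text.splitlines():
        if line and not line.startswith((" ", "\t")):
            current = line.rstrip(":")          # section header line
            sections[current] = {}
        elif current and ":" in line:
            key, _, val = line.strip().partition(":")
            sections[current][key] = int(val.strip())
    return sections

s = split_sections(report)
print(s["Erasure Coded Block Groups"]["Under ec block groups"])  # 0
```

With separate sections, a check can look only at {{Erasure Coded Block Groups}} without the replicated counters leaking into the same number.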
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881777#comment-15881777 ]

Manoj Govindassamy commented on HDFS-10999:
-------------------------------------------

Based on the discussions and consensus above, my understanding is that we want to have tools/UI report Replicated and EC Blocks separately.

1. The {{fsck}} command already reports Replicated blocks and EC blocks separately. Verified the reporting under EC blocks and they look good to me. Not planning to add more changes to {{fsck}} for now w.r.t. this jira.

{noformat}
# hdfs fsck /
Connecting to namenode via http://127.0.0.1:50002/fsck?ugi=manoj&path=%2F
FSCK started by manoj (auth:SIMPLE) from /127.0.0.1 for path / at Thu Feb 23 15:21:06 PST 2017

Status: HEALTHY
 Number of data-nodes:	3
 Number of racks:	1
 Total dirs:	5
 Total symlinks:	0

Replicated Blocks:
 Total size:	1024 B
 Total files:	5
 Total blocks (validated):	5 (avg. block size 2048000 B)
 Minimally replicated blocks:	5 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:	0 (0.0 %)
 Default replication factor:	3
 Average block replication:	3.0
 Missing blocks:	0
 Corrupt blocks:	0
 Missing replicas:	0 (0.0 %)

Erasure Coded Block Groups:
 Total size:	1024 B
 Total files:	5
 Total block groups (validated):	5 (avg. block group size 2048000 B)
 Minimally erasure-coded block groups:	5 (100.0 %)
 Over-erasure-coded block groups:	0 (0.0 %)
 Under-erasure-coded block groups:	0 (0.0 %)
 Unsatisfactory placement block groups:	0 (0.0 %)
 Default ecPolicy:	RS-DEFAULT-6-3-64k
 Average block group size:	3.0
 Missing block groups:	0
 Corrupt block groups:	0
 Missing internal blocks:	0 (0.0 %)
FSCK ended at Thu Feb 23 15:21:06 PST 2017 in 15 milliseconds

The filesystem under path '/' is HEALTHY
{noformat}

2. The {{dfsadmin -report}} command is not reporting EC blocks separately. Today, the report command gets stats from {{FSNameSystem#getStats()}}, which is the combined stats for both Replicated and EC Blocks.

{noformat}
# hdfs dfsadmin -report
Configured Capacity: 1498775814144 (1.36 TB)
Present Capacity: 931852427264 (867.86 GB)
DFS Remaining: 931805765632 (867.81 GB)
DFS Used: 46661632 (44.50 MB)
DFS Used%: 0.01%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
{noformat}

*Proposal:*
-- For backward compatibility reasons, let the current {{FSNameSystem#getStats()}} be as is; it will continue to return cumulative stats for all Blocks combined.
-- Introduce {{FSNameSystem#getReplicatedBlockStats()}} and {{FSNameSystem#getECBlockStats()}} to capture Replicated and EC Block stats separately.
-- In the report, {{Under replicated blocks}}, {{Blocks with corrupt replicas}} and {{Missing blocks}} will only show stats for Replicated blocks (compared to the current cumulative numbers).
-- New fields like {{Under erasure coded block groups}}, {{Corrupt erasure coded block groups}} and {{Missing erasure coded block groups}} will be added to the report command, containing stats for Erasure Coded blocks only.

{noformat}
# hdfs dfsadmin -report
Configured Capacity: 1498775814144 (1.36 TB)
Present Capacity: 931852427264 (867.86 GB)
DFS Remaining: 931805765632 (867.81 GB)
DFS Used: 46661632 (44.50 MB)
DFS Used%: 0.01%

Replicated Blocks:
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    Pending deletion blocks: 0

Erasure Coded Block Groups:
    Under erasure coded block groups: 0
    Erasure coded blocks with corrupt replicas: 0
    Missing erasure coded blocks: 0
    Pending deletion erasure coded blocks: 0
{noformat}

3. For the WebUI, in order to report Erasure Coded block details, {{FSNamesystemMBean}} needs to be extended.

-- Currently we have the following ones reported under the Summary section in the NameNode UI, but they include both Replicated + EC stats:
{noformat}
Number of Under-Replicated Blocks
Number of Blocks Pending Deletion
{noformat}
-- [~lewuathe] has already [proposed|https://issues.apache.org/jira/secure/attachment/12852567/Screen%20Shot%202017-02-14%20at%2022.43.57.png] a patch for adding Total EC blocks and their size under HDFS-8196.

*Proposal:*
-- Display the Replicated and EC block stats separately in the Summary section of the NameNode UI. No cumulative stats.
{noformat}
Number of Under-Replicated Blocks
Number of Blocks Pending Deletion
Number of Under-Erasure-Coded Block Groups
Number of Erasure Coded Blocks Pending Deletion
{noformat}

[~andrew.wang], [~aw], [~tasanuma0829], [~jojochuang], [~yuanbo], Can you please
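For context on where the WebUI numbers come from: the Summary section reads the NameNode's JMX beans, which are served over HTTP at the {{/jmx}} endpoint. The sketch below parses a hand-written, simplified payload of that shape (the bean name follows the real {{Hadoop:service=NameNode,name=...}} convention, but the metric values are invented for illustration); today a consumer of the combined counter cannot tell replicated work from EC work apart, which is exactly what splitting the beans would fix:

```python
import json

# Hand-written sample of a /jmx-style payload; values are invented.
payload = json.loads("""
{"beans": [
  {"name": "Hadoop:service=NameNode,name=FSNamesystemState",
   "UnderReplicatedBlocks": 12,
   "PendingDeletionBlocks": 3}
]}
""")

def read_metric(jmx, bean_suffix, key):
    """Return one metric from a parsed /jmx payload, or None if absent."""
    for bean in jmx["beans"]:
        if bean["name"].endswith(bean_suffix):
            return bean.get(key)
    return None

# A single combined counter: no way to know how much of it is EC work.
print(read_metric(payload, "FSNamesystemState", "UnderReplicatedBlocks"))
```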
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877224#comment-15877224 ]

Takanobu Asanuma commented on HDFS-10999:
-----------------------------------------

Thanks for the comment, [~aw].

bq. In this way, it would be easy to keep code consistency with branch-2.

I mean it is just a good side effect of my suggestion and not the main purpose. I don't think it is required.
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876670#comment-15876670 ]

Allen Wittenauer commented on HDFS-10999:
-----------------------------------------

bq. In this way, it would be easy to keep code consistency with branch-2.

It's been almost 1.5 years since branch-2.8.0 was cut. It's been almost 2 years since 2.7.0 was released. Why should the project make long term compromises for short term gain?

(In fact, I've been thinking more and more about all of the changes to fsck that were protected with flags. We should probably make most of those flags nops in 3.x before beta, given the "continually not being released" state of branch-2.)
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875384#comment-15875384 ]

Takanobu Asanuma commented on HDFS-10999:
-----------------------------------------

Hi all, thanks for the discussion. I would like to propose my suggestion.

I understand how important it is to divide the metrics into replicated-blocks and ec-block-groups. But I also think summarized (repl+EC) metrics are useful. At the moment, in {{FSNamesystem}}, {{FSNamesystemMBean}} has the summarized metrics. How about leaving {{FSNamesystemMBean}} as is and adding two new MBeans for replicated-blocks and ec-block-groups to {{FSNamesystem}}? In this way, it would be easy to keep code consistency with branch-2.
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816787#comment-15816787 ]

Yuanbo Liu commented on HDFS-10999:
-----------------------------------

I think I've lost the context of this JIRA. Dismiss my ownership, feel free to assign it to anyone else. Sorry to interrupt!
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606505#comment-15606505 ]

Andrew Wang commented on HDFS-10999:
------------------------------------

Good finds, Wei-Chiu. Could you file JIRAs to fix these per the discussion here?
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605080#comment-15605080 ]

Wei-Chiu Chuang commented on HDFS-10999:
----------------------------------------

Thanks for the input, Allen. I think it makes sense. As it turns out, we are exposing these metrics inconsistently across different tools. fsck is implemented to distinguish replicated blocks from erasure coded blocks, but tools like "dfsadmin -report" aren't. The metric {{BlockManager#getUnderReplicatedBlocksCount}} is generalized to combine both under replicated and under erasure coded blocks.
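To make the cost of that generalization concrete, here is a toy illustration (invented numbers, not real cluster data): two very different cluster states, one dominated by plain replication repairs and one dominated by EC reconstruction, collapse to the same combined count.

```python
# Two invented cluster states; the combined counter cannot tell them apart.
state_a = {"under_replicated": 40, "under_ec": 2}   # mostly replication repairs
state_b = {"under_replicated": 2, "under_ec": 40}   # mostly EC reconstruction

def combined(state):
    """Mimic a generalized 'low redundancy' count that sums both kinds."""
    return state["under_replicated"] + state["under_ec"]

print(combined(state_a) == combined(state_b))  # True, yet the repair work differs
```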
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602650#comment-15602650 ]

Allen Wittenauer commented on HDFS-10999:
-----------------------------------------

I've been out of town and I've had more time to think about this issue. I'm pretty much convinced that tying what are effectively two metrics to a single value is a bad idea. I would really want to see the two values separated, because it directly impacts how maintenance windows and recovery are performed. More information is significantly more valuable than less here.

The same goes for other metrics such as rates: I really do want to know how long it is taking for full blocks to replicate vs. EC blocks to recover. They have slightly different performance characteristics at the node level, and advanced users are going to want to know what the perf impact on any running jobs might be. For example, if I know my nodes take x% of the CPU for EC recovery during a node migration, I'm going to want to set the CPU settings for the Docker cgroups that I'm using to protect my cluster from YARN's security issues differently during that migration, to make sure I have enough juice vs. normal operation.

In other words, this is not a good place to 'dumb down' the metrics.
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590122#comment-15590122 ]

Andrew Wang commented on HDFS-10999:
------------------------------------

[~jojochuang] thanks for sharing that output. Allen mentioned that fsck is used as both a quick check, as well as a rough measure of how much recovery work is ongoing. Assuming that "Missing internal blocks" goes up when "Under-erasure-coded groups" is non-zero, this seems workable.
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588153#comment-15588153 ]

Yuanbo Liu commented on HDFS-10999:
-----------------------------------

I guess some monitoring scripts are based on the "fsck" command. Admins may have similar code in their scripts:
{code}
hdfs fsck / | grep "Under-replicated"
{code}
or a key-value formatter. Changing the old key name will force them to change their monitoring scripts. This is my understanding of Allen's concern about the incompatibility issue.
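As a sketch of what such a monitoring check looks like, and of how a label rename silently breaks it (hypothetical script; the sample line mimics fsck's summary format):

```python
import re

# Sample fsck summary line of the kind monitoring scripts grep for.
fsck_line = "Under-replicated blocks:\t0 (0.0 %)"

def parse_count(line, label="Under-replicated blocks"):
    """Return the integer after a labelled fsck line, or None when the
    label no longer matches (e.g. after a rename to 'low redundancy')."""
    m = re.match(rf"\s*{re.escape(label)}:\s*(\d+)", line)
    return int(m.group(1)) if m else None

print(parse_count(fsck_line))                    # 0
print(parse_count("Low redundancy blocks:\t7"))  # None: the old check breaks
```

The failure mode is the worrying part: the check does not error out, it just stops matching, so an alert keyed on this label would go quiet rather than fire.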
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585942#comment-15585942 ]

Wei-Chiu Chuang commented on HDFS-10999:
----------------------------------------

Just FYI, the fsck output in Hadoop 3 is of the following format. It does separate replicated blocks from erasure coded blocks. [~aw] how does this output look to you from an admin perspective?

{noformat}
FSCK started by weichiu (auth:SIMPLE) from /127.0.0.1 for path / at Tue Oct 18 09:37:00 PDT 2016

/striped/corrupted: CORRUPT blockpool BP-921842435-172.16.1.88-1476808612846 block blk_-9223372036854775792
/striped/corrupted: CORRUPT 1 blocks of total size 393216 B.

Status: CORRUPT
 Number of data-nodes:	9
 Number of racks:	1
 Total dirs:	2
 Total symlinks:	0

Replicated Blocks:
 Total size:	0 B
 Total files:	0
 Total blocks (validated):	0
 Minimally replicated blocks:	0
 Over-replicated blocks:	0
 Under-replicated blocks:	0
 Mis-replicated blocks:	0
 Default replication factor:	3
 Average block replication:	0.0
 Missing blocks:	0
 Corrupt blocks:	0
 Missing replicas:	0

Erasure Coded Block Groups:
 Total size:	393216 B
 Total files:	1
 Total block groups (validated):	1 (avg. block group size 393216 B)
  UNRECOVERABLE BLOCK GROUPS:	1 (100.0 %)
  CORRUPT FILES:	1
  CORRUPT BLOCK GROUPS:	1
  CORRUPT SIZE:	393216 B
 Minimally erasure-coded block groups:	0 (0.0 %)
 Over-erasure-coded block groups:	0 (0.0 %)
 Under-erasure-coded block groups:	0 (0.0 %)
 Unsatisfactory placement block groups:	0 (0.0 %)
 Default ecPolicy:	RS-DEFAULT-6-3-64k
 Average block group size:	5.0
 Missing block groups:	0
 Corrupt block groups:	1
 Missing internal blocks:	0 (0.0 %)
FSCK ended at Tue Oct 18 09:37:00 PDT 2016 in 2 milliseconds

The filesystem under path '/' is CORRUPT
{noformat}
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583691#comment-15583691 ]

Andrew Wang commented on HDFS-10999:
------------------------------------

Turns out we already have JIRAs for some of these, I did file some:

* HDFS-11023 I/O based throttling of DN replication work
* HDFS-11024 Add rate metrics for block recovery work
* HDFS-8672 Erasure Coding: Add EC-related Metrics to NN (seperate striped blocks count from UnderReplicatedBlocks count)
* HDFS-9943 Support reconfiguring namenode replication confs
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583660#comment-15583660 ]

Andrew Wang commented on HDFS-10999:
------------------------------------

bq. Are we exposing how many blocks are EC blocks and how many blocks are normally replicated blocks?

I don't think so, and you're right that we should in some fashion. Sounds like we want these counts both for the whole filesystem as well as recovery-related metrics.

As a first-cut, I think these counts can ignore the EC policy. I think most clusters will only use a single EC policy since it heavily depends on the # of racks. We can expand this to per-policy metrics if we find it necessary.

bq. Most of the advanced admins I know do it several times a year, either because the NN was too stupid to fix its own replication problems and/or because it was simply faster for us to do it rather than wait for the normal block replication process.

I choose to interpret this as HDFS needing better knobs for emergency replication :)

This has been great info, I'll file some JIRAs to track these work items. Sounds like:

* I/O based pending replication metrics / throttles
* EC block counts
* dynamically configurable replication throttles
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15578574#comment-15578574 ]

Allen Wittenauer commented on HDFS-10999:
-----------------------------------------

bq. That's what I was getting at with the pendingReconstructionBlocksCount. If we fix it as I talked about above, it'd actually tell you how much work is remaining, and how fast that work is progressing.

That might work, but I just had a thought. Are we exposing how many blocks are EC blocks and how many blocks are normally replicated blocks? (If not, I really hope the explanation is a good one...) It seems that we should have symmetry here. If we have N types of blocks, I'm going to want to know NxM counts of information. It's pretty much the only way that advanced users will know if certain types of blocks are actually working to their benefit. Like compression, space savings isn't the only consideration.

bq. I really, really hope that manually copying blocks around is not a normal part of operating an HDFS cluster.
...
bq. I recall seeing some customer issues where we temporarily bumped up these values to more quickly recover from failures.

You've sort of answered your own question. ;) Most of the advanced admins I know do it several times a year, either because the NN was too stupid to fix its own replication problems and/or because it was simply faster for us to do it rather than wait for the normal block replication process. For example, as an admin, I might know that there is no YARN running on a source node or the destination node, so it's totally OK to do a brute copy from one DN to another without busting the network. HDFS block deletes are significantly faster than replication, so just do the copy, run the balancer, and let the NN remove the duplicates at its leisure. All without fumbling with the ever-growing and poorly documented HDFS settings.
[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576938#comment-15576938 ] Andrew Wang commented on HDFS-10999: Thanks for the insight Allen, {quote} M: "How long for recovery?" A: "No idea. The NN doesn't tell me if these are EC blocks or regular blocks that were lost and one is faster to recover than the other." {quote} That's what I was getting at with the pendingReconstructionBlocksCount. If we fix it as I talked about above, it'd actually tell you how much work is remaining, and how fast that work is progressing. {quote} ...I've also used it during system recovery and migrations as a measurement of how many more DNs I need to bring up such that I have more sources for block replication. {quote} Would the "pending" queue metrics also work for this? We can also look at improved DN-side metrics related to replication work. {quote} This number represents something that I as an admin have some semblance of control over: I could always manually copy blocks from one node to another to speed things up. Under EC, I don't know of anything manual I can do if it is missing chunks of blocks. {quote} I really, really hope that manually copying blocks around is not a normal part of operating an HDFS cluster. Point is still valid though, maybe we should take a harder look at the recovery work throttles on the NN and DN, and make them dynamically reconfigurable if they aren't. I recall seeing some customer issues where we temporarily bumped up these values to more quickly recover from failures. 
[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576844#comment-15576844 ] Allen Wittenauer commented on HDFS-10999: - bq. We tried to draw an equivalence between the durability of EC and replicated files by looking at the # of failures to data loss. This way we have a way of prioritizing both types of recovery work on the NN (see the LowRedundancyBlocks class, nee UnderReplicatedBlocks). Hmm. That's great for the NN, but it leaves me as an admin in the dark. A: "So we had some issues on HDFS." M: "What's the damage?" A: "We are missing x blocks." M: "How long for recovery?" A: "No idea. The NN doesn't tell me if these are EC blocks or regular blocks that were lost, and one is faster to recover than the other." bq. In my experience, the "# under replicated blocks" is used as a quick check of cluster health. It's used for that, but I've also used it during system recovery and migrations as a measurement of how many more DNs I need to bring up such that I have more sources for block replication. This number represents something that I as an admin have some semblance of control over: I could always manually copy blocks from one node to another to speed things up. Under EC, I don't know of anything manual I can do if it is missing chunks of blocks.
[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572797#comment-15572797 ] Andrew Wang commented on HDFS-10999: We tried to draw an equivalence between the durability of EC and replicated files by looking at the # of failures to data loss. This way we have a way of prioritizing both types of recovery work on the NN (see the LowRedundancyBlocks class, nee UnderReplicatedBlocks). I think this is kind of okay from an admin POV. In my experience, the "# under replicated blocks" is used as a quick check of cluster health. If it's non-zero or not a small number, something is off and maybe you shouldn't rolling restart your cluster. Something we might want to take a harder look at is actually the pendingReconstructionBlocksCount. By looking at the rate of change, it tells you how long until your cluster is back up to full strength. However, since EC recovery is more expensive than replication, this metric is underspecified. The cost for recovery also depends on the EC policy for that block. We should also reexamine the block recovery throttles for the same reason. It's still looking at the # of blocks being recovered rather than the amount of I/O.
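The rate-of-change reading of pendingReconstructionBlocksCount suggested above can be sketched as a small admin-side calculation. This is illustrative monitoring code, not a Hadoop API: the metric name and the idea of sampling it periodically (e.g. from the NameNode's /jmx output) come from the discussion, and the linear drain-rate model is an assumption.

```python
def estimate_recovery_seconds(samples):
    """Estimate seconds until a reconstruction queue drains.

    samples: list of (unix_timestamp, pending_block_count) pairs, oldest
    first, e.g. periodic readings of pendingReconstructionBlocksCount.
    Returns None if the queue is flat or growing (no meaningful ETA).
    """
    if len(samples) < 2:
        return None
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0 or c1 >= c0:
        return None  # not draining; a rate-based ETA is undefined
    drain_rate = (c0 - c1) / (t1 - t0)  # blocks recovered per second
    return c1 / drain_rate

# 10,000 pending blocks fell to 8,000 over 100s: draining at 20 blocks/s,
# so roughly 400s remain for the last 8,000 blocks.
print(estimate_recovery_seconds([(0, 10000), (100, 8000)]))  # -> 400.0
```

As the comment notes, a per-block count understates EC work; a better model would weight each queued block by its policy's reconstruction cost rather than counting blocks equally.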
[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572514#comment-15572514 ] Zhe Zhang commented on HDFS-10999: -- Thanks for the thoughts [~aw]. EC policies are configured per file and per directory. So it's possible (and very likely in a 3.0 deployment) to have EC and non-EC blocks in the same cluster.
[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572492#comment-15572492 ] Allen Wittenauer commented on HDFS-10999: - Given that the fsck output is the *only* way to get some pieces of information, changing fsck is almost always a major, ops-breaking event. Coupling that with pretty much breaking metrics collection... There is no "supposed" here: this is very, very incompatible and will cause admins to burn Apache Hadoop conferences to the ground in their anger if we aren't careful. That said, I can empathize with the EC folks. 'under replicated' doesn't really cover the state of a block with a missing reconstructable chunk. But I'm not sure that 'low redundancy' necessarily conveys the state of a non-EC block either. If I'm not running EC at all, it comes across as a gratuitous change. I need to think more about this, to be honest. But some questions first: I'm trying to remember, is it possible to have EC and non-EC blocks in a file system? If not, what about in the future? Are we actually trying to shoehorn two separate measurements into the same metric here? Is there a situation where having both under replicated and low redundancy blocks makes sense? How does the storage policy interact with a change like this?
[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569421#comment-15569421 ] Andrew Wang commented on HDFS-10999: Yea, thanks for the pointer [~rakesh_r]. One idea is that we could add new metrics with more accurate names and deprecate the old ones for Hadoop 3. Then in Hadoop 4, consider removing them. For the webui, I think we can migrate it over directly, since it's not covered by compatibility. Web users are supposed to parse /jmx output instead, which is structured. It's a bit trickier for shell commands. Empirically, people do parse the output of commands like dfsadmin and fsck. Having double prints for both the old and new names is kind of ugly. [~aw] any thoughts on how best to migrate the names for the shell tools?
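The deprecation scheme above, where one counter is published under both the old and the new name for a transition release, can be sketched as follows. This is a plain-Python illustration of the idea, not Hadoop's actual metrics2 API; the class and field names are hypothetical.

```python
class BlockManagerMetricsSketch:
    """Hypothetical holder for one redundancy counter, exported twice."""

    def __init__(self):
        self.low_redundancy_blocks = 0

    def snapshot(self):
        # Emit both names during the deprecation window: existing
        # monitoring keeps working while new dashboards migrate.
        v = self.low_redundancy_blocks
        return {
            "LowRedundancyBlocks": v,     # new, preferred name
            "UnderReplicatedBlocks": v,   # deprecated alias, same value
        }

m = BlockManagerMetricsSketch()
m.low_redundancy_blocks = 7
print(m.snapshot())  # -> {'LowRedundancyBlocks': 7, 'UnderReplicatedBlocks': 7}
```

For the shell tools the same trade-off applies, but as the comment says, printing both labels in dfsadmin/fsck output is uglier because scripts match on free-form text rather than structured keys.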
[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568227#comment-15568227 ] Rakesh R commented on HDFS-10999: - Thanks [~jojochuang] for bringing this point. I remember there was a [discussion|https://issues.apache.org/jira/browse/HDFS-7955?focusedCommentId=15159765=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15159765] about renaming the replication-related metrics. It was not done because upper-layer applications use these metrics quite heavily, and renaming them could be risky.
[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks
[ https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568050#comment-15568050 ] Wei-Chiu Chuang commented on HDFS-10999: One thing I want to call out is metrics names. There are metrics named "UnderReplicatedBlocks", "PendingReplicationBlocks", etc. Changing them is supposedly incompatible, especially for monitoring systems (such as Cloudera Manager or Ambari). [~zhz] [~andrew.wang] I'd appreciate it if you can comment on this. Thanks!
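On the consumer side, the compatibility risk raised here can be softened by monitoring code that tries the proposed new name and falls back to the old one. A minimal sketch, assuming the NameNode's /jmx bean has already been parsed into a plain dict; the rename pair comes from this thread, and the helper name is hypothetical.

```python
# Old metric name, keyed by the new name proposed in this jira.
RENAMES = {
    "LowRedundancyBlocks": "UnderReplicatedBlocks",
}

def read_metric(bean, name):
    """bean: dict of metric name -> value (e.g. parsed from /jmx JSON).

    Looks up `name`, falling back to its pre-rename alias if present,
    so the same dashboard works against old and new NameNodes.
    """
    if name in bean:
        return bean[name]
    old = RENAMES.get(name)
    if old is not None and old in bean:
        return bean[old]
    raise KeyError(name)

# A pre-rename NameNode exposes only the old key:
print(read_metric({"UnderReplicatedBlocks": 42}, "LowRedundancyBlocks"))  # -> 42
```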