[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2017-04-04 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955938#comment-15955938
 ] 

Wei-Chiu Chuang commented on HDFS-10999:


[~manojg] thanks for working on this. The direction of the discussion has 
deviated from the original purpose, so please update the summary of the jira 
accordingly when you upload the patch.

Thx!

> Use more generic "low redundancy" blocks instead of "under replicated" blocks
> -
>
> Key: HDFS-10999
> URL: https://issues.apache.org/jira/browse/HDFS-10999
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha1
>Reporter: Wei-Chiu Chuang
>Assignee: Manoj Govindassamy
>  Labels: hdfs-ec-3.0-nice-to-have, supportability
>
> Per HDFS-9857, it seems in the Hadoop 3 world, people prefer the more generic 
> term "low redundancy" to the old-fashioned "under replicated". But this term 
> is still being used in messages in several places, such as web ui, dfsadmin 
> and fsck. We should probably change them to avoid confusion.
> File this jira to discuss it.






[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2017-02-27 Thread Manoj Govindassamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886693#comment-15886693
 ] 

Manoj Govindassamy commented on HDFS-10999:
---

[~tasanuma0829], thanks for sharing your thoughts on the proposal. Will proceed 
with this proposal unless I hear any alternative suggestions from others. 




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2017-02-24 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882803#comment-15882803
 ] 

Takanobu Asanuma commented on HDFS-10999:
-

Thanks for the good summary, [~manojg]! I agree with you for the most part. I 
want to share my thoughts.

1. +1 for not changing {{fsck}}.



2, 3. I think changing {{dfsadmin -report}} and {{NN-WebUI}} is almost the same 
work, because they refer to the same metrics in {{FSNamesystemMBean}}. So the 
key point is how to extend {{FSNamesystemMBean}}.

{quote}
-- For backward-compatibility reasons, leave the current FSNameSystem#getStats() 
as is; it will continue to return cumulative stats for all blocks combined.
-- Introduce FSNameSystem#getReplicatedBlockStats() and 
FSNameSystem#getECBlockStats() to capture Replicated and EC block stats 
separately.
{quote}

I agree with that, and I think it fits my suggestion of adding two new MBeans 
for replicated blocks and EC block groups to {{FSNamesystem}}.

*My proposal, building on yours* (a rough sketch follows below):
-- Since {{FSNameSystem#getStats}} refers to {{FSNamesystemMBean}}, leave them 
as they are. It would be good to use the new generic terms here.
-- Add two new MBeans, {{ReplicatedBlockMBean}} and {{ECBlockGroupMBean}}, to 
{{FSNamesystem}}.
-- {{FSNameSystem#getReplicatedBlockStats}} refers to {{ReplicatedBlockMBean}}.
-- {{FSNameSystem#getECBlockGroupStats}} refers to {{ECBlockGroupMBean}}.
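To make the shape of this concrete, here is a minimal sketch of what the two 
new MBean interfaces could expose. The method names and the exact set of 
counters are my assumptions for discussion, not a final API:

{code:java}
// Hypothetical sketch only -- interface and method names are assumptions
// for discussion, not the final Hadoop API.

// --- ReplicatedBlockMBean.java ---
public interface ReplicatedBlockMBean {
  /** Replicated blocks with fewer live replicas than their target. */
  long getLowRedundancyReplicatedBlocks();
  /** Replicated blocks with at least one corrupt replica. */
  long getCorruptReplicatedBlocks();
  /** Replicated blocks with no live replicas left. */
  long getMissingReplicatedBlocks();
  /** Replicated blocks queued for deletion. */
  long getPendingDeletionReplicatedBlocks();
}

// --- ECBlockGroupMBean.java ---
public interface ECBlockGroupMBean {
  /** EC block groups missing internal blocks but still reconstructable. */
  long getLowRedundancyECBlockGroups();
  /** EC block groups with at least one corrupt internal block. */
  long getCorruptECBlockGroups();
  /** EC block groups that can no longer be reconstructed. */
  long getMissingECBlockGroups();
  /** EC block groups queued for deletion. */
  long getPendingDeletionECBlockGroups();
}
{code}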



Let's be careful with terminology to avoid confusion; following the fsck terms 
would be better.

|| replicated || erasure coded ||
| block(s) | block group(s) |
| replica(s) | internal block(s) |

So like this:
{noformat}
# hdfs dfsadmin -report
Configured Capacity: 1498775814144 (1.36 TB)
Present Capacity: 931852427264 (867.86 GB)
DFS Remaining: 931805765632 (867.81 GB)
DFS Used: 46661632 (44.50 MB)
DFS Used%: 0.01%
Replicated Blocks:
  Under replicated blocks: 0
  Blocks with corrupt replicas: 0
  Missing blocks: 0
  Missing blocks (with replication factor 1): 0
  Pending deletion blocks: 0
Erasure Coded Block Groups:
  Under ec block groups: 0
  EC block groups with corrupt internal blocks: 0
  Missing ec block groups: 0
  Pending deletion ec block groups: 0
{noformat}




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2017-02-23 Thread Manoj Govindassamy (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881777#comment-15881777
 ] 

Manoj Govindassamy commented on HDFS-10999:
---

Based on the discussions and consensus above, my understanding is that we want 
the tools/UI to report Replicated and EC Blocks separately. 

1. The {{fsck}} command already reports Replicated blocks and EC blocks 
separately. I verified the reporting under EC blocks and it looks good to me. I 
am not planning to add more changes to {{fsck}} for now w.r.t. this jira.
{noformat}
# hdfs fsck /
Connecting to namenode via http://127.0.0.1:50002/fsck?ugi=manoj=%2F
FSCK started by manoj (auth:SIMPLE) from /127.0.0.1 for path / at Thu Feb 23 
15:21:06 PST 2017

Status: HEALTHY
 Number of data-nodes:  3
 Number of racks:   1
 Total dirs:5
 Total symlinks:0

Replicated Blocks:
 Total size:1024 B
 Total files:   5
 Total blocks (validated):  5 (avg. block size 2048000 B)
 Minimally replicated blocks:   5 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:3
 Average block replication: 3.0
 Missing blocks:0
 Corrupt blocks:0
 Missing replicas:  0 (0.0 %)

Erasure Coded Block Groups:
 Total size:1024 B
 Total files:   5
 Total block groups (validated):5 (avg. block group size 2048000 B)
 Minimally erasure-coded block groups:  5 (100.0 %)
 Over-erasure-coded block groups:   0 (0.0 %)
 Under-erasure-coded block groups:  0 (0.0 %)
 Unsatisfactory placement block groups: 0 (0.0 %)
 Default ecPolicy:  RS-DEFAULT-6-3-64k
 Average block group size:  3.0
 Missing block groups:  0
 Corrupt block groups:  0
 Missing internal blocks:   0 (0.0 %)
FSCK ended at Thu Feb 23 15:21:06 PST 2017 in 15 milliseconds

The filesystem under path '/' is HEALTHY
{noformat}





2. The {{dfsadmin -report}} command does not report EC blocks separately. 
Today, the report command gets stats from {{FSNameSystem#getStats()}}, which 
returns combined stats for both Replicated and EC Blocks. 
*  {noformat}
# hdfs dfsadmin -report
Configured Capacity: 1498775814144 (1.36 TB)
Present Capacity: 931852427264 (867.86 GB)
DFS Remaining: 931805765632 (867.81 GB)
DFS Used: 46661632 (44.50 MB)
DFS Used%: 0.01%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
{noformat}

*Proposal:* (a rough sketch of the new accessors follows the sample output 
below)
-- For backward-compatibility reasons, leave the current 
{{FSNameSystem#getStats()}} as is; it will continue to return cumulative stats 
for all blocks combined.
-- Introduce {{FSNameSystem#getReplicatedBlockStats()}} and 
{{FSNameSystem#getECBlockStats()}} to capture Replicated and EC block stats 
separately.
-- In the report, {{Under replicated blocks}}, {{Blocks with corrupt replicas}} 
and {{Missing blocks}} will show stats for Replicated blocks only (compared to 
the current cumulative numbers).
-- New fields like {{Under erasure coded block groups}}, {{Corrupt erasure 
coded block groups}} and {{Missing erasure coded block groups}} will be added 
to the report command, containing stats for Erasure Coded blocks only.
*  {noformat}
# hdfs dfsadmin -report
Configured Capacity: 1498775814144 (1.36 TB)
Present Capacity: 931852427264 (867.86 GB)
DFS Remaining: 931805765632 (867.81 GB)
DFS Used: 46661632 (44.50 MB)
DFS Used%: 0.01%
Replicated Blocks:
  Under replicated blocks: 0
  Blocks with corrupt replicas: 0
  Missing blocks: 0
  Missing blocks (with replication factor 1): 0
  Pending deletion blocks: 0
Erasure Coded Block Groups:
  Under erasure coded block groups: 0
  Erasure coded blocks with corrupt replicas: 0
  Missing erasure coded blocks: 0
  Pending deletion erasure coded blocks: 0
{noformat}
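To make the proposal concrete, here is a rough sketch of how the split 
accessors could look on the NameNode side. The {{BlockStats}} holder class, its 
fields, and the zero placeholder values are assumptions for discussion; a real 
patch would wire these to the block manager's counters:

{code:java}
// Hypothetical sketch of the proposed split accessors; the BlockStats holder
// and the placeholder values are assumptions, not the actual implementation.
public class FSNameSystemStatsSketch {

  /** Simple value holder for one family of blocks. */
  public static final class BlockStats {
    public final long lowRedundancy;
    public final long corrupt;
    public final long missing;
    public final long pendingDeletion;

    public BlockStats(long lowRedundancy, long corrupt,
                      long missing, long pendingDeletion) {
      this.lowRedundancy = lowRedundancy;
      this.corrupt = corrupt;
      this.missing = missing;
      this.pendingDeletion = pendingDeletion;
    }
  }

  // getStats() stays as-is and keeps returning cumulative numbers, so
  // existing callers of the old API are unaffected.

  /** Stats for replicated blocks only (placeholder values). */
  public BlockStats getReplicatedBlockStats() {
    return new BlockStats(0, 0, 0, 0);
  }

  /** Stats for erasure-coded block groups only (placeholder values). */
  public BlockStats getECBlockStats() {
    return new BlockStats(0, 0, 0, 0);
  }
}
{code}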




3. For the WebUI, in order to report Erasure Coded block details, 
{{FSNamesystemMBean}} needs to be extended.

-- Currently we have the following ones reported under the Summary section in 
the NameNode UI, but they include both Replicated + EC stats combined  {noformat}
Number of Under-Replicated Blocks   
Number of Blocks Pending Deletion   
{noformat}
-- [~lewuathe] has already 
[proposed|https://issues.apache.org/jira/secure/attachment/12852567/Screen%20Shot%202017-02-14%20at%2022.43.57.png]
 a patch for adding the total EC blocks and their size under HDFS-8196. 

*Proposal:* 
-- Display the Replicated and EC block stats separately in the Summary section 
of the NameNode UI. No cumulative stats. {noformat}
Number of Under-Replicated Blocks 
Number of Blocks Pending Deletion 
Number of Under-Erasure-Coded Block Groups
Number of Erasure Coded Blocks Pending Deletion 
{noformat}
 

[~andrew.wang], [~aw], [~tasanuma0829], [~jojochuang], [~yuanbo], Can you 
please 

[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2017-02-21 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877224#comment-15877224
 ] 

Takanobu Asanuma commented on HDFS-10999:
-

Thanks for the comment, [~aw].

bq. In this way, it would be easy to keep code consistency with branch-2.

I meant that as just a nice side effect of my suggestion, not the main purpose. 
I don't think it is required.




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2017-02-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876670#comment-15876670
 ] 

Allen Wittenauer commented on HDFS-10999:
-

bq. In this way, it would be easy to keep code consistency with branch-2.

It's been almost 1.5 years since branch-2.8.0 was cut. It's been almost 2 
years since 2.7.0 was released. Why should the project make long-term 
compromises for short-term gain? (In fact, I've been thinking more and more 
about all of the changes to fsck that were protected with flags. We should 
probably make most of those flags no-ops in 3.x before beta, given the 
"continually not being released" state of branch-2.)




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2017-02-20 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875384#comment-15875384
 ] 

Takanobu Asanuma commented on HDFS-10999:
-

Hi all, thanks for the discussion. I would like to make a suggestion.

I understand how important it is to divide the metrics into replicated blocks 
and EC block groups. But I also think the summarized (replicated + EC) metrics 
are useful. At the moment, in {{FSNamesystem}}, {{FSNamesystemMBean}} holds the 
summarized metrics. How about leaving {{FSNamesystemMBean}} as is and adding 
two new MBeans for replicated blocks and EC block groups to {{FSNamesystem}}? 
In this way, it would be easy to keep code consistency with branch-2.




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2017-01-10 Thread Yuanbo Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15816787#comment-15816787
 ] 

Yuanbo Liu commented on HDFS-10999:
---

I think I've lost the context of this JIRA. I'm giving up my ownership; feel 
free to assign it to anyone else. Sorry to interrupt!




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-25 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606505#comment-15606505
 ] 

Andrew Wang commented on HDFS-10999:


Good finds, Wei-Chiu. Could you file JIRAs to fix these per the discussion here?




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-25 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605080#comment-15605080
 ] 

Wei-Chiu Chuang commented on HDFS-10999:


Thanks for the input, Allen. I think it makes sense.

As it turns out, we are exposing these metrics inconsistently across different 
tools. fsck is implemented to distinguish replicated blocks from erasure coded 
blocks, but tools like "dfsadmin -report" aren't. The metric 
{{BlockManager#getUnderReplicatedBlocksCount}} is generalized to combine both 
under-replicated and under-erasure-coded blocks.




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-24 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602650#comment-15602650
 ] 

Allen Wittenauer commented on HDFS-10999:
-

I've been out of town and I've had more time to think about this issue.

I'm pretty much convinced that tying what are effectively two metrics to a 
single value is a bad idea. I would really want to see the two values 
separated, because it directly impacts how maintenance windows and recovery 
are performed. More information is significantly more valuable than less here. 
The same goes for other metrics such as rates: I really do want to know how 
long it is taking for full blocks to replicate vs. EC blocks to recover. They 
have slightly different performance characteristics at the node level, and 
advanced users are going to want to know what the perf impact on any running 
jobs might be.

For example, if I know my nodes take x% of the CPU for EC recovery during a 
node migration, I'm going to want to set the CPU settings for the Docker 
cgroups that I'm using to protect my cluster from YARN's security issues 
differently during that migration to make sure I have enough juice vs. normal 
operation.

In other words, this is not a good place to 'dumb down' the metrics. 




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-19 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590122#comment-15590122
 ] 

Andrew Wang commented on HDFS-10999:


[~jojochuang] thanks for sharing that output. Allen mentioned that fsck is used 
both as a quick check and as a rough measure of how much recovery work is 
ongoing. Assuming that "Missing internal blocks" goes up when 
"Under-erasure-coded block groups" is non-zero, this seems workable.




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-19 Thread Yuanbo Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15588153#comment-15588153
 ] 

Yuanbo Liu commented on HDFS-10999:
---

I guess some monitoring scripts are based on the "fsck" command. Admins may 
have written something like
{code}
hdfs fsck / | grep "Under-replicated"
{code}
or a key-value formatter in their scripts. Changing the old key name will force 
them to change their monitoring scripts. This is my understanding of Allen's 
concern about the compatibility issue.
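To make the concern concrete, here is a minimal sketch (assuming the {{hdfs}} 
command is on the PATH) of the kind of check that scrapes fsck output for the 
literal key; renaming the key would silently break checks like this:

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical sketch of a monitoring check that scrapes fsck output.
// It matches the literal key "Under-replicated blocks:"; renaming that key
// in the fsck report would silently break checks like this one.
public class FsckKeyCheck {
  public static void main(String[] args) throws Exception {
    Process p = new ProcessBuilder("hdfs", "fsck", "/").start();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream()))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (line.trim().startsWith("Under-replicated blocks:")) {
          System.out.println("matched: " + line.trim());
        }
      }
    }
    p.waitFor();
  }
}
{code}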




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-18 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585942#comment-15585942
 ] 

Wei-Chiu Chuang commented on HDFS-10999:


Just FYI, the fsck output in Hadoop 3 has the following format. It does 
separate replicated blocks from erasure coded blocks. [~aw], how does this 
output look to you from an admin perspective?

{noformat}
FSCK started by weichiu (auth:SIMPLE) from /127.0.0.1 for path / at Tue Oct 18 
09:37:00 PDT 2016

/striped/corrupted: CORRUPT blockpool BP-921842435-172.16.1.88-1476808612846 
block blk_-9223372036854775792

/striped/corrupted: CORRUPT 1 blocks of total size 393216 B.
Status: CORRUPT
 Number of data-nodes:  9
 Number of racks:   1
 Total dirs:2
 Total symlinks:0

Replicated Blocks:
 Total size:0 B
 Total files:   0
 Total blocks (validated):  0
 Minimally replicated blocks:   0
 Over-replicated blocks:0
 Under-replicated blocks:   0
 Mis-replicated blocks: 0
 Default replication factor:3
 Average block replication: 0.0
 Missing blocks:0
 Corrupt blocks:0
 Missing replicas:  0

Erasure Coded Block Groups:
 Total size:393216 B
 Total files:   1
 Total block groups (validated):1 (avg. block group size 393216 B)
  
  UNRECOVERABLE BLOCK GROUPS:   1 (100.0 %)
  CORRUPT FILES:1
  CORRUPT BLOCK GROUPS: 1
  CORRUPT SIZE: 393216 B
  
 Minimally erasure-coded block groups:  0 (0.0 %)
 Over-erasure-coded block groups:   0 (0.0 %)
 Under-erasure-coded block groups:  0 (0.0 %)
 Unsatisfactory placement block groups: 0 (0.0 %)
 Default ecPolicy:  RS-DEFAULT-6-3-64k
 Average block group size:  5.0
 Missing block groups:  0
 Corrupt block groups:  1
 Missing internal blocks:   0 (0.0 %)
FSCK ended at Tue Oct 18 09:37:00 PDT 2016 in 2 milliseconds


The filesystem under path '/' is CORRUPT

{noformat}




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583691#comment-15583691
 ] 

Andrew Wang commented on HDFS-10999:


Turns out we already have JIRAs for some of these, and I filed a few new ones:

* HDFS-11023 I/O based throttling of DN replication work
* HDFS-11024 Add rate metrics for block recovery work
* HDFS-8672 Erasure Coding: Add EC-related Metrics to NN (seperate striped 
blocks count from UnderReplicatedBlocks count)
* HDFS-9943 Support reconfiguring namenode replication confs





[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15583660#comment-15583660
 ] 

Andrew Wang commented on HDFS-10999:


bq. Are we exposing how many blocks are EC blocks and how many blocks are 
normally replicated blocks?

I don't think so, and you're right that we should in some fashion. Sounds like 
we want these counts both for the whole filesystem and for recovery-related 
metrics.

As a first-cut, I think these counts can ignore the EC policy. I think most 
clusters will only use a single EC policy since it heavily depends on the # of 
racks. We can expand this to per-policy metrics if we find it necessary.

bq. Most of the advanced admins I know do it several times a year, either 
because the NN was too stupid to fix its own replication problems and/or 
because it was simply faster for us to do it rather than wait for the normal 
block replication process.

I choose to interpret this as HDFS needing better knobs for emergency 
replication :)

This has been great info, I'll file some JIRAs to track these work items. 
Sounds like:

* I/O based pending replication metrics / throttles
* EC block counts
* dynamically configurable replication throttles




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-15 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15578574#comment-15578574
 ] 

Allen Wittenauer commented on HDFS-10999:
-

bq. That's what I was getting at with the pendingReconstructionBlocksCount. If 
we fix it as I talked about above, it'd actually tell you how much work is 
remaining, and how fast that work is progressing.

That might work, but I just had a thought. Are we exposing how many blocks are 
EC blocks and how many blocks are normally replicated blocks? (If not, I 
really hope the explanation is a good one...) It seems that we should have 
symmetry here. If we have N types of blocks, I'm going to want to know NxM 
counts of information. It's pretty much the only way that advanced users will 
know whether certain types of blocks are actually working to their benefit. As 
with compression, space savings isn't the only consideration.

bq. I really, really hope that manually copying blocks around is not a normal 
part of operating an HDFS cluster.
...
bq.  I recall seeing some customer issues where we temporarily bumped up these 
values to more quickly recover from failures.

You've sort of answered your own question. ;)

Most of the advanced admins I know do it several times a year, either because 
the NN was too stupid to fix its own replication problems and/or because it 
was simply faster for us to do it rather than wait for the normal block 
replication process. 

For example, as an admin, I might know that there is no YARN running on a 
source node or the destination node, so it's totally OK to do a brute copy from 
one DN to another without busting the network. HDFS block deletes are 
significantly faster than replication, so just do the copy, run the balancer, 
and let the NN remove the duplicates at its leisure. All without fumbling 
with the ever-growing and poorly documented HDFS settings.




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-14 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576938#comment-15576938
 ] 

Andrew Wang commented on HDFS-10999:


Thanks for the insight Allen,

{quote}
M: "How long for recovery?"
A: "No idea. The NN doesn't tell me if these are EC blocks or regular blocks 
that were lost and one is faster to recover than the other."
{quote}

That's what I was getting at with the pendingReconstructionBlocksCount. If we 
fix it as I talked about above, it'd actually tell you how much work is 
remaining, and how fast that work is progressing.

{quote}
...I've also used it during system recovery and migrations as a measurement of 
how many more DNs I need to bring up such that I have more sources for block 
replication. 
{quote}

Would the "pending" queue metrics also work for this? We can also look at 
improved DN-side metrics related to replication work.

{quote}
This number represents something that I as an admin have some semblance of 
control over: I could always manually copy blocks from one node to another to 
speed things up.
Under EC, I don't know of anything manual I can do if it is missing chunks of 
blocks.
{quote}

I really, really hope that manually copying blocks around is not a normal part 
of operating an HDFS cluster.

The point is still valid though: maybe we should take a harder look at the 
recovery work throttles on the NN and DN, and make them dynamically 
reconfigurable if they aren't. I recall seeing some customer issues where we 
temporarily bumped up these values to more quickly recover from failures.




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-14 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576844#comment-15576844
 ] 

Allen Wittenauer commented on HDFS-10999:
-

bq. We tried to draw an equivalence between the durability of EC and replicated 
files by looking at the # of failures to data loss. This way we have a way of 
prioritizing both types of recovery work on the NN (see the LowRedundancyBlocks 
class, nee UnderReplicatedBlocks).

Hmm. That's great for the NN, but it leaves me as an admin in the dark. 

A: "So we had some issues on HDFS."

M: "What's the damage?"

A: "We are missing x blocks."

M: "How long for recovery?"

A: "No idea.  The NN doesn't tell me if these are EC blocks or regular blocks 
that were lost and one is faster to recover than the other."

bq.  In my experience, the "# under replicated blocks" is used as a quick check 
of cluster health.

It's used for that, but I've also used it during system recovery and migrations 
as a measurement of how many more DNs I need to bring up such that I have more 
sources for block replication. This number represents something that I as an 
admin have some semblance of control over: I could always manually copy blocks 
from one node to another to speed things up.  

Under EC, I don't know of anything manual I can do if it is missing chunks of 
blocks.






[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-13 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572797#comment-15572797
 ] 

Andrew Wang commented on HDFS-10999:


We tried to draw an equivalence between the durability of EC and replicated 
files by looking at the # of failures to data loss. This way we have a way of 
prioritizing both types of recovery work on the NN (see the LowRedundancyBlocks 
class, nee UnderReplicatedBlocks).

I think this is kind of okay from an admin POV. In my experience, the "# under 
replicated blocks" is used as a quick check of cluster health. If it's non-zero 
or not a small number, something is off and maybe you shouldn't rolling restart 
your cluster.

Something we might want to take a harder look at is actually the 
pendingReconstructionBlocksCount. By looking at its rate of change, you can 
tell how long until your cluster is back up to full strength (a rough sketch of 
that estimate follows below). However, since EC recovery is more expensive than 
replication, this metric is underspecified: the cost of recovery also depends 
on the EC policy for that block.
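For illustration, here is the kind of back-of-the-envelope estimate this 
enables, as a hedged sketch; the two samples and the sampling interval below 
are made-up numbers, and in practice the counter would come from the NameNode 
metrics:

{code:java}
// Hypothetical sketch: estimate time to full strength from two samples of a
// pending-reconstruction counter. The sample values are made up; a real
// check would read the counter from NameNode metrics/JMX.
public class RecoveryEta {
  public static double etaSeconds(long pendingBefore, long pendingAfter,
                                  double intervalSeconds) {
    double rate = (pendingBefore - pendingAfter) / intervalSeconds; // blocks/s
    if (rate <= 0) {
      return Double.POSITIVE_INFINITY; // no progress, or the queue is growing
    }
    return pendingAfter / rate;
  }

  public static void main(String[] args) {
    // e.g. 12000 pending blocks dropped to 9000 over a 60-second window
    System.out.printf("ETA: %.0f seconds%n", etaSeconds(12000, 9000, 60));
  }
}
{code}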

We should also reexamine the block recovery throttles for the same reason. It's 
still looking at the # of blocks being recovered rather than the amount of I/O.




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-13 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572514#comment-15572514
 ] 

Zhe Zhang commented on HDFS-10999:
--

Thanks for the thoughts, [~aw]. EC policies are configured per file and per 
directory, so it's possible (and very likely in a 3.0 deployment) to have EC 
and non-EC blocks in the same cluster.




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-13 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572492#comment-15572492
 ] 

Allen Wittenauer commented on HDFS-10999:
-

Given that the fsck output is the *only* way to get some pieces of information, 
changing fsck is almost always a major, ops-breaking event. Coupling that with 
pretty much breaking metrics collection... There is no "supposed" here: this is 
very, very incompatible and will cause admins to burn Apache Hadoop conferences 
to the ground in their anger if we aren't careful.

That said, I can empathize with the EC folks.  'under replicated' doesn't 
really cover the state of a block with a missing reconstructable chunk.  But 
I'm not sure that 'low redundancy' necessarily conveys the state of a non-EC 
block either. If I'm not running EC at all, it comes across as a gratuitous 
change.  I need to think more about this, to be honest.

But some questions first:

I'm trying to remember: is it possible to have EC and non-EC blocks in a file 
system? If not, what about in the future? Are we actually trying to shoehorn 
two separate measurements into the same metric here? Is there a situation 
where having both under replicated and low redundancy blocks makes sense? How 
does the storage policy interact with a change like this?




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-12 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569421#comment-15569421
 ] 

Andrew Wang commented on HDFS-10999:


Yea, thanks for the pointer [~rakesh_r]. One idea is that we could add new 
metrics with more accurate names and deprecate the old ones for Hadoop 3, then 
consider removing them in Hadoop 4.

For the webui, I think we can migrate it over directly, since it's not covered 
by the compatibility policy. Web users are supposed to parse the /jmx output 
instead, which is structured.
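For reference, here is a minimal sketch of reading the structured /jmx output 
instead of scraping the UI; the host, port, and the bean name queried here are 
assumptions for illustration:

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Hypothetical sketch: fetch one bean from the NameNode /jmx servlet.
// The host, port and bean name are assumptions for illustration.
public class JmxFetch {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://namenode.example.com:9870/jmx"
        + "?qry=Hadoop:service=NameNode,name=FSNamesystemState");
    StringBuilder json = new StringBuilder();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(url.openStream()))) {
      String line;
      while ((line = r.readLine()) != null) {
        json.append(line).append('\n');
      }
    }
    // A real client would feed this to a JSON parser; printing keeps the
    // sketch small.
    System.out.println(json);
  }
}
{code}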

It's a bit trickier for shell commands. Empirically, people do parse the output 
of commands like dfsadmin and fsck. Having double prints for both the old and 
new names is kind of ugly.

[~aw] any thoughts on how best to migrate the names for the shell tools?




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-12 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568227#comment-15568227
 ] 

Rakesh R commented on HDFS-10999:
-

Thanks [~jojochuang] for bringing up this point. I remember there was a 
[discussion|https://issues.apache.org/jira/browse/HDFS-7955?focusedCommentId=15159765=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15159765]
 about renaming the replication-related metrics. It was not done because 
upper-layer applications use these metrics quite heavily and renaming them 
could be risky.




[jira] [Commented] (HDFS-10999) Use more generic "low redundancy" blocks instead of "under replicated" blocks

2016-10-12 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568050#comment-15568050
 ] 

Wei-Chiu Chuang commented on HDFS-10999:


One thing I want to call out is metric names. There are metrics named 
"UnderReplicatedBlocks", "PendingReplicationBlocks", etc. Changing them is 
supposedly incompatible, especially for monitoring systems (such as Cloudera 
Manager or Ambari).

[~zhz], [~andrew.wang], I'd appreciate it if you could comment on this. Thanks!
