[jira] [Updated] (HDFS-17158) Show the rate of metrics in EC recovery task.

2023-08-25 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17158:
---
Description: 
From

!image2023-8-18_16-26-14.png|width=551,height=83!

To

!123124124.png|width=559,height=100!

These metrics may show the network and CPU load of the machine.

  was:
From

!image2023-8-18_16-26-14.png|width=551,height=83!

To

!123124124.png|width=559,height=100!


> Show the rate of metrics in EC recovery task.
> -
>
> Key: HDFS-17158
> URL: https://issues.apache.org/jira/browse/HDFS-17158
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, metrics
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 123124124.png, image2023-8-18_16-26-14.png
>
>
> From
> !image2023-8-18_16-26-14.png|width=551,height=83!
> To
> !123124124.png|width=559,height=100!
> These metrics may show the network and CPU load of the machine.
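
A minimal sketch of what a "rate" could mean here, assuming the EC recovery metrics are cumulative counters (for example, bytes read during reconstruction) that are sampled periodically. The function name and the sample values are hypothetical; they are not taken from the Hadoop code base or from the attached screenshots.

```python
def counter_rate(prev_value, prev_time_s, curr_value, curr_time_s):
    """Return the average per-second rate of a cumulative counter between two samples."""
    elapsed = curr_time_s - prev_time_s
    if elapsed <= 0:
        raise ValueError("samples must be taken at strictly increasing times")
    return (curr_value - prev_value) / elapsed


# Hypothetical samples: bytes read by EC reconstruction, taken 10 seconds apart.
rate = counter_rate(prev_value=1_000_000, prev_time_s=100.0,
                    curr_value=6_000_000, curr_time_s=110.0)
print(f"{rate:.0f} bytes/s")
```

A rate like this is easier to compare against NIC or CPU capacity than a raw counter that only ever grows.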



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17158) Show the rate of metrics in EC recovery task.

2023-08-18 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17158:
---
Description: 
From

!image2023-8-18_16-26-14.png|width=551,height=83!

To

!123124124.png|width=559,height=100!

> Show the rate of metrics in EC recovery task.
> -
>
> Key: HDFS-17158
> URL: https://issues.apache.org/jira/browse/HDFS-17158
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, metrics
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 123124124.png, image2023-8-18_16-26-14.png
>
>
> From
> !image2023-8-18_16-26-14.png|width=551,height=83!
> To
> !123124124.png|width=559,height=100!






[jira] [Updated] (HDFS-17158) Show the rate of metrics in EC recovery task.

2023-08-18 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17158:
---
Attachment: image2023-8-18_16-26-14.png

> Show the rate of metrics in EC recovery task.
> -
>
> Key: HDFS-17158
> URL: https://issues.apache.org/jira/browse/HDFS-17158
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, metrics
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 123124124.png, image2023-8-18_16-26-14.png
>
>







[jira] [Updated] (HDFS-17158) Show the rate of metrics in EC recovery task.

2023-08-18 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17158:
---
Attachment: 123124124.png

> Show the rate of metrics in EC recovery task.
> -
>
> Key: HDFS-17158
> URL: https://issues.apache.org/jira/browse/HDFS-17158
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, metrics
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 123124124.png, image2023-8-18_16-26-14.png
>
>







[jira] [Created] (HDFS-17158) Show the rate of metrics in EC recovery task.

2023-08-13 Thread WangYuanben (Jira)
WangYuanben created HDFS-17158:
--

 Summary: Show the rate of metrics in EC recovery task.
 Key: HDFS-17158
 URL: https://issues.apache.org/jira/browse/HDFS-17158
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: erasure-coding, metrics
Reporter: WangYuanben
Assignee: WangYuanben









[jira] [Resolved] (HDFS-17016) Cleanup method calls to static Assert and Assume methods.

2023-07-26 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben resolved HDFS-17016.

Resolution: Not A Problem

> Cleanup method calls to static Assert and Assume methods.
> -
>
> Key: HDFS-17016
> URL: https://issues.apache.org/jira/browse/HDFS-17016
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Cleanup method calls to static Assert and Assume methods.






[jira] [Assigned] (HDFS-17113) Reconfig transfer and write bandwidth for datanode.

2023-07-25 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben reassigned HDFS-17113:
--

Assignee: WangYuanben

> Reconfig transfer and write bandwidth for datanode.
> ---
>
> Key: HDFS-17113
> URL: https://issues.apache.org/jira/browse/HDFS-17113
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Major
>  Labels: pull-request-available
>
> To avoid frequent rolling restarts of the DN, we should make 
> dfs.datanode.data.transfer.bandwidthPerSec and 
> dfs.datanode.data.write.bandwidthPerSec reconfigurable.
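
A minimal sketch of the idea, assuming each of the two properties backs a bytes-per-second limit held in memory by the running process. The class and method names are hypothetical and this is not the DataNode implementation; treating 0 as "no limit" is also an assumption of the sketch.

```python
import threading


class ReconfigurableBandwidth:
    """Holds a bytes-per-second limit that can be changed while the process runs."""

    def __init__(self, bytes_per_sec):
        self._lock = threading.Lock()
        self._bytes_per_sec = bytes_per_sec

    def reconfigure(self, new_bytes_per_sec):
        """Swap in a new limit without a restart and return the previous one."""
        if new_bytes_per_sec < 0:
            raise ValueError("bandwidth must be non-negative")
        with self._lock:
            old = self._bytes_per_sec
            self._bytes_per_sec = new_bytes_per_sec
        return old

    def min_seconds_for(self, num_bytes):
        """Shortest time the current limit allows for num_bytes (0 limit = unthrottled)."""
        with self._lock:
            limit = self._bytes_per_sec
        return 0.0 if limit == 0 else num_bytes / limit


MIB = 1024 * 1024
transfer = ReconfigurableBandwidth(10 * MIB)   # hypothetical starting limit
print(transfer.min_seconds_for(20 * MIB))      # time needed under the old limit
transfer.reconfigure(20 * MIB)                 # applied at runtime, no restart
print(transfer.min_seconds_for(20 * MIB))      # time needed under the new limit
```

The point of the sketch is only that the limit is read on each use, so a new value takes effect for subsequent transfers without restarting the process.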






[jira] [Created] (HDFS-17113) Reconfig transfer and write bandwidth for datanode.

2023-07-21 Thread WangYuanben (Jira)
WangYuanben created HDFS-17113:
--

 Summary: Reconfig transfer and write bandwidth for datanode.
 Key: HDFS-17113
 URL: https://issues.apache.org/jira/browse/HDFS-17113
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode
Reporter: WangYuanben


To avoid frequent rolling restarts of the DN, we should make 
dfs.datanode.data.transfer.bandwidthPerSec and 
dfs.datanode.data.write.bandwidthPerSec reconfigurable.






[jira] [Updated] (HDFS-17091) Blocks on DECOMMISSIONING DNs should be sorted properly in LocatedBlocks

2023-07-17 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17091:
---
Description: Similar to 
[HDFS-16076|https://issues.apache.org/jira/browse/HDFS-16076], I think 
decommissioning DNs need to be taken into consideration. After sorting, the 
expected location list will be: live -> slow -> stale -> staleAndSlow -> 
entering_maintenance -> decommissioning -> decommissioned.  (was: Being similar 
to [HDFS-16076|https://issues.apache.org/jira/browse/HDFS-16076], I think 
decommissioning DNs needs to be taken into consideration. After sorting the 
expected location list will be: live -> slow -> stale -> staleAndSlow -> 
entering_maintenance -> decommissioned -> decommissioning.)

> Blocks on DECOMMISSIONING DNs should be sorted properly in LocatedBlocks
> 
>
> Key: HDFS-17091
> URL: https://issues.apache.org/jira/browse/HDFS-17091
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Major
>  Labels: pull-request-available
>
> Similar to [HDFS-16076|https://issues.apache.org/jira/browse/HDFS-16076], I 
> think decommissioning DNs need to be taken into consideration. After sorting, 
> the expected location list will be: live -> slow -> stale -> staleAndSlow -> 
> entering_maintenance -> decommissioning -> decommissioned.
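
A minimal sketch of the proposed ordering, assuming each block location can be reduced to a (datanode name, state) pair. The state labels are the ones listed in the description; the function name and the sample datanodes are hypothetical, and this is not the NameNode's actual sorting code.

```python
# Rank of each datanode state in the proposed ordering; lower ranks sort first.
STATE_ORDER = ["live", "slow", "stale", "staleAndSlow",
               "entering_maintenance", "decommissioning", "decommissioned"]
STATE_RANK = {state: rank for rank, state in enumerate(STATE_ORDER)}


def sort_locations(locations):
    """Sort (datanode_name, state) pairs by state rank.

    sorted() is stable, so locations in the same state keep their input order.
    """
    return sorted(locations, key=lambda loc: STATE_RANK[loc[1]])


# Hypothetical locations of one block.
locations = [("dn1", "decommissioned"), ("dn2", "stale"),
             ("dn3", "decommissioning"), ("dn4", "live")]
print([name for name, _ in sort_locations(locations)])
# prints ['dn4', 'dn2', 'dn3', 'dn1']
```

With this ranking a decommissioning DN, which can still serve reads, is tried before a decommissioned one.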






[jira] [Created] (HDFS-17091) Blocks on DECOMMISSIONING DNs should be sorted properly in LocatedBlocks

2023-07-16 Thread WangYuanben (Jira)
WangYuanben created HDFS-17091:
--

 Summary: Blocks on DECOMMISSIONING DNs should be sorted properly 
in LocatedBlocks
 Key: HDFS-17091
 URL: https://issues.apache.org/jira/browse/HDFS-17091
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: WangYuanben
Assignee: WangYuanben


Being similar to [HDFS-16076|https://issues.apache.org/jira/browse/HDFS-16076], 
I think decommissioning DNs needs to be taken into consideration. After sorting 
the expected location list will be: live -> slow -> stale -> staleAndSlow -> 
entering_maintenance -> decommissioned -> decommissioning.






[jira] [Updated] (HDFS-17033) Update fsck to display stale state info of blocks accurately

2023-07-06 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17033:
---
Description: When a DN is stale, block replicas on that DN should be reported as 
"STALE" instead of "HEALTHY" in the fsck block check.  (was: When the DN is 
stale, blocks on this DN should be "STALE" instead of "HEALTHY" in block check 
of fsck.)

> Update fsck to display stale state info of blocks accurately
> 
>
> Key: HDFS-17033
> URL: https://issues.apache.org/jira/browse/HDFS-17033
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namanode
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Minor
>  Labels: pull-request-available
>
> When a DN is stale, block replicas on that DN should be reported as "STALE" 
> instead of "HEALTHY" in the fsck block check.






[jira] [Updated] (HDFS-17033) Update fsck to display stale state info of blocks accurately

2023-07-06 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17033:
---
Description: When the DN is stale, blocks on this DN should be "STALE" 
instead of "HEALTHY" in block check of fsck.

> Update fsck to display stale state info of blocks accurately
> 
>
> Key: HDFS-17033
> URL: https://issues.apache.org/jira/browse/HDFS-17033
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namanode
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Minor
>  Labels: pull-request-available
>
> When the DN is stale, blocks on this DN should be "STALE" instead of 
> "HEALTHY" in block check of fsck.






[jira] [Commented] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-29 Thread WangYuanben (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738576#comment-17738576
 ] 

WangYuanben commented on HDFS-17061:


[~sodonnell] Thank you for the comment. I need some examples to validate this 
idea, but there currently seems to be no direct way to obtain the number of 
data blocks and parity blocks. I will therefore first add a way to retrieve 
these numbers and run some tests in a subtask, which I will create later.

> EC: Let data blocks and parity blocks on DNs more balanced
> --
>
> Key: HDFS-17061
> URL: https://issues.apache.org/jira/browse/HDFS-17061
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, erasure-coding, hdfs
>Reporter: WangYuanben
>Priority: Minor
> Attachments: figure1, unbalanced traffic load on DNs.png, figure2, 
> balanced traffic load on DNs.png
>
>
> When choosing a DN to place a data block or parity block, the number of data 
> blocks and parity blocks already on the datanode is not taken into 
> consideration. This may lead to *uneven traffic load*.
> As shown in figure 1, when reading block groups A, B, C, D and E from five 
> different EC files without any missing blocks, datanodes like DN1 and DN2 will 
> have high traffic load, while datanodes like DN3, DN4 and DN5 may have low 
> or even no traffic load. 
>  !figure1, unbalanced traffic load on DNs.png|width=600,height=333! 
> +If data blocks and parity blocks are better balanced across DNs, the 
> traffic load in the cluster will be more balanced and the peak traffic load 
> on each DN will be reduced+. Here "balance" means that the ratio of data 
> blocks to parity blocks on a DN matches its EC policy. In the ideal state, 
> each DN has a balanced traffic load, as figure 2 shows. 
>  !figure2, balanced traffic load on DNs.png|width=600,height=333! 
> How can this imbalance be reduced? I think it depends on the EC policy and 
> the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
> the ratio should be close to 3:2. 
> There are two possible solutions:
> 1. Improve the block placement policy.
> 2. Improve the Balancer.
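
A minimal sketch of how the imbalance described above could be quantified, assuming the per-datanode counts of data blocks and parity blocks are available (the comment thread notes there is currently no direct way to obtain them). The function name and the sample counts are hypothetical.

```python
def ratio_deviation(data_blocks, parity_blocks, policy_data=3, policy_parity=2):
    """Difference between a datanode's data-block share and the EC policy's share.

    Positive: more data blocks than the policy ratio implies.
    Negative: more parity blocks than the policy ratio implies.
    """
    total = data_blocks + parity_blocks
    if total == 0:
        return 0.0
    target_share = policy_data / (policy_data + policy_parity)
    return data_blocks / total - target_share


# Hypothetical counts for RS-3-2-1024k, whose target ratio is 3:2.
for name, data, parity in [("DN1", 50, 0), ("DN3", 0, 50), ("DN6", 30, 20)]:
    print(name, f"{ratio_deviation(data, parity):+.2f}")
```

A placement policy or the Balancer could prefer moves that bring this value closer to zero on each datanode; that choice is left open here, as it is in the issue.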






[jira] [Updated] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-27 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17061:
---
Description: 
When choosing a DN to place a data block or parity block, the number of data 
blocks and parity blocks already on the datanode is not taken into 
consideration. This may lead to *uneven traffic load*.

As shown in figure 1, when reading block groups A, B, C, D and E from five 
different EC files without any missing blocks, datanodes like DN1 and DN2 will 
have high traffic load, while datanodes like DN3, DN4 and DN5 may have low 
or even no traffic load. 

 !figure1, unbalanced traffic load on DNs.png|width=600,height=333! 

+If data blocks and parity blocks are better balanced across DNs, the traffic 
load in the cluster will be more balanced and the peak traffic load on each DN 
will be reduced+. Here "balance" means that the ratio of data blocks to parity 
blocks on a DN matches its EC policy. In the ideal state, each DN has a 
balanced traffic load, as figure 2 shows. 

 !figure2, balanced traffic load on DNs.png|width=600,height=333! 

How can this imbalance be reduced? I think it depends on the EC policy and the 
ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, the 
ratio should be close to 3:2. 

There are two possible solutions:
1. Improve the block placement policy.
2. Improve the Balancer.

  was:
When choosing DN for placing data block or parity block, the existing number of 
data block and parity block on datanode is not taken into consideration. This 
may lead to *uneven traffic load*.

As shown in the figure 1, when reading block group A, B, C, D and E from five 
different EC files without any missing block, datanodes like DN1 and DN2 will 
have high traffic load. However, datanodes like DN3, DN4 and DN5 may have low 
or even no traffic load. 

 !figure1, unbalanced traffic load on DNs.png|width=600,height=333! 

If we can let data blocks and parity blocks on DNs more balanced, +the traffic 
load in cluster will be more balanced and the peak traffic load on DN will be 
reduced+. Here "balance" refers to the matching of the number of data blocks 
and parity blocks on DN with its EC policy. In the ideal state, each DN has a 
balanced traffic load just like what figure 2 shows. 

 !figure2, balanced traffic load on DNs.png|width=600,height=333! 

Then how to reduce this imbalance? I think it's related to EC policy and the 
ratio of data blocks to parity blocks on datanode. For RS-3-2-1024k, it's 
appropriate to let the ratio close to 3:2. 

There are two solutions:
1.Improve the block placement policy.
2.Improve the Balancer.


> EC: Let data blocks and parity blocks on DNs more balanced
> --
>
> Key: HDFS-17061
> URL: https://issues.apache.org/jira/browse/HDFS-17061
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, erasure-coding, hdfs
>Reporter: WangYuanben
>Priority: Minor
> Attachments: figure1, unbalanced traffic load on DNs.png, figure2, 
> balanced traffic load on DNs.png
>
>
> When choosing a DN to place a data block or parity block, the number of data 
> blocks and parity blocks already on the datanode is not taken into 
> consideration. This may lead to *uneven traffic load*.
> As shown in figure 1, when reading block groups A, B, C, D and E from five 
> different EC files without any missing blocks, datanodes like DN1 and DN2 will 
> have high traffic load, while datanodes like DN3, DN4 and DN5 may have low 
> or even no traffic load. 
>  !figure1, unbalanced traffic load on DNs.png|width=600,height=333! 
> +If data blocks and parity blocks are better balanced across DNs, the 
> traffic load in the cluster will be more balanced and the peak traffic load 
> on each DN will be reduced+. Here "balance" means that the ratio of data 
> blocks to parity blocks on a DN matches its EC policy. In the ideal state, 
> each DN has a balanced traffic load, as figure 2 shows. 
>  !figure2, balanced traffic load on DNs.png|width=600,height=333! 
> How can this imbalance be reduced? I think it depends on the EC policy and 
> the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
> the ratio should be close to 3:2. 
> There are two possible solutions:
> 1. Improve the block placement policy.
> 2. Improve the Balancer.






[jira] [Updated] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-27 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17061:
---
Component/s: hdfs

> EC: Let data blocks and parity blocks on DNs more balanced
> --
>
> Key: HDFS-17061
> URL: https://issues.apache.org/jira/browse/HDFS-17061
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, erasure-coding, hdfs
>Reporter: WangYuanben
>Priority: Minor
> Attachments: figure1, unbalanced traffic load on DNs.png, figure2, 
> balanced traffic load on DNs.png
>
>
> When choosing DN for placing data block or parity block, the existing number 
> of data block and parity block on datanode is not taken into consideration. 
> This may lead to *uneven traffic load*.
> As shown in the figure 1, when reading block group A, B, C, D and E from five 
> different EC files without any missing block, datanodes like DN1 and DN2 will 
> have high traffic load. However, datanodes like DN3, DN4 and DN5 may have low 
> or even no traffic load. 
>  !figure1, unbalanced traffic load on DNs.png|width=600,height=333! 
> If we can let data blocks and parity blocks on DNs more balanced, +the 
> traffic load in cluster will be more balanced and the peak traffic load on DN 
> will be reduced+. Here "balance" refers to the matching of the number of data 
> blocks and parity blocks on DN with its EC policy. In the ideal state, each 
> DN has a balanced traffic load just like what figure 2 shows. 
>  !figure2, balanced traffic load on DNs.png|width=600,height=333! 
> Then how to reduce this imbalance? I think it's related to EC policy and the 
> ratio of data blocks to parity blocks on datanode. For RS-3-2-1024k, it's 
> appropriate to let the ratio close to 3:2. 
> There are two solutions:
> 1.Improve the block placement policy.
> 2.Improve the Balancer.






[jira] [Updated] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-27 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17061:
---
Component/s: balancer & mover
 (was: balamcer)

> EC: Let data blocks and parity blocks on DNs more balanced
> --
>
> Key: HDFS-17061
> URL: https://issues.apache.org/jira/browse/HDFS-17061
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, erasure-coding
>Reporter: WangYuanben
>Priority: Minor
> Attachments: figure1, unbalanced traffic load on DNs.png, figure2, 
> balanced traffic load on DNs.png
>
>
> When choosing DN for placing data block or parity block, the existing number 
> of data block and parity block on datanode is not taken into consideration. 
> This may lead to *uneven traffic load*.
> As shown in the figure 1, when reading block group A, B, C, D and E from five 
> different EC files without any missing block, datanodes like DN1 and DN2 will 
> have high traffic load. However, datanodes like DN3, DN4 and DN5 may have low 
> or even no traffic load. 
>  !figure1, unbalanced traffic load on DNs.png|width=600,height=333! 
> If we can let data blocks and parity blocks on DNs more balanced, +the 
> traffic load in cluster will be more balanced and the peak traffic load on DN 
> will be reduced+. Here "balance" refers to the matching of the number of data 
> blocks and parity blocks on DN with its EC policy. In the ideal state, each 
> DN has a balanced traffic load just like what figure 2 shows. 
>  !figure2, balanced traffic load on DNs.png|width=600,height=333! 
> Then how to reduce this imbalance? I think it's related to EC policy and the 
> ratio of data blocks to parity blocks on datanode. For RS-3-2-1024k, it's 
> appropriate to let the ratio close to 3:2. 
> There are two solutions:
> 1.Improve the block placement policy.
> 2.Improve the Balancer.






[jira] [Updated] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-27 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17061:
---
Description: 
When choosing DN for placing data block or parity block, the existing number of 
data block and parity block on datanode is not taken into consideration. This 
may lead to *uneven traffic load*.

As shown in the figure 1, when reading block group A, B, C, D and E from five 
different EC files without any missing block, datanodes like DN1 and DN2 will 
have high traffic load. However, datanodes like DN3, DN4 and DN5 may have low 
or even no traffic load. 

 !figure1, unbalanced traffic load on DNs.png|width=600,height=333! 

If we can let data blocks and parity blocks on DNs more balanced, +the traffic 
load in cluster will be more balanced and the peak traffic load on DN will be 
reduced+. Here "balance" refers to the matching of the number of data blocks 
and parity blocks on DN with its EC policy. In the ideal state, each DN has a 
balanced traffic load just like what figure 2 shows. 

 !figure2, balanced traffic load on DNs.png|width=600,height=333! 

Then how to reduce this imbalance? I think it's related to EC policy and the 
ratio of data blocks to parity blocks on datanode. For RS-3-2-1024k, it's 
appropriate to let the ratio close to 3:2. 

There are two solutions:
1.Improve the block placement policy.
2.Improve the Balancer.

  was:
When choosing DN for placing data block or parity block, the existing number of 
data block and parity block on datanode is not taken into consideration. This 
may lead to *uneven traffic load*.

As shown in the figure 1, when reading block group A, B, C, D and E from five 
different EC files without any missing block, datanodes like DN1 and DN2 will 
have high traffic load. However, datanodes like DN3, DN4 and DN5 may have low 
or even no traffic load. 

 !figure1, unbalanced traffic load on DNs.png|width=815,height=550! 

If we can let data blocks and parity blocks on DNs more balanced, +the traffic 
load in cluster will be more balanced and the peak traffic load on DN will be 
reduced+. Here "balance" refers to the matching of the number of data blocks 
and parity blocks on DN with its EC policy. In the ideal state, each DN has a 
balanced traffic load just like what figure 2 shows. 

 !figure2, balanced traffic load on DNs.png|width=815,height=550! 

Then how to reduce this imbalance? I think it's related to EC policy and the 
ratio of data blocks to parity blocks on datanode. For RS-3-2-1024k, it's 
appropriate to let the ratio close to 3:2. 

There are two solutions:
1.Improve the block placement policy.
2.Improve the Balancer.


> EC: Let data blocks and parity blocks on DNs more balanced
> --
>
> Key: HDFS-17061
> URL: https://issues.apache.org/jira/browse/HDFS-17061
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balamcer, erasure-coding
>Reporter: WangYuanben
>Priority: Minor
> Attachments: figure1, unbalanced traffic load on DNs.png, figure2, 
> balanced traffic load on DNs.png
>
>
> When choosing DN for placing data block or parity block, the existing number 
> of data block and parity block on datanode is not taken into consideration. 
> This may lead to *uneven traffic load*.
> As shown in the figure 1, when reading block group A, B, C, D and E from five 
> different EC files without any missing block, datanodes like DN1 and DN2 will 
> have high traffic load. However, datanodes like DN3, DN4 and DN5 may have low 
> or even no traffic load. 
>  !figure1, unbalanced traffic load on DNs.png|width=600,height=333! 
> If we can let data blocks and parity blocks on DNs more balanced, +the 
> traffic load in cluster will be more balanced and the peak traffic load on DN 
> will be reduced+. Here "balance" refers to the matching of the number of data 
> blocks and parity blocks on DN with its EC policy. In the ideal state, each 
> DN has a balanced traffic load just like what figure 2 shows. 
>  !figure2, balanced traffic load on DNs.png|width=600,height=333! 
> Then how to reduce this imbalance? I think it's related to EC policy and the 
> ratio of data blocks to parity blocks on datanode. For RS-3-2-1024k, it's 
> appropriate to let the ratio close to 3:2. 
> There are two solutions:
> 1.Improve the block placement policy.
> 2.Improve the Balancer.






[jira] [Updated] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-27 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17061:
---
Description: 
When choosing DN for placing data block or parity block, the existing number of 
data block and parity block on datanode is not taken into consideration. This 
may lead to *uneven traffic load*.

As shown in the figure 1, when reading block group A, B, C, D and E from five 
different EC files without any missing block, datanodes like DN1 and DN2 will 
have high traffic load. However, datanodes like DN3, DN4 and DN5 may have low 
or even no traffic load. 

 !figure1, unbalanced traffic load on DNs.png|width=815,height=550! 

If we can let data blocks and parity blocks on DNs more balanced, +the traffic 
load in cluster will be more balanced and the peak traffic load on DN will be 
reduced+. Here "balance" refers to the matching of the number of data blocks 
and parity blocks on DN with its EC policy. In the ideal state, each DN has a 
balanced traffic load just like what figure 2 shows. 

 !figure2, balanced traffic load on DNs.png|width=815,height=550! 

Then how to reduce this imbalance? I think it's related to EC policy and the 
ratio of data blocks to parity blocks on datanode. For RS-3-2-1024k, it's 
appropriate to let the ratio close to 3:2. 

There are two solutions:
1.Improve the block placement policy.
2.Improve the Balancer.

  was:
When choosing DN for placing data block or parity block, the existing number of 
data block and parity block on datanode is not taken into consideration. This 
may lead to *uneven traffic load*.

As shown in the figure 1, when reading block group A, B, C, D and E from five 
different EC files without any missing block, datanodes like DN1 and DN2 will 
have high traffic load. However, datanodes like DN3, DN4 and DN5 may have low 
or even no traffic load. 

 !figure1, unbalanced traffic load on DNs.png|width=700,height=550! 

If we can let data blocks and parity blocks on DNs more balanced, +the traffic 
load in cluster will be more balanced and the peak traffic load on DN will be 
reduced+. Here "balance" refers to the matching of the number of data blocks 
and parity blocks on DN with its EC policy. In the ideal state, each DN has a 
balanced traffic load just like what figure 2 shows. 

 !figure2, balanced traffic load on DNs.png|width=700,height=550! 

Then how to reduce this imbalance? I think it's related to EC policy and the 
ratio of data blocks to parity blocks on datanode. For RS-3-2-1024k, it's 
appropriate to let the ratio close to 3:2. 

There are two solutions:
1.Improve the block placement policy.
2.Improve the Balancer.


> EC: Let data blocks and parity blocks on DNs more balanced
> --
>
> Key: HDFS-17061
> URL: https://issues.apache.org/jira/browse/HDFS-17061
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balamcer, erasure-coding
>Reporter: WangYuanben
>Priority: Minor
> Attachments: figure1, unbalanced traffic load on DNs.png, figure2, 
> balanced traffic load on DNs.png
>
>
> When choosing a DN on which to place a data block or parity block, the number 
> of data blocks and parity blocks already on that datanode is not taken into 
> consideration. This may lead to *uneven traffic load*.
> As shown in figure 1, when reading block groups A, B, C, D and E from five 
> different EC files without any missing blocks, datanodes like DN1 and DN2 
> will have a high traffic load. However, datanodes like DN3, DN4 and DN5 may 
> have a low traffic load or even none. 
>  !figure1, unbalanced traffic load on DNs.png|width=815,height=550! 
> If we can make the data blocks and parity blocks on DNs more balanced, +the 
> traffic load in the cluster will be more balanced and the peak traffic load 
> on a DN will be reduced+. Here "balance" means that the numbers of data 
> blocks and parity blocks on a DN match its EC policy. In the ideal state, 
> each DN has a balanced traffic load, as figure 2 shows. 
>  !figure2, balanced traffic load on DNs.png|width=815,height=550! 
> How can this imbalance be reduced? I think it is related to the EC policy and 
> the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
> it is appropriate to keep the ratio close to 3:2. 
> There are two solutions:
> 1. Improve the block placement policy.
> 2. Improve the Balancer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-27 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17061:
---
Description: 
When choosing a DN on which to place a data block or parity block, the number 
of data blocks and parity blocks already on that datanode is not taken into 
consideration. This may lead to *uneven traffic load*.

As shown in figure 1, when reading block groups A, B, C, D and E from five 
different EC files without any missing blocks, datanodes like DN1 and DN2 will 
have a high traffic load. However, datanodes like DN3, DN4 and DN5 may have a 
low traffic load or even none. 

 !figure1, unbalanced traffic load on DNs.png|width=700,height=550! 

If we can make the data blocks and parity blocks on DNs more balanced, +the 
traffic load in the cluster will be more balanced and the peak traffic load on 
a DN will be reduced+. Here "balance" means that the numbers of data blocks and 
parity blocks on a DN match its EC policy. In the ideal state, each DN has a 
balanced traffic load, as figure 2 shows. 

 !figure2, balanced traffic load on DNs.png|width=700,height=550! 

How can this imbalance be reduced? I think it is related to the EC policy and 
the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
it is appropriate to keep the ratio close to 3:2. 

There are two solutions:
1. Improve the block placement policy.
2. Improve the Balancer.

  was:
When choosing a DN on which to place a data block or parity block, the number 
of data blocks and parity blocks already on that datanode is not taken into 
consideration. This may lead to *uneven traffic load*.

As shown in figure 1, when reading block groups A, B, C, D and E from five 
different EC files without any missing blocks, datanodes like DN1 and DN2 will 
have a high traffic load. However, datanodes like DN3, DN4 and DN5 may have a 
low traffic load or even none. 

 !figure1, unbalanced traffic load on DNs.png|width=850,height=550! 

If we can make the data blocks and parity blocks on DNs more balanced, +the 
traffic load in the cluster will be more balanced and the peak traffic load on 
a DN will be reduced+. Here "balance" means that the numbers of data blocks and 
parity blocks on a DN match its EC policy. In the ideal state, each DN has a 
balanced traffic load, as figure 2 shows. 

 !figure2, balanced traffic load on DNs.png|width=850,height=550! 

How can this imbalance be reduced? I think it is related to the EC policy and 
the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
it is appropriate to keep the ratio close to 3:2. 

There are two solutions:
1. Improve the block placement policy.
2. Improve the Balancer.


> EC: Let data blocks and parity blocks on DNs more balanced
> --
>
> Key: HDFS-17061
> URL: https://issues.apache.org/jira/browse/HDFS-17061
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balamcer, erasure-coding
>Reporter: WangYuanben
>Priority: Minor
> Attachments: figure1, unbalanced traffic load on DNs.png, figure2, 
> balanced traffic load on DNs.png
>
>
> When choosing a DN on which to place a data block or parity block, the number 
> of data blocks and parity blocks already on that datanode is not taken into 
> consideration. This may lead to *uneven traffic load*.
> As shown in figure 1, when reading block groups A, B, C, D and E from five 
> different EC files without any missing blocks, datanodes like DN1 and DN2 
> will have a high traffic load. However, datanodes like DN3, DN4 and DN5 may 
> have a low traffic load or even none. 
>  !figure1, unbalanced traffic load on DNs.png|width=700,height=550! 
> If we can make the data blocks and parity blocks on DNs more balanced, +the 
> traffic load in the cluster will be more balanced and the peak traffic load 
> on a DN will be reduced+. Here "balance" means that the numbers of data 
> blocks and parity blocks on a DN match its EC policy. In the ideal state, 
> each DN has a balanced traffic load, as figure 2 shows. 
>  !figure2, balanced traffic load on DNs.png|width=700,height=550! 
> How can this imbalance be reduced? I think it is related to the EC policy and 
> the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
> it is appropriate to keep the ratio close to 3:2. 
> There are two solutions:
> 1. Improve the block placement policy.
> 2. Improve the Balancer.






[jira] [Updated] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-27 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17061:
---
Description: 
When choosing a DN on which to place a data block or parity block, the number 
of data blocks and parity blocks already on that datanode is not taken into 
consideration. This may lead to *uneven traffic load*.

As shown in figure 1, when reading block groups A, B, C, D and E from five 
different EC files without any missing blocks, datanodes like DN1 and DN2 will 
have a high traffic load. However, datanodes like DN3, DN4 and DN5 may have a 
low traffic load or even none. 

 !figure1, unbalanced traffic load on DNs.png|width=850,height=550! 

If we can make the data blocks and parity blocks on DNs more balanced, +the 
traffic load in the cluster will be more balanced and the peak traffic load on 
a DN will be reduced+. Here "balance" means that the numbers of data blocks and 
parity blocks on a DN match its EC policy. In the ideal state, each DN has a 
balanced traffic load, as figure 2 shows. 

 !figure2, balanced traffic load on DNs.png|width=850,height=550! 

How can this imbalance be reduced? I think it is related to the EC policy and 
the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
it is appropriate to keep the ratio close to 3:2. 

There are two solutions:
1. Improve the block placement policy.
2. Improve the Balancer.

  was:
When choosing a DN on which to place a data block or parity block, the number 
of data blocks and parity blocks already on that datanode is not taken into 
consideration. This may lead to *uneven traffic load*.

As shown in figure 1, when reading block groups A, B, C, D and E from five 
different EC files without any missing blocks, datanodes like DN1 and DN2 will 
have a high traffic load. However, datanodes like DN3, DN4 and DN5 may have a 
low traffic load or even none. 

 !figure1, unbalanced traffic load on DNs.png! 

If we can make the data blocks and parity blocks on DNs more balanced, +the 
traffic load in the cluster will be more balanced and the peak traffic load on 
a DN will be reduced+. Here "balance" means that the numbers of data blocks and 
parity blocks on a DN match its EC policy. In the ideal state, each DN has a 
balanced traffic load, as figure 2 shows. 

 !figure2, balanced traffic load on DNs.png! 

How can this imbalance be reduced? I think it is related to the EC policy and 
the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
it is appropriate to keep the ratio close to 3:2. 

There are two solutions:
1. Improve the block placement policy.
2. Improve the Balancer.


> EC: Let data blocks and parity blocks on DNs more balanced
> --
>
> Key: HDFS-17061
> URL: https://issues.apache.org/jira/browse/HDFS-17061
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balamcer, erasure-coding
>Reporter: WangYuanben
>Priority: Minor
> Attachments: figure1, unbalanced traffic load on DNs.png, figure2, 
> balanced traffic load on DNs.png
>
>
> When choosing a DN on which to place a data block or parity block, the number 
> of data blocks and parity blocks already on that datanode is not taken into 
> consideration. This may lead to *uneven traffic load*.
> As shown in figure 1, when reading block groups A, B, C, D and E from five 
> different EC files without any missing blocks, datanodes like DN1 and DN2 
> will have a high traffic load. However, datanodes like DN3, DN4 and DN5 may 
> have a low traffic load or even none. 
>  !figure1, unbalanced traffic load on DNs.png|width=850,height=550! 
> If we can make the data blocks and parity blocks on DNs more balanced, +the 
> traffic load in the cluster will be more balanced and the peak traffic load 
> on a DN will be reduced+. Here "balance" means that the numbers of data 
> blocks and parity blocks on a DN match its EC policy. In the ideal state, 
> each DN has a balanced traffic load, as figure 2 shows. 
>  !figure2, balanced traffic load on DNs.png|width=850,height=550! 
> How can this imbalance be reduced? I think it is related to the EC policy and 
> the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
> it is appropriate to keep the ratio close to 3:2. 
> There are two solutions:
> 1. Improve the block placement policy.
> 2. Improve the Balancer.






[jira] [Created] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-27 Thread WangYuanben (Jira)
WangYuanben created HDFS-17061:
--

 Summary: EC: Let data blocks and parity blocks on DNs more balanced
 Key: HDFS-17061
 URL: https://issues.apache.org/jira/browse/HDFS-17061
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balamcer, erasure-coding
Reporter: WangYuanben
 Attachments: figure1, unbalanced traffic load on DNs.png, figure2, 
balanced traffic load on DNs.png

When choosing a DN on which to place a data block or parity block, the number 
of data blocks and parity blocks already on that datanode is not taken into 
consideration. This may lead to *uneven traffic load*.

As shown in figure 1, when reading block groups A, B, C, D and E from five 
different EC files without any missing blocks, datanodes like DN1 and DN2 will 
have a high traffic load. However, datanodes like DN3, DN4 and DN5 may have a 
low traffic load or even none. 

 !figure1, unbalanced traffic load on DNs.png|width=650,height=650! 

If we can make the data blocks and parity blocks on DNs more balanced, +the 
traffic load in the cluster will be more balanced and the peak traffic load on 
a DN will be reduced+. Here "balance" means that the numbers of data blocks and 
parity blocks on a DN match its EC policy. In the ideal state, each DN has a 
balanced traffic load, as figure 2 shows. 

 !figure2, balanced traffic load on DNs.png|width=650,height=650! 

How can this imbalance be reduced? I think it is related to the EC policy and 
the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
it is appropriate to keep the ratio close to 3:2. 

There are two solutions:
1. Improve the block placement policy.
2. Improve the Balancer.
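
The notion of balance described above can be made concrete with a small sketch. 
The Python below is illustrative only and is not HDFS code: the function name 
`block_balance`, the `(datanode, kind)` input format and the sample placement 
are assumptions made for this example. It counts data and parity blocks per DN 
and reports how far each DN's share of data blocks deviates from the 3:2 target 
of RS-3-2-1024k. Assuming that a read with no missing blocks fetches only data 
blocks, the data-block count also serves as a rough proxy for read traffic.

```python
from collections import Counter


def block_balance(placements, data_units=3, parity_units=2):
    """Summarise the data/parity balance of each datanode.

    placements: iterable of (datanode, kind) pairs, kind is "data" or "parity".
    Returns {datanode: (data_count, parity_count, deviation)}, where deviation
    is the DN's data share minus the policy's data share (0.0 means balanced).
    """
    counts = {}
    for datanode, kind in placements:
        if kind not in ("data", "parity"):
            raise ValueError(f"unknown block kind: {kind!r}")
        counts.setdefault(datanode, Counter())[kind] += 1

    # Data share implied by the EC policy, e.g. 3 / (3 + 2) for RS-3-2.
    target_share = data_units / (data_units + parity_units)
    summary = {}
    for datanode, c in counts.items():
        total = c["data"] + c["parity"]
        deviation = c["data"] / total - target_share
        summary[datanode] = (c["data"], c["parity"], deviation)
    return summary


# Hypothetical placement: DN1 holds only data blocks, DN3 only parity blocks,
# and DN2 matches the 3:2 ratio.
sample = (
    [("DN1", "data")] * 5
    + [("DN2", "data")] * 3 + [("DN2", "parity")] * 2
    + [("DN3", "parity")] * 5
)
summary = block_balance(sample)
for dn in sorted(summary):
    data, parity, deviation = summary[dn]
    print(f"{dn}: data={data} parity={parity} deviation={deviation:+.2f}")
```

A positive deviation marks a DN that holds more data blocks than the policy 
ratio suggests and is therefore likely to serve more reads; a negative 
deviation marks a DN dominated by parity blocks. Either a placement policy or 
the Balancer could in principle use such a figure, but how they would do so is 
not specified here.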






[jira] [Updated] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-27 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17061:
---
Description: 
When choosing a DN on which to place a data block or parity block, the number 
of data blocks and parity blocks already on that datanode is not taken into 
consideration. This may lead to *uneven traffic load*.

As shown in figure 1, when reading block groups A, B, C, D and E from five 
different EC files without any missing blocks, datanodes like DN1 and DN2 will 
have a high traffic load. However, datanodes like DN3, DN4 and DN5 may have a 
low traffic load or even none. 

 !figure1, unbalanced traffic load on DNs.png! 

If we can make the data blocks and parity blocks on DNs more balanced, +the 
traffic load in the cluster will be more balanced and the peak traffic load on 
a DN will be reduced+. Here "balance" means that the numbers of data blocks and 
parity blocks on a DN match its EC policy. In the ideal state, each DN has a 
balanced traffic load, as figure 2 shows. 

 !figure2, balanced traffic load on DNs.png! 

How can this imbalance be reduced? I think it is related to the EC policy and 
the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
it is appropriate to keep the ratio close to 3:2. 

There are two solutions:
1. Improve the block placement policy.
2. Improve the Balancer.

  was:
When choosing a DN on which to place a data block or parity block, the number 
of data blocks and parity blocks already on that datanode is not taken into 
consideration. This may lead to *uneven traffic load*.

As shown in figure 1, when reading block groups A, B, C, D and E from five 
different EC files without any missing blocks, datanodes like DN1 and DN2 will 
have a high traffic load. However, datanodes like DN3, DN4 and DN5 may have a 
low traffic load or even none. 

 !figure1, unbalanced traffic load on DNs.png|width=650,height=650! 

If we can make the data blocks and parity blocks on DNs more balanced, +the 
traffic load in the cluster will be more balanced and the peak traffic load on 
a DN will be reduced+. Here "balance" means that the numbers of data blocks and 
parity blocks on a DN match its EC policy. In the ideal state, each DN has a 
balanced traffic load, as figure 2 shows. 

 !figure2, balanced traffic load on DNs.png|width=650,height=650! 

How can this imbalance be reduced? I think it is related to the EC policy and 
the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
it is appropriate to keep the ratio close to 3:2. 

There are two solutions:
1. Improve the block placement policy.
2. Improve the Balancer.


> EC: Let data blocks and parity blocks on DNs more balanced
> --
>
> Key: HDFS-17061
> URL: https://issues.apache.org/jira/browse/HDFS-17061
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balamcer, erasure-coding
>Reporter: WangYuanben
>Priority: Minor
> Attachments: figure1, unbalanced traffic load on DNs.png, figure2, 
> balanced traffic load on DNs.png
>
>
> When choosing a DN on which to place a data block or parity block, the number 
> of data blocks and parity blocks already on that datanode is not taken into 
> consideration. This may lead to *uneven traffic load*.
> As shown in figure 1, when reading block groups A, B, C, D and E from five 
> different EC files without any missing blocks, datanodes like DN1 and DN2 
> will have a high traffic load. However, datanodes like DN3, DN4 and DN5 may 
> have a low traffic load or even none. 
>  !figure1, unbalanced traffic load on DNs.png! 
> If we can make the data blocks and parity blocks on DNs more balanced, +the 
> traffic load in the cluster will be more balanced and the peak traffic load 
> on a DN will be reduced+. Here "balance" means that the numbers of data 
> blocks and parity blocks on a DN match its EC policy. In the ideal state, 
> each DN has a balanced traffic load, as figure 2 shows. 
>  !figure2, balanced traffic load on DNs.png! 
> How can this imbalance be reduced? I think it is related to the EC policy and 
> the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
> it is appropriate to keep the ratio close to 3:2. 
> There are two solutions:
> 1. Improve the block placement policy.
> 2. Improve the Balancer.






[jira] [Created] (HDFS-17033) Update fsck to display stale state info of blocks accurately

2023-06-01 Thread WangYuanben (Jira)
WangYuanben created HDFS-17033:
--

 Summary: Update fsck to display stale state info of blocks 
accurately
 Key: HDFS-17033
 URL: https://issues.apache.org/jira/browse/HDFS-17033
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namanode
Reporter: WangYuanben
Assignee: WangYuanben
 Fix For: 3.4.0









[jira] [Updated] (HDFS-17016) Cleanup method calls to static Assert and Assume methods.

2023-05-17 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17016:
---
Summary: Cleanup method calls to static Assert and Assume methods.  (was: 
Cleanup method calls to static Assert methods in TestCodecRawCoderMapping)

> Cleanup method calls to static Assert and Assume methods.
> -
>
> Key: HDFS-17016
> URL: https://issues.apache.org/jira/browse/HDFS-17016
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Minor
> Fix For: 3.4.0
>
>
> Cleanup method calls to static Assert methods in TestCodecRawCoderMapping.






[jira] [Updated] (HDFS-17016) Cleanup method calls to static Assert and Assume methods.

2023-05-17 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17016:
---
Description: Cleanup method calls to static Assert and Assume methods.  
(was: Cleanup method calls to static Assert methods in 
TestCodecRawCoderMapping.)

> Cleanup method calls to static Assert and Assume methods.
> -
>
> Key: HDFS-17016
> URL: https://issues.apache.org/jira/browse/HDFS-17016
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Minor
> Fix For: 3.4.0
>
>
> Cleanup method calls to static Assert and Assume methods.






[jira] [Updated] (HDFS-17016) Cleanup method calls to static Assert methods in TestCodecRawCoderMapping

2023-05-17 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17016:
---
Description: Cleanup method calls to static Assert methods in 
TestCodecRawCoderMapping.  (was: Cleanup method calls to static Assert and 
Assume methods in TestCodecRawCoderMapping.)

> Cleanup method calls to static Assert methods in TestCodecRawCoderMapping
> -
>
> Key: HDFS-17016
> URL: https://issues.apache.org/jira/browse/HDFS-17016
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Minor
> Fix For: 3.4.0
>
>
> Cleanup method calls to static Assert methods in TestCodecRawCoderMapping.






[jira] [Updated] (HDFS-17016) Cleanup method calls to static Assert methods in TestCodecRawCoderMapping

2023-05-17 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-17016:
---
Summary: Cleanup method calls to static Assert methods in 
TestCodecRawCoderMapping  (was: Cleanup method calls to static Assert and 
Assume methods in TestCodecRawCoderMapping)

> Cleanup method calls to static Assert methods in TestCodecRawCoderMapping
> -
>
> Key: HDFS-17016
> URL: https://issues.apache.org/jira/browse/HDFS-17016
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: WangYuanben
>Assignee: WangYuanben
>Priority: Minor
> Fix For: 3.4.0
>
>
> Cleanup method calls to static Assert and Assume methods in 
> TestCodecRawCoderMapping.






[jira] [Created] (HDFS-17016) Cleanup method calls to static Assert and Assume methods in TestCodecRawCoderMapping

2023-05-17 Thread WangYuanben (Jira)
WangYuanben created HDFS-17016:
--

 Summary: Cleanup method calls to static Assert and Assume methods 
in TestCodecRawCoderMapping
 Key: HDFS-17016
 URL: https://issues.apache.org/jira/browse/HDFS-17016
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: WangYuanben
Assignee: WangYuanben
 Fix For: 3.4.0


Cleanup method calls to static Assert and Assume methods in 
TestCodecRawCoderMapping.






[jira] [Updated] (HDFS-16977) Forbid assigned characters in pathname.

2023-04-19 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-16977:
---
Attachment: HDFS-16977__Forbid_assigned_characters_in_pathname_.patch

> Forbid assigned characters in pathname.
> ---
>
> Key: HDFS-16977
> URL: https://issues.apache.org/jira/browse/HDFS-16977
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: dfsclient, namenode
>Affects Versions: 3.3.4
>Reporter: WangYuanben
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HDFS-16977__Forbid_assigned_characters_in_pathname_.patch
>
>
> Pathnames that contain special characters may lead to unexpected results. For 
> example, there is a file named "/foo/file*" in my cluster, created by 
> "DistributedFileSystem.create(new Path("/foo/file*"))". When I want to remove 
> it, I type "hadoop fs -rm /foo/file*" in the shell. However, this 
> unexpectedly removes all the files with the prefix "/foo/file". Other 
> characters, such as ' ', '|' and '&', can cause similar problems.
>  
> Therefore, it's necessary to restrict the occurrence of these characters in 
> pathnames. A simple but effective way is to forbid assigned characters in a 
> pathname when a new file or directory is created.
>  
> It is also important to add the same function to the Router module and the 
> WebHdfs module. I will add them as two subtasks later.
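
The surprise in the example above most likely comes from glob expansion: `*` 
is treated as a wildcard rather than as a literal character. The sketch below 
uses Python's `fnmatch` module as a stand-in for that matching. It is not 
Hadoop's glob implementation, and the file list and the helper name 
`glob_matches` are hypothetical. It shows the pattern "/foo/file*" matching 
every path with that prefix, while a pattern that brackets the asterisk matches 
only the file literally named "/foo/file*".

```python
import fnmatch


def glob_matches(pattern, paths):
    """Return the paths matched by a shell-style pattern (case-sensitive)."""
    return [p for p in paths if fnmatch.fnmatchcase(p, pattern)]


# Hypothetical directory listing that includes a file literally named "file*".
paths = ["/foo/file*", "/foo/file1", "/foo/file2.txt", "/foo/other"]

# The asterisk acts as a wildcard, so every path with the prefix is selected.
matched = glob_matches("/foo/file*", paths)

# Putting the asterisk in a character class makes it match only itself.
escaped = glob_matches("/foo/file[*]", paths)

print(matched)
print(escaped)
```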






[jira] [Resolved] (HDFS-16977) Forbid assigned characters in pathname.

2023-04-19 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben resolved HDFS-16977.

Resolution: Works for Me

> Forbid assigned characters in pathname.
> ---
>
> Key: HDFS-16977
> URL: https://issues.apache.org/jira/browse/HDFS-16977
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: dfsclient, namenode
>Affects Versions: 3.3.4
>Reporter: WangYuanben
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HDFS-16977__Forbid_assigned_characters_in_pathname_.patch
>
>
> Pathnames that contain special characters may lead to unexpected results. For 
> example, there is a file named "/foo/file*" in my cluster, created by 
> "DistributedFileSystem.create(new Path("/foo/file*"))". When I want to remove 
> it, I type "hadoop fs -rm /foo/file*" in the shell. However, this 
> unexpectedly removes all the files with the prefix "/foo/file". Other 
> characters, such as ' ', '|' and '&', can cause similar problems.
>  
> Therefore, it's necessary to restrict the occurrence of these characters in 
> pathnames. A simple but effective way is to forbid assigned characters in a 
> pathname when a new file or directory is created.
>  
> It is also important to add the same function to the Router module and the 
> WebHdfs module. I will add them as two subtasks later.






[jira] [Updated] (HDFS-16977) Forbid assigned characters in pathname.

2023-04-12 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-16977:
---
Summary: Forbid assigned characters in pathname.  (was: Forbid assigned 
characters in pathname when new file or directory is created.)

> Forbid assigned characters in pathname.
> ---
>
> Key: HDFS-16977
> URL: https://issues.apache.org/jira/browse/HDFS-16977
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: dfsclient, namenode
>Affects Versions: 3.3.4
>Reporter: WangYuanben
>Priority: Minor
>  Labels: pull-request-available
>
> Pathnames that contain special characters may lead to unexpected results. For 
> example, there is a file named "/foo/file*" in my cluster, created by 
> "DistributedFileSystem.create(new Path("/foo/file*"))". When I want to remove 
> it, I type "hadoop fs -rm /foo/file*" in the shell. However, this 
> unexpectedly removes all the files with the prefix "/foo/file". Other 
> characters, such as ' ', '|' and '&', can cause similar problems.
>  
> Therefore, it's necessary to restrict the occurrence of these characters in 
> pathnames. A simple but effective way is to forbid assigned characters in a 
> pathname when a new file or directory is created.
>  
> It is also important to add the same function to the Router module and the 
> WebHdfs module. I will add them as two subtasks later.






[jira] [Updated] (HDFS-16977) Forbid assigned characters in pathname when new file or directory is created.

2023-04-12 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-16977:
---
Description: 
Pathnames that contain special characters may lead to unexpected results. For 
example, there is a file named "/foo/file*" in my cluster, created by 
"DistributedFileSystem.create(new Path("/foo/file*"))". When I want to remove 
it, I type "hadoop fs -rm /foo/file*" in the shell. However, this unexpectedly 
removes all the files with the prefix "/foo/file". Other characters, such as 
' ', '|' and '&', can cause similar problems.

 

Therefore, it's necessary to restrict the occurrence of these characters in 
pathnames. A simple but effective way is to forbid assigned characters in a 
pathname when a new file or directory is created.

 

It is also important to add the same function to the Router module and the 
WebHdfs module. I will add them as two subtasks later.

  was:
Pathnames that contain special characters may lead to unexpected results. For 
example, there is a file named "/foo/file*" in my cluster, created by 
"DistributedFileSystem.create(new Path("/foo/file*"))". When I want to remove 
it, I type "hadoop fs -rm /foo/file*" in the shell. However, this unexpectedly 
removes all the files with the prefix "/foo/file". Other characters, such as 
' ', '|' and '&', can cause similar problems.

Therefore, it's necessary to restrict the occurrence of these characters in 
pathnames. A simple but effective way is to forbid assigned characters in a 
pathname when a new file or directory is created.

 

It is also important to add the same function to the Router module and the 
WebHdfs module. I will add them as two subtasks later.


> Forbid assigned characters in pathname when new file or directory is created.
> -
>
> Key: HDFS-16977
> URL: https://issues.apache.org/jira/browse/HDFS-16977
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: dfsclient, namenode
>Affects Versions: 3.3.4
>Reporter: WangYuanben
>Priority: Minor
>  Labels: pull-request-available
>
> Pathnames that contain special characters may lead to unexpected results. For 
> example, there is a file named "/foo/file*" in my cluster, created by 
> "DistributedFileSystem.create(new Path("/foo/file*"))". When I want to remove 
> it, I type "hadoop fs -rm /foo/file*" in the shell. However, this 
> unexpectedly removes all the files with the prefix "/foo/file". Other 
> characters, such as ' ', '|' and '&', can cause similar problems.
>  
> Therefore, it's necessary to restrict the occurrence of these characters in 
> pathnames. A simple but effective way is to forbid assigned characters in a 
> pathname when a new file or directory is created.
>  
> It is also important to add the same function to the Router module and the 
> WebHdfs module. I will add them as two subtasks later.
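
A minimal sketch of the proposed check follows, written in Python for brevity 
rather than in Hadoop's Java. The forbidden set and the names 
`FORBIDDEN_CHARS`, `find_forbidden` and `check_new_pathname` are assumptions 
for illustration only; the actual change may define the character set and the 
point of enforcement differently.

```python
# Hypothetical set of forbidden characters, taken from the examples in the
# description ('*', ' ', '|', '&'); a real deployment would choose its own.
FORBIDDEN_CHARS = frozenset("* |&")


def find_forbidden(pathname, forbidden=FORBIDDEN_CHARS):
    """Return the sorted list of forbidden characters present in pathname."""
    return sorted(set(pathname) & set(forbidden))


def check_new_pathname(pathname, forbidden=FORBIDDEN_CHARS):
    """Return pathname unchanged, or raise ValueError if it is not allowed."""
    bad = find_forbidden(pathname, forbidden)
    if bad:
        raise ValueError(
            f"pathname {pathname!r} contains forbidden characters: {bad}"
        )
    return pathname


# Check a few candidate names the way a create or mkdir call might.
for candidate in ["/foo/file", "/foo/file*", "/foo/a b|c"]:
    try:
        check_new_pathname(candidate)
        print(f"accepted: {candidate}")
    except ValueError as err:
        print(f"rejected: {err}")
```

Rejecting the name at creation time means later commands never have to 
distinguish a literal character from its special meaning, which is the case 
that caused the accidental deletion described above.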






[jira] [Updated] (HDFS-16977) Forbid assigned characters in pathname when new file or directory is created.

2023-04-12 Thread WangYuanben (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

WangYuanben updated HDFS-16977:
---
Description: 
Pathnames that contain special characters may lead to unexpected results. For 
example, there is a file named "/foo/file*" in my cluster, created by 
"DistributedFileSystem.create(new Path("/foo/file*"))". When I want to remove 
it, I type "hadoop fs -rm /foo/file*" in the shell. However, this unexpectedly 
removes all the files with the prefix "/foo/file". Other characters, such as 
' ', '|' and '&', can cause similar problems.

Therefore, it's necessary to restrict the occurrence of these characters in 
pathnames. A simple but effective way is to forbid assigned characters in a 
pathname when a new file or directory is created.

 

It is also important to add the same function to the Router module and the 
WebHdfs module. I will add them as two subtasks later.

  was:
Pathnames that contain special characters may lead to unexpected results.
For example, my cluster has a file named "/foo/file*", created by
"DistributedFileSystem.create(new Path("/foo/file*"))". When I try to remove
it by typing "hadoop fs -rm /foo/file*" in the shell, every file whose path
starts with "/foo/file" is removed unexpectedly. Other characters, such as
' ', '|', and '&', cause similar problems.

Therefore, it is necessary to restrict these characters in pathnames. A
simple but effective way is to forbid a specified set of characters in the
pathname when a new file or directory is created.


> Forbid assigned characters in pathname when new file or directory is created.
> -
>
> Key: HDFS-16977
> URL: https://issues.apache.org/jira/browse/HDFS-16977
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: dfsclient, namenode
> Affects Versions: 3.3.4
> Reporter: WangYuanben
> Priority: Minor
> Labels: pull-request-available
>
> Pathnames that contain special characters may lead to unexpected results.
> For example, my cluster has a file named "/foo/file*", created by
> "DistributedFileSystem.create(new Path("/foo/file*"))". When I try to remove
> it by typing "hadoop fs -rm /foo/file*" in the shell, every file whose path
> starts with "/foo/file" is removed unexpectedly. Other characters, such as
> ' ', '|', and '&', cause similar problems.
>  
> Therefore, it is necessary to restrict these characters in pathnames. A
> simple but effective way is to forbid a specified set of characters in the
> pathname when a new file or directory is created.
>  
> The same check should also be added to the Router and WebHdfs modules. I will
> add them as two subtasks later.






[jira] [Created] (HDFS-16977) Forbid assigned characters in pathname when new file or directory is created.

2023-04-12 Thread WangYuanben (Jira)
WangYuanben created HDFS-16977:
--

 Summary: Forbid assigned characters in pathname when new file or 
directory is created.
 Key: HDFS-16977
 URL: https://issues.apache.org/jira/browse/HDFS-16977
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: dfsclient, namenode
Affects Versions: 3.3.4
Reporter: WangYuanben


Pathnames that contain special characters may lead to unexpected results.
For example, my cluster has a file named "/foo/file*", created by
"DistributedFileSystem.create(new Path("/foo/file*"))". When I try to remove
it by typing "hadoop fs -rm /foo/file*" in the shell, every file whose path
starts with "/foo/file" is removed unexpectedly. Other characters, such as
' ', '|', and '&', cause similar problems.

Therefore, it is necessary to restrict these characters in pathnames. A
simple but effective way is to forbid a specified set of characters in the
pathname when a new file or directory is created.
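
The wildcard behaviour described above can be illustrated with a small
stand-alone sketch. It is not Hadoop's glob implementation: GlobDemo and its
methods are hypothetical names, and only '*' is treated as a wildcard (matching
any run of characters within one path component). It shows why the pattern
"/foo/file*" matches sibling files as well as the file literally named
"/foo/file*".

```java
import java.util.regex.Pattern;

/** Hypothetical demo class; a simplified stand-in for glob matching. */
public final class GlobDemo {
    private GlobDemo() {}

    /** Translates a glob into a regex; only '*' is special in this sketch. */
    public static Pattern toRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                sb.append("[^/]*"); // any characters except the path separator
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(sb.toString());
    }

    /** Returns true if the whole path matches the glob. */
    public static boolean matches(String glob, String path) {
        return toRegex(glob).matcher(path).matches();
    }

    public static void main(String[] args) {
        String glob = "/foo/file*";
        for (String p : new String[] {"/foo/file*", "/foo/file1", "/foo/bar"}) {
            System.out.println(p + " matched: " + matches(glob, p));
        }
    }
}
```

Under these assumptions both "/foo/file*" and "/foo/file1" match the pattern,
so a delete driven by the pattern would affect both.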
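
A minimal sketch of the proposed check, assuming the forbidden set is exactly
the characters named above ('*', ' ', '|', '&'). PathnameValidator and its
methods are hypothetical names for illustration; they are not part of Hadoop
and do not reflect how the actual patch is implemented.

```java
import java.util.Set;

/** Hypothetical helper; not an existing Hadoop class. */
public final class PathnameValidator {
    // Assumed forbidden set, taken from the characters listed in this issue.
    private static final Set<Character> FORBIDDEN = Set.of('*', ' ', '|', '&');

    private PathnameValidator() {}

    /** Returns the index of the first forbidden character, or -1 if none. */
    public static int firstForbiddenIndex(String pathname) {
        for (int i = 0; i < pathname.length(); i++) {
            if (FORBIDDEN.contains(pathname.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    /** Throws IllegalArgumentException if the pathname has a forbidden character. */
    public static void checkPathname(String pathname) {
        int i = firstForbiddenIndex(pathname);
        if (i >= 0) {
            throw new IllegalArgumentException(
                "Forbidden character '" + pathname.charAt(i)
                    + "' at index " + i + " in pathname: " + pathname);
        }
    }

    public static void main(String[] args) {
        checkPathname("/foo/file"); // accepted, no exception
        try {
            checkPathname("/foo/file*");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In a real deployment such a check would presumably run before the create or
mkdir request is accepted, and the forbidden set would be configurable.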






[jira] [Created] (HDFS-16965) Add switch to decide whether to enable native codec.

2023-03-29 Thread WangYuanben (Jira)
WangYuanben created HDFS-16965:
--

 Summary: Add switch to decide whether to enable native codec.
 Key: HDFS-16965
 URL: https://issues.apache.org/jira/browse/HDFS-16965
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: erasure-coding
Affects Versions: 3.3.4
Reporter: WangYuanben


Sometimes we need to create a codec without ISA-L, but the native codec is
given priority by default. It is therefore necessary to add a switch that
decides whether the native codec is enabled.
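
A rough sketch of what such a switch could look like. Everything here is an
assumption for illustration: CoderSelector, the property key
"example.erasurecode.native.enabled", and the coder labels "native" and "java"
are hypothetical, and java.util.Properties stands in for Hadoop's
configuration object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical sketch of a native-codec switch; not Hadoop code. */
public final class CoderSelector {
    // Hypothetical key; the real configuration name may differ.
    public static final String NATIVE_ENABLED_KEY =
        "example.erasurecode.native.enabled";

    private CoderSelector() {}

    /**
     * Returns coder labels in priority order. The native coder comes first
     * unless the switch is set to false, in which case it is skipped.
     */
    public static List<String> candidateCoders(Properties conf) {
        boolean nativeEnabled =
            Boolean.parseBoolean(conf.getProperty(NATIVE_ENABLED_KEY, "true"));
        List<String> coders = new ArrayList<>();
        if (nativeEnabled) {
            coders.add("native"); // preferred by default
        }
        coders.add("java"); // pure-Java fallback, always available
        return coders;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(candidateCoders(conf));
        conf.setProperty(NATIVE_ENABLED_KEY, "false");
        System.out.println(candidateCoders(conf));
    }
}
```

Defaulting the switch to true keeps the existing behaviour, where the native
codec is tried first.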


