[jira] [Commented] (HDFS-17538) Add tranfer priority queue for decommissioning datanode

2024-05-29 Thread Bei Peng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850297#comment-17850297
 ] 

Bei Peng commented on HDFS-17538:
-

Hi bro, I encountered the same problem and solved it with HDFS-14854. You
can check whether it solves your problem as well.

> Add tranfer priority queue for decommissioning datanode
> ---
>
> Key: HDFS-17538
> URL: https://issues.apache.org/jira/browse/HDFS-17538
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Yuanbo Liu
>Priority: Major
> Attachments: image-2024-05-29-16-24-45-601.png, 
> image-2024-05-29-16-26-58-359.png, image-2024-05-29-16-27-35-886.png
>
>
> When decommissioning a datanode, its blocks are checked disk by disk, and
> then transfer work for them is triggered on the DN. This makes one disk
> of the decommissioning DN very busy, leaves the CPUs stuck in I/O wait under
> high load, and sometimes even leads to OOM, as shown below:
> !image-2024-05-29-16-24-45-601.png|width=909,height=170!
> !image-2024-05-29-16-26-58-359.png|width=909,height=228!
> !image-2024-05-29-16-27-35-886.png|width=930,height=218!
> Proposal: add a priority queue for transferring blocks when decommissioning
> a datanode.
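The proposal does not say how the queue should be ordered. One possible reading, sketched here purely as an assumption, is to interleave disks so that transfers never pile up on a single volume: a min-heap keyed by how many transfers have already been scheduled on each disk. `schedule_transfers` and the disk/block names are hypothetical and are not HDFS code.

```python
import heapq
from collections import deque


def schedule_transfers(blocks_by_disk: dict[str, list[str]]) -> list[str]:
    """Hypothetical scheduler: order block transfers so no single disk is drained first.

    Always picks the disk with the fewest transfers scheduled so far,
    breaking ties by disk name, instead of draining one disk at a time.
    """
    queues = {disk: deque(blocks) for disk, blocks in blocks_by_disk.items() if blocks}
    # Heap entries: (transfers already scheduled on this disk, disk name).
    heap = [(0, disk) for disk in queues]
    heapq.heapify(heap)
    order = []
    while heap:
        scheduled, disk = heapq.heappop(heap)
        order.append(queues[disk].popleft())
        if queues[disk]:  # re-queue the disk only while it still has blocks
            heapq.heappush(heap, (scheduled + 1, disk))
    return order
```

For `{"disk1": ["b1", "b2", "b3"], "disk2": ["b4"], "disk3": ["b5", "b6"]}`, a disk-by-disk scan yields b1, b2, b3, b4, b5, b6, whereas this heap yields b1, b4, b5, b2, b6, b3, spreading consecutive reads across disks. A real implementation would also need to cap in-flight transfers per disk, which this sketch does not model.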



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2023-05-30 Thread Bei Peng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727471#comment-17727471
 ] 

Bei Peng commented on HDFS-2139:


[~xuzq_zander] Hi bro, I am interested in contributing to this work.

I can take on some sub-tasks, such as
https://issues.apache.org/jira/browse/HDFS-16758 and
https://issues.apache.org/jira/browse/HDFS-16760

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: Rituraj
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism
> for a file works as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for
> the destination file.
> 4) For each location of the block, instruct the datanode to make a local
> copy of that block.
> 5) Once each datanode has copied over its respective blocks, it
> reports this to the namenode.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by removing
> top-of-rack data transfers.
> Note: an extra improvement would be to instruct the datanode to create a
> hard link to the block file if we are copying a block on the same datanode.
> [~xuzq_zander] provided a design doc:
> https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing
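The six fast-copy steps in the description above can be illustrated with a toy, in-memory sketch. `ToyNamenode`, `fast_copy`, and the dict-of-dicts datanode storage are hypothetical stand-ins, not HDFS classes; everything runs synchronously in one process, so step 6 reduces to checking that every expected report arrived.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    block_id: int
    locations: list[str]  # datanodes holding a replica of this block


@dataclass
class ToyNamenode:
    """Hypothetical in-memory stand-in for the namenode's file-to-block map."""
    files: dict[str, list[Block]] = field(default_factory=dict)
    next_block_id: int = 1000

    def add_empty_block(self, path: str, locations: list[str]) -> Block:
        block = Block(self.next_block_id, list(locations))
        self.next_block_id += 1
        self.files.setdefault(path, []).append(block)
        return block


def fast_copy(nn: ToyNamenode, datanodes: dict[str, dict[int, bytes]],
              src: str, dst: str) -> list[Block]:
    nn.files[dst] = []
    expected, reported = set(), set()
    # Step 1: query metadata for all blocks of the source file.
    for src_block in list(nn.files[src]):
        # Step 2: the block's datanode locations come with its metadata.
        # Step 3: add an empty destination block placed on the same datanodes.
        dst_block = nn.add_empty_block(dst, src_block.locations)
        for dn in src_block.locations:
            expected.add((dn, dst_block.block_id))
            # Step 4: the datanode records a local replica; nothing crosses the network.
            replicas = datanodes[dn]
            replicas[dst_block.block_id] = replicas[src_block.block_id]
            # Step 5: the datanode reports the finished copy to the namenode.
            reported.add((dn, dst_block.block_id))
    # Step 6: "wait" until every expected replica has been reported.
    if reported != expected:
        raise RuntimeError("fast copy incomplete")
    return nn.files[dst]
```

The essential design choice is in step 3: each destination block is placed on exactly the datanodes that already hold the source block, which is what lets step 4 stay local.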
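The hard-link improvement in the note above can be sketched as follows, assuming each block is stored as an ordinary file on the datanode's local filesystem. `local_block_copy` is a hypothetical helper; `os.link` and `shutil.copyfile` are standard Python calls. It falls back to a byte copy when linking fails, for example across filesystems, where `os.link` raises `OSError`.

```python
import os
import shutil


def local_block_copy(src_path: str, dst_path: str) -> str:
    """Hypothetical helper: copy a block file within one datanode, preferring a hard link."""
    if os.path.exists(dst_path):
        # Never overwrite an existing block file.
        raise FileExistsError(dst_path)
    try:
        # Same filesystem: a new directory entry for the same data, no bytes copied.
        os.link(src_path, dst_path)
        return "hardlink"
    except OSError:
        # e.g. different volume or links unsupported: fall back to a full byte copy.
        shutil.copyfile(src_path, dst_path)
        return "copy"
```

A hard link moves no data, but both names then share the same bytes, so any later in-place modification of either replica would first have to break the link; this sketch does not handle that.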






[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-23 Thread Bei Peng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583461#comment-17583461
 ] 

Bei Peng commented on HDFS-2139:


[~xuzq_zander] In my experience, the speed of FastCopy depends on the number
of blocks and the number of files (that is, the amount of metadata), not on
the number of bytes. DistCp's existing map-task input splitting strategy
balances bytes per map, so it causes data skew when used with FastCopy. For
example, one map may copy a single 128 MB file with only 1 block, while
another map copies 128 files of 1 MB each, with 128 blocks in total. Both
maps get 128 MB, but the second one has 128 times as many blocks to process,
which leads to long-tail tasks. So I think we need a map-task input
splitting strategy based on the number of blocks.
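A minimal sketch of the block-count-based split suggested above, assuming each file's block count is already known from the copy listing. It uses the longest-processing-time-first greedy heuristic (files with the most blocks first, each assigned to the map with the fewest blocks so far). `split_by_blocks` is a hypothetical helper, not part of DistCp; wiring it into DistCp's input format is out of scope here.

```python
import heapq


def split_by_blocks(files: list[tuple[str, int]], num_maps: int) -> tuple[list[list[str]], list[int]]:
    """Hypothetical helper: assign (path, num_blocks) pairs to maps, balancing block counts.

    Returns the paths assigned to each map and each map's total block count.
    """
    heap = [(0, i) for i in range(num_maps)]  # (blocks assigned so far, map index)
    heapq.heapify(heap)
    paths: list[list[str]] = [[] for _ in range(num_maps)]
    blocks = [0] * num_maps
    # Largest files first, so the greedy step has small files left to even out the tail.
    for path, n in sorted(files, key=lambda f: f[1], reverse=True):
        load, i = heapq.heappop(heap)  # map with the fewest blocks so far
        paths[i].append(path)
        blocks[i] = load + n
        heapq.heappush(heap, (blocks[i], i))
    return paths, blocks
```

With the example from the comment (one 128 MB file with 1 block plus 128 files of 1 MB each, 129 blocks in total) and two maps, this split gives the maps 65 and 64 blocks instead of 1 and 128.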

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: ZanderXu
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>






[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-11 Thread Bei Peng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578296#comment-17578296
 ] 

Bei Peng commented on HDFS-2139:


{quote}Many companies backport it into their internal branches and use it.
 * DistCp supports fastcopy
 * Implement block based strategy{quote}
Me too.

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: Rituraj
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>


