[ 
https://issues.apache.org/jira/browse/HDFS-17216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833023#comment-17833023
 ] 

ASF GitHub Bot commented on HDFS-17216:
---------------------------------------

xiaojunxiang2023 commented on PR #6138:
URL: https://github.com/apache/hadoop/pull/6138#issuecomment-2031158756

   > just missed this as it went in under hdfs/distcp rather than hadoop common
   > distcp (why two? who knows?)
   >
   > made some basic comments. If you can address them as a followup (same jira
   > id, mention me in the PR) then we can get this in and then into branch-3.4.
   >
   > please don't backport until then!
   
   Hi, I double-checked: distcp only exists in hadoop-tools/hadoop-distcp,
not in hadoop-common, right?
   
![image](https://github.com/apache/hadoop/assets/65019264/18cdde83-0667-4231-8852-9fb63ddfd35c)
   




> When distcp handle the small files, the bandwidth parameter will be invalid, 
> resulting in serious overspeed behavior
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17216
>                 URL: https://issues.apache.org/jira/browse/HDFS-17216
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 3.3.4
>            Reporter: xiaojunxiang
>            Assignee: xiaojunxiang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>         Attachments: DiscpAnalyze.jpg
>
>
> When distcp copies small files (file size slightly smaller than the 
> bandwidth limit), the throttler only starts to throttle after 1 second, and 
> the throttling applies to each file individually. As a result the throttle 
> never takes effect, and distcp can fill the cluster bandwidth and crush 
> production traffic, which is a serious problem. 
> Also, setting up the IO pipeline takes time for every file, so you shouldn't 
> test with very small files: that setup cost slows the transfer and, once the 
> bandwidth limit kicks in, amplifies the impact of small files on the rate.
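
The effect described above can be illustrated with a toy model. This is only
a sketch based on the issue description, not DistCp's actual code: the
function names, the "no throttling while a file stays under the limit in its
first second" rule, and the fixed link speed are all assumptions made for
illustration.

```python
# Hypothetical model of a per-file bandwidth limiter (not DistCp code).
MB = 1024 * 1024


def per_file_copy_time(file_size, limit_bps, link_bps):
    """Seconds needed to copy one file under the assumed limiter."""
    if file_size <= limit_bps:
        # Assumption: a file smaller than the per-second limit never
        # triggers the limiter, so it moves at full link speed.
        return file_size / link_bps
    # Assumption: larger files are held to the limit on average.
    return max(file_size / limit_bps, file_size / link_bps)


def effective_rate(file_sizes, limit_bps, link_bps):
    """Average bytes per second over a sequence of files copied one by one."""
    total_bytes = sum(file_sizes)
    total_time = sum(
        per_file_copy_time(size, limit_bps, link_bps) for size in file_sizes
    )
    return total_bytes / total_time


if __name__ == "__main__":
    limit, link = 10 * MB, 1000 * MB
    small_files = [9 * MB] * 100   # each slightly below the limit
    large_files = [100 * MB] * 3   # each well above the limit
    print(effective_rate(small_files, limit, link) / MB)
    print(effective_rate(large_files, limit, link) / MB)
```

In this model, 100 files of 9 MB each with a 10 MB/s limit on a 1000 MB/s
link average about 1000 MB/s, while a few 100 MB files stay at about 10 MB/s.
A limiter that tracks the rate across files rather than per file would not
show this gap.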



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

