[ 
https://issues.apache.org/jira/browse/HDFS-17216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833687#comment-17833687
 ] 

ASF GitHub Bot commented on HDFS-17216:
---------------------------------------

steveloughran commented on PR #6138:
URL: https://github.com/apache/hadoop/pull/6138#issuecomment-2035191987

   @xiaojunxiang2023 
   It's more that there's a separate component in JIRA and nobody knows where it 
ends up. distcp was originally hdfs to hdfs, but as it is broader now, it 
should have broader supervision. It's a tricky little cross-project visibility 
thing: those of us who work on HADOOP-* issues have to avoid breaking hdfs and yarn.




> When distcp handle the small files, the bandwidth parameter will be invalid, 
> resulting in serious overspeed behavior
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17216
>                 URL: https://issues.apache.org/jira/browse/HDFS-17216
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 3.3.4
>            Reporter: xiaojunxiang
>            Assignee: xiaojunxiang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>         Attachments: DiscpAnalyze.jpg
>
>
> When distcp copies small files (file size slightly smaller than the 
> bandwidth limit), the throttler only starts to throttle after 1 second, and 
> the throttling is applied per file. As a result the throttler becomes 
> ineffective, and distcp can fill the cluster bandwidth and crush production 
> traffic. 
> Also, it takes time for each file to set up the IO pipeline, so you shouldn't 
> test with very small files: the setup overhead slows the transfer, and once 
> the bandwidth limit kicks in it amplifies the impact of small files on the rate.
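
The arithmetic behind the report can be sketched with a small simulation. This is a standalone Python model, not DistCp code and not its API. It assumes, following the description above, that the throttler is created fresh for every file, measures elapsed time in whole seconds, and compares the raw byte count against the limit while less than one second has passed. The names `copy_file` and `effective_rate`, the chunk size, the sleep interval, and the link speed are all hypothetical values chosen for illustration.

```python
MB = 1024 * 1024


def copy_file(size, limit, link_speed, chunk=64 * 1024, sleep_ms=50):
    """Return the simulated time in ms to copy one file.

    Assumed model: a per-file throttler that only sees whole elapsed
    seconds, so during the first second the "rate" is just bytes_read.
    """
    elapsed_ms = 0.0
    bytes_read = 0
    while bytes_read < size:
        # Throttle check before each read: sleep while the measured rate
        # is above the limit.
        while True:
            whole_seconds = int(elapsed_ms // 1000)
            if whole_seconds == 0:
                rate = bytes_read
            else:
                rate = bytes_read / whole_seconds
            if rate <= limit:
                break
            elapsed_ms += sleep_ms
        # Read one chunk at the full (unthrottled) link speed.
        n = min(chunk, size - bytes_read)
        bytes_read += n
        elapsed_ms += 1000.0 * n / link_speed
    return elapsed_ms


def effective_rate(file_size, file_count, limit, link_speed):
    """Aggregate bytes per second when copying file_count files in sequence."""
    total_ms = sum(copy_file(file_size, limit, link_speed)
                   for _ in range(file_count))
    return file_size * file_count * 1000.0 / total_ms
```

Under these assumptions, with a 10 MB/s limit on a 100 MB/s link, a run of 9 MB files never trips the throttle because each file finishes before its byte count exceeds the limit, so the aggregate rate stays at link speed. A single 100 MB file, by contrast, settles at roughly the configured limit.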



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
