[ https://issues.apache.org/jira/browse/HDFS-17216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833023#comment-17833023 ]
ASF GitHub Bot commented on HDFS-17216:
---------------------------------------

xiaojunxiang2023 commented on PR #6138:
URL: https://github.com/apache/hadoop/pull/6138#issuecomment-2031158756

   > just missed this as it went in under hdfs/distcp rather than hadoop common distcp (why two? who knows?)
   >
   > made some basic comments. If you can address them as a followup (same jira id, mention me in the PR) then we can get this in and then into branch-3.4.
   >
   > please don't backport until then!

   Hi~, I double-checked that distcp only exists in hadoop-tools/hadoop-distcp, not in hadoop-common, right?
   ![image](https://github.com/apache/hadoop/assets/65019264/18cdde83-0667-4231-8852-9fb63ddfd35c)


> When distcp handles small files, the bandwidth parameter becomes ineffective,
> resulting in serious overspeed behavior
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17216
>                 URL: https://issues.apache.org/jira/browse/HDFS-17216
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 3.3.4
>            Reporter: xiaojunxiang
>            Assignee: xiaojunxiang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>         Attachments: DiscpAnalyze.jpg
>
>
> When distcp copies small files (each file slightly smaller than the
> per-second bandwidth limit), the throttler only starts throttling after
> 1 second, and throttling is applied per file. Each file therefore finishes
> before the throttler ever acts, so the bandwidth limit is effectively
> bypassed: distcp fills the cluster bandwidth and crushes production traffic,
> which is a terrible thing.
> Also, each file takes time to set up its IO pipeline, so you shouldn't
> test with very small files: the per-file setup cost slows the transfer, and
> once the bandwidth limit kicks in, it amplifies the impact of small files
> on the measured rate.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
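To make the failure mode in the description concrete, here is a minimal Python simulation. It assumes a per-file throttler that, as the description says, cannot act during the first second after its file is opened, and it compares that with a check that uses fractional elapsed time. `simulate_copy`, its parameters, the 10 ms sleep step, and the link speed are all hypothetical; this is not distcp's actual throttling code, and the fractional variant only illustrates how a sub-second check behaves, not necessarily what PR #6138 committed.

```python
# Hypothetical simulation, not distcp's actual throttling code. It assumes a
# per-file throttler whose rate check, in the "whole_seconds" mode, only sees
# whole seconds since the file was opened (so it cannot act in the first second).

def simulate_copy(file_sizes, limit_bps, link_bps, chunk_bytes, whole_seconds):
    """Copy files back to back through a fresh throttler per file.

    Uses a simulated millisecond clock instead of real sleeps and returns the
    average throughput, in bytes per second, over the whole run.
    """
    now_ms = 0
    for size in file_sizes:
        start_ms = now_ms  # per-file state: the throttler resets for every file
        read = 0
        while read < size:
            # Throttle check before each read: sleep in 10 ms steps until the
            # measured rate is at or below the limit.
            while True:
                elapsed_ms = now_ms - start_ms
                if whole_seconds:
                    # Truncate to whole seconds; during the first second the
                    # elapsed time is 0, so the check is skipped entirely.
                    elapsed_ms = (elapsed_ms // 1000) * 1000
                if elapsed_ms == 0 or read * 1000 <= limit_bps * elapsed_ms:
                    break
                now_ms += 10
            n = min(chunk_bytes, size - read)
            now_ms += n * 1000 // link_bps  # time to move n bytes at link speed
            read += n
    return sum(file_sizes) * 1000 / now_ms


MB = 1_000_000
files = [9 * MB] * 20  # each file slightly smaller than the 10 MB/s limit
for whole in (True, False):
    rate = simulate_copy(files, limit_bps=10 * MB, link_bps=100 * MB,
                         chunk_bytes=1 * MB, whole_seconds=whole)
    print(f"whole_seconds={whole}: {rate / MB:.1f} MB/s")
# whole_seconds=True: 100.0 MB/s
# whole_seconds=False: 11.1 MB/s
```

Each 9 MB file finishes in 90 ms of simulated time, so with whole-second measurement the throttler never fires and the copy runs at link speed, ten times the limit. Measuring with sub-second resolution keeps it close to the limit; the small remaining overshoot comes from each file's last chunk being read without a following throttle check.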