[ https://issues.apache.org/jira/browse/HDFS-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mukul Kumar Singh updated HDFS-11786:
-------------------------------------
    Attachment: HDFS-11786.002.patch

Thanks for the review [~anu]. I have modified copyFromLocal to make it multithreaded. The number of threads is an optional parameter that defaults to 1.

This improvement reduces copy time drastically, from 14m7s (single-threaded put) to 3m18s (copyFromLocal with 10 threads). Note that the test was run with 12,000 files with random sizes between 1 and 10 MB.

*Single-threaded put with the put command*
{code}
[hdfs@y129 ~]$ time /opt/hadoop/hadoop-3.0.0-alpha4-SNAPSHOT/bin/hdfs dfs -put test /single2
17/06/30 12:06:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

real    14m7.093s
user    5m48.357s
sys     1m54.895s
{code}

*Multithreaded put with 10 threads using the copyFromLocal command*
{code}
[hdfs@y129 ~]$ time /opt/hadoop/hadoop-3.0.0-alpha4-SNAPSHOT/bin/hdfs dfs -copyFromLocal -nt 10 test /multi1
17/06/30 12:24:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

real    3m18.574s
user    3m42.582s
sys     1m18.718s
{code}

> Add a new command for multi threaded Put/CopyFromLocal
> ------------------------------------------------------
>
>                 Key: HDFS-11786
>                 URL: https://issues.apache.org/jira/browse/HDFS-11786
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>            Reporter: Mukul Kumar Singh
>            Assignee: Mukul Kumar Singh
>         Attachments: HDFS-11786.001.patch, HDFS-11786.002.patch
>
>
> CopyFromLocal/Put is not currently multithreaded.
> When multiple files need to be uploaded to HDFS, a single thread reads
> each file and then copies the data to the cluster.
> This copy to HDFS can be made faster by uploading multiple files in parallel.
> I am attaching the initial patch so that I can get some initial feedback.
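
The idea described in the issue, uploading several files in parallel from a fixed pool of worker threads, can be sketched roughly as below. This is not the code from the attached patches. It is a minimal illustration that assumes a plain local-to-local copy with `java.nio.file` standing in for the HDFS write, so that it runs without a cluster. The class name `ParallelCopySketch`, the method `copyTree`, and the parameter `numThreads` are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: one copy task per file, executed by a fixed thread pool.
// A local file copy stands in for the HDFS upload (an assumption of this example).
final class ParallelCopySketch {

    /**
     * Copies every regular file under sourceDir (assumed to be a directory)
     * into targetDir, preserving relative paths, using numThreads workers.
     * Returns the number of files copied.
     */
    static int copyTree(Path sourceDir, Path targetDir, int numThreads)
            throws IOException, InterruptedException {
        // Collect the list of files up front so it can be split into tasks.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        // Treat values below 1 as 1, i.e. a single-threaded copy.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, numThreads));
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Path src : files) {
                Path dst = targetDir.resolve(sourceDir.relativize(src));
                // Submitted as a Callable so the task may throw IOException.
                pending.add(pool.submit(() -> {
                    Files.createDirectories(dst.getParent());
                    Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                    return null;
                }));
            }
            // Wait for every task and surface the first failure, if any.
            for (Future<?> task : pending) {
                try {
                    task.get();
                } catch (ExecutionException e) {
                    throw new IOException("copy failed", e.getCause());
                }
            }
            return files.size();
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `ParallelCopySketch.copyTree(src, dst, 10)` would correspond loosely to the 10-thread run above. The sketch parallelizes across files only, so it helps with many small or medium files and does nothing for a single large file.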