[ 
https://issues.apache.org/jira/browse/HDFS-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-11786:
-------------------------------------
    Attachment: HDFS-11786.002.patch

Thanks for the review [~anu]. I have modified copyFromLocal to make it 
multithreaded.
The number of threads is an optional parameter (-nt); its default value 
is 1.
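
For illustration only, the idea of handing one file to each worker thread can 
be sketched as below. This is not the code in HDFS-11786.002.patch: it copies 
between local directories with java.nio instead of writing to HDFS, and the 
class ParallelCopySketch, the method copyAll, and its numThreads argument are 
hypothetical names standing in for the -nt option.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ParallelCopySketch {

    // Hypothetical helper: copies each regular file directly under srcDir
    // into dstDir, one file per task, using a pool of numThreads workers.
    // Returns the number of files copied.
    static int copyAll(Path srcDir, Path dstDir, int numThreads) throws Exception {
        Files.createDirectories(dstDir);

        // Collect the files up front so the directory stream can be closed.
        List<Path> files;
        try (Stream<Path> listing = Files.list(srcDir)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path file : files) {
                // Local copy used as a stand-in for the upload of one file.
                Callable<Path> task = () -> Files.copy(
                        file, dstDir.resolve(file.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                pending.add(pool.submit(task));
            }
            for (Future<Path> result : pending) {
                result.get(); // rethrows (wrapped) any failure from a worker
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path src = Files.createTempDirectory("sketch-src");
        Path dst = Files.createTempDirectory("sketch-dst");
        for (int i = 0; i < 5; i++) {
            Files.write(src.resolve("file" + i + ".dat"), new byte[1024]);
        }
        int copied = copyAll(src, dst, 3);
        System.out.println("Copied " + copied + " files");
    }
}
```

In this sketch each file is still read and written by a single thread; the 
parallelism is only across files, which is the case that matters when many 
small files are uploaded. With numThreads set to 1 it degenerates to a 
sequential copy.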

This improvement reduces copy time substantially: from 14m7s to 3m18s, 
roughly a 4.3x speedup. The test used 12,000 files with random sizes 
between 1 and 10 MB.

*Single-threaded put using the put command*
{code}
[hdfs@y129 ~]$ time /opt/hadoop/hadoop-3.0.0-alpha4-SNAPSHOT/bin/hdfs dfs -put test /single2
17/06/30 12:06:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

real    14m7.093s
user    5m48.357s
sys     1m54.895s
{code}

*Multithreaded put with 10 threads using the copyFromLocal command*
{code}
[hdfs@y129 ~]$ time /opt/hadoop/hadoop-3.0.0-alpha4-SNAPSHOT/bin/hdfs dfs -copyFromLocal -nt 10 test /multi1
17/06/30 12:24:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

real    3m18.574s
user    3m42.582s
sys     1m18.718s
{code}

> Add a new command for multi threaded Put/CopyFromLocal
> ------------------------------------------------------
>
>                 Key: HDFS-11786
>                 URL: https://issues.apache.org/jira/browse/HDFS-11786
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>            Reporter: Mukul Kumar Singh
>            Assignee: Mukul Kumar Singh
>         Attachments: HDFS-11786.001.patch, HDFS-11786.002.patch
>
>
> CopyFromLocal/Put is not currently multithreaded.
> When multiple files need to be uploaded to HDFS, a single thread reads each 
> file and then copies the data to the cluster.
> This copy to HDFS can be made faster by uploading multiple files in parallel.
> I am attaching the initial patch so that I can get some initial feedback.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)