To add to Jay Booth's points, adding multi-threaded reads and writes to HDFS would likely degrade performance. Consider a production setup where 4-5 jobs are running on a low-end commodity server. Today that amounts to roughly one thread per job, i.e. 4-5 threads reading from and writing to the hard disk. Making each read and write multi-threaded would create many more threads: Number of Jobs × (default HDFS block size in MB × 1024 KB/MB) / file-system block size in KB, i.e. one thread per file-system block. On a low-end hard disk with a limited RPM, a higher number of threads will decrease performance. As the number of parallel disk accesses increases from 1, throughput increases, but after 3-4 parallel accesses it starts to drop. You can use performance analysis tools (like IOMeter) to identify the *ideal* number of parallel disk accesses for a specific hard disk.
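To put a number on that formula, here is a small back-of-the-envelope sketch. It assumes a 64 MB HDFS block size (the usual `dfs.block.size` default at the time) and a 4 KB file-system block (typical for ext3/ext4); both are assumptions, and the function name is made up for illustration.

```python
def threads_for_block_parallel_io(num_jobs: int,
                                  hdfs_block_size_mb: int = 64,
                                  fs_block_size_kb: int = 4) -> int:
    """Hypothetical helper: thread count if every file-system block of an
    HDFS block got its own reader/writer thread, per running job.

    Defaults are assumptions: 64 MB HDFS block, 4 KB file-system block.
    """
    # Convert the HDFS block size from MB to KB, then count how many
    # file-system blocks fit inside one HDFS block.
    fs_blocks_per_hdfs_block = hdfs_block_size_mb * 1024 // fs_block_size_kb
    # One thread per file-system block, for each concurrently running job.
    return num_jobs * fs_blocks_per_hdfs_block


if __name__ == "__main__":
    for jobs in (4, 5):
        print(jobs, "jobs ->", threads_for_block_parallel_io(jobs), "threads")
```

Under those assumptions, 5 jobs works out to 5 × (64 × 1024 / 4) = 81,920 threads, far beyond the 3-4 parallel accesses a single low-end disk handles well.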
---
Gautam

On Mon, Jul 5, 2010 at 8:46 PM, elton sky <eltonsky9...@gmail.com> wrote:

> >> Basically, your point is that hadoop dfs -cp is relatively slow and could
> >> be made faster. If HDFS had a more multi-threaded design, it would make cp
> >> operations faster.
>
> What I mean is, if we have the size of a file we can parallel by calculating
> blocks. Otherwise we couldn't.
>
> On Tue, Jul 6, 2010 at 10:47 AM, Allen Wittenauer
> <awittena...@linkedin.com> wrote:
>
>> On Jul 5, 2010, at 5:01 PM, elton sky wrote:
>> > Well, this sounds good when you have many small files, you concat() them
>> > into a big one. I am talking about split a big file into blocks and copy all
>> > a few blocks in parallel.
>>
>> Basically, your point is that hadoop dfs -cp is relatively slow and could
>> be made faster. If HDFS had a more multi-threaded design, it would make cp
>> operations faster.
>>
>> This sounds like a particularly high cost for an operation that is rarely
>> utilized. [This is much more interesting in a distcp context, but even then
>> not that great. distcp in my experience is usually used to push a bunch of
>> files, so you get your parallelism at the file level. Typically these are
>> part files are usually the same approx. size.]