On Jul 5, 2010, at 5:01 PM, elton sky wrote:
> Well, this sounds good when you have many small files, you concat() them
> into a big one. I am talking about splitting a big file into blocks and
> copying a few blocks in parallel.

Basically, your point is that hadoop dfs -cp is relatively slow, and that a more 
multi-threaded design in HDFS, one that copies a file's blocks in parallel, would 
make cp operations faster.
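
To make the idea concrete, here is a rough sketch of what block-parallel reading 
of a single HDFS file could look like from the client side, using positioned 
reads.  This is not how dfs -cp actually works; the class name, thread count, 
and in-memory buffering are made up for illustration, and note that the write 
side still has to be sequential, since HDFS allows only a single writer per file.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelBlockRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path src = new Path(args[0]);

        FileStatus stat = fs.getFileStatus(src);
        long blockSize = stat.getBlockSize();
        long fileLen = stat.getLen();

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<byte[]>> chunks = new ArrayList<Future<byte[]>>();

        // One task per block-sized range; positioned reads let each thread
        // pull its own range independently.  Buffering whole blocks in
        // memory like this is purely illustrative, not practical for real
        // block sizes.
        for (long off = 0; off < fileLen; off += blockSize) {
            final long start = off;
            final int len = (int) Math.min(blockSize, fileLen - off);
            chunks.add(pool.submit(() -> {
                byte[] buf = new byte[len];
                try (FSDataInputStream in = fs.open(src)) {
                    in.readFully(start, buf, 0, len);
                }
                return buf;
            }));
        }

        // The write side stays sequential: HDFS permits only one writer
        // appending to a file, so the chunks are stitched back in order.
        try (FSDataOutputStream out = fs.create(new Path(args[1]), true)) {
            for (Future<byte[]> chunk : chunks) {
                out.write(chunk.get());
            }
        }
        pool.shutdown();
    }
}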

That sounds like a particularly high implementation cost for an operation that is 
rarely used.  [This is much more interesting in a distcp context, but even then 
not that compelling.  distcp in my experience is usually used to push a bunch of 
files, so you get your parallelism at the file level.  Typically these are part 
files of approximately the same size.]
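
For comparison, here is a minimal sketch of that file-level parallelism.  distcp 
itself does this with MapReduce tasks rather than client-side threads; the paths, 
pool size, and class name below are just placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelFileCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path srcDir = new Path(args[0]);
        Path dstDir = new Path(args[1]);

        ExecutorService pool = Executors.newFixedThreadPool(8);
        // One copy task per part file: roughly equal-sized part files give
        // roughly balanced tasks, which is why file-level parallelism works
        // well for the usual part-0000N outputs.
        for (FileStatus stat : fs.listStatus(srcDir)) {
            if (stat.isDirectory()) continue;
            Path src = stat.getPath();
            Path dst = new Path(dstDir, src.getName());
            pool.submit(() -> FileUtil.copy(fs, src, fs, dst, false, conf));
        }
        pool.shutdown();
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
    }
}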

