This only handles the problem of putting lots of files; it doesn't deal with putting them in parallel (several at once).
This is a ticklish problem, since even on a relatively small cluster, dfs can absorb data faster than most source storage can read it. That means you can swamp things pretty easily.

When I have files on a single source machine, I just spawn multiple -put's on sub-directories until I have sufficiently saturated the read speed of the source (a rough sketch is appended at the end of this message). If all of the cluster members have access to a universal file system, then you can use the (undocumented) distcp command, but I don't like that as much.

You also have to watch out if you start writing from a host inside your cluster, or you will wind up with odd imbalances in file storage. In my case, the source of the data is actually outside the cluster and I get pretty good balancing.

If you do wind up with bad balancing, the best option I have seen is to increase the replication on individual files for 30-60 seconds and then decrease it again. To get sufficient throughput for the rebalancing, I pipeline lots of these changes so that I have 10-100 files at a time with higher replication (also sketched at the end). This does tend to substantially increase the number of files with excess replication, but that corrects itself pretty quickly.

On 10/31/07 1:53 PM, "Aaron Kimball" <[EMAIL PROTECTED]> wrote:

> hadoop dfs -put will take a directory. If it won't work recursively,
> then you can probably bang out a bash script that will handle it using
> find(1) and xargs(1).
>
> -- Aaron
>
> Chris Fellows wrote:
>> Hello!
>>
>> Quick simple question, hopefully someone out there could answer.
>>
>> Does the hadoop dfs support putting multiple files at once?
>>
>> The documentation says -put only works on one file. What's the best
>> way to import multiple files in multiple directories (i.e. dir1/file1
>> dir1/file2 dir2/file1 dir2/file2 etc)?
>>
>> End goal would be to do something like:
>>
>> bin/hadoop dfs -put /dir*/file* /myfiles
>>
>> And a follow-up: bin/hadoop dfs -lsr /myfiles would list:
>>
>> /myfiles/dir1/file1
>> /myfiles/dir1/file2
>> /myfiles/dir2/file1
>> /myfiles/dir2/file2
>>
>> Thanks again for any input!!!
>>
>> - chris
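Here is the promised rough sketch of the parallel -put approach. The source path, target path, and the limit of 4 concurrent puts are all assumptions; tune the job count until the source disk is saturated:

    #!/bin/bash
    # Spawn one -put per top-level sub-directory of the source, keeping a
    # handful running at once so the source stays busy.
    SRC=/data/source        # hypothetical local source tree
    DEST=/myfiles           # dfs target directory

    for dir in "$SRC"/*/ ; do
        bin/hadoop dfs -put "$dir" "$DEST/$(basename "$dir")" &
        while [ "$(jobs -r | wc -l)" -ge 4 ]; do
            sleep 1         # throttle: wait for a put to finish
        done
    done
    wait                    # let the last few puts drain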
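And a sketch of the replication trick. The batch file, the factors (5 up, 3 back down), and the 60 second window are assumptions, not numbers from my cluster:

    #!/bin/bash
    # Bump replication on a batch of files, give the namenode time to
    # place the extra replicas, then drop back to the normal factor.
    while read f; do
        bin/hadoop dfs -setrep 5 "$f" &   # backgrounded, so the changes pipeline
    done < files-to-rebalance.txt
    wait
    sleep 60                              # let the extra replicas land
    while read f; do
        bin/hadoop dfs -setrep 3 "$f"     # restore the normal factor
    done < files-to-rebalance.txt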
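For the archives, a one-liner version of the find(1)/xargs(1) idea Aaron mentions, using Chris's dir*/file* layout. It assumes GNU xargs for -I, and it starts one hadoop process per file, so it is simple rather than fast:

    find dir* -type f | xargs -I{} bin/hadoop dfs -put {} /myfiles/{}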