Sorry, I didn't mean to imply that files actually get split up into many separate files. The multiple copies I was referring to were due to HDFS replicating each file across nodes.
On 8/26/07 11:49 PM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:

>> And yes. They do get split up again. They also get copied to multiple nodes
>> so that the reads can proceed in parallel. The most important effects of
>> concatenation and importing into HDFS are the parallelism and the reading of
>> sequential disk blocks in processing.
>
> Actually, hadoop's map-reduce usually works on 'logical' splits i.e.
> each map works only on the 'logical' split (<filename, offset, length>
> triplet).
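To make the 'logical' split idea concrete, here is a minimal sketch that carves one file into <filename, offset, length> triplets without copying or rewriting any bytes. It is a simplified illustration, not Hadoop's actual split computation (the real input formats also account for block locations and record boundaries). The names `LogicalSplit` and `compute_splits` are hypothetical, and the 64 MB split size is chosen only for illustration.

```python
from typing import List, NamedTuple


class LogicalSplit(NamedTuple):
    """Hypothetical <filename, offset, length> triplet describing one map's input."""
    filename: str
    offset: int
    length: int


def compute_splits(filename: str, file_size: int, split_size: int) -> List[LogicalSplit]:
    """Divide a file into logical splits; the file itself is never physically split."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    offset = 0
    while offset < file_size:
        # The last split may be shorter than split_size.
        length = min(split_size, file_size - offset)
        splits.append(LogicalSplit(filename, offset, length))
        offset += length
    return splits


if __name__ == "__main__":
    # A 150 MB file with a 64 MB split size yields three splits (64 MB, 64 MB, 22 MB).
    for split in compute_splits("input.log", 150 * 1024 * 1024, 64 * 1024 * 1024):
        print(split)
```

Each map task would then read only its own byte range of the one file, which is why "split up" here does not mean producing new files; the extra physical copies come from HDFS replication, not from splitting.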