Sorry, I didn't mean to imply that files actually get split up into many
files.  The multiple copies that I was referring to are due to the
replication of files by HDFS.


On 8/26/07 11:49 PM, "Arun C Murthy" <[EMAIL PROTECTED]> wrote:

>> And yes. They do get split up again.  They also get copied to multiple nodes
>> so that the reads can proceed in parallel.  The most important effects of
>> concatenation and importing into HDFS are the parallelism and the reading of
>> sequential disk blocks in processing.
> 
> Actually, hadoop's map-reduce usually works on 'logical' splits i.e.
> each map works only on the 'logical' split (<filename, offset, length>
> triplet).
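
To make the 'logical' split idea concrete, here is a rough sketch in Python.
It is not Hadoop's actual code or API; the names LogicalSplit and
logical_splits, and the fixed split_size parameter, are made up for
illustration.  It only shows that a split is a description of a byte range
in one file, not a separate physical file.

```python
from typing import List, NamedTuple


class LogicalSplit(NamedTuple):
    """Hypothetical stand-in for a <filename, offset, length> triplet."""
    filename: str
    offset: int   # byte offset where this split starts
    length: int   # number of bytes covered by this split


def logical_splits(filename: str, file_size: int, split_size: int) -> List[LogicalSplit]:
    """Describe a file as consecutive byte ranges without copying any data."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    offset = 0
    while offset < file_size:
        # The last split may be shorter than split_size.
        length = min(split_size, file_size - offset)
        splits.append(LogicalSplit(filename, offset, length))
        offset += length
    return splits
```

Under these assumptions, each map task would be handed one such triplet and
would read only that byte range of the file; the file itself stays in one
piece.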
