Re: Read files from hdfs

elton sky Sun, 08 May 2011 00:39:14 -0700

Hassen,

A read in HDFS is sequential, i.e. a single client stream reads one block
after another. For each block, the client connects to one data node and
reads the block from it, then connects to another (or the same) data node
to read the next block.
The reason for this sequential design, I guess, is to avoid an explosion of
network traffic during a heavy MapReduce job.
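
To make the sequential pattern concrete, here is a toy simulation in Python.
It is not HDFS client code: the Block class, the data node names and
read_file_sequentially are all made up for illustration, and always picking
the first replica is just a stand-in for however the real client chooses a
data node.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    block_id: int
    data: bytes
    datanodes: List[str]  # hypothetical names of nodes holding a replica


def read_file_sequentially(blocks: List[Block]) -> Tuple[bytes, List[Tuple[int, str]]]:
    """Read blocks one after another, using one data node per block."""
    contents = bytearray()
    connections = []  # (block_id, datanode) pairs in the order contacted
    for block in blocks:
        # Assumption: take the first replica holder for this block.
        node = block.datanodes[0]
        connections.append((block.block_id, node))
        # The whole block is consumed before the next one is started.
        contents.extend(block.data)
    return bytes(contents), connections


# Hassen's example: one file, two blocks, each stored in a different rack.
example_file = [
    Block(0, b"first block;", ["rack1-dn1"]),
    Block(1, b"second block", ["rack2-dn1"]),
]

data, log = read_file_sequentially(example_file)
print(log)
print(data)
```

The connection log shows block 0 being contacted before block 1, and the
second connection only happens after the first block has been fully read.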


-Elton

2011/5/8 <stanley....@emc.com>

> To my understanding, the reader reads file blocks in parallel.
>
> -----Original Message-----
> From: Hassen Riahi [mailto:hassen.ri...@cern.ch]
> Sent: 7 May 2011 23:50
> To: hdfs-user@hadoop.apache.org
> Subject: Read files from hdfs
>
> Hi all,
>
> Is the read operation on 1 file stored in HDFS done in parallel?
>
> I mean, let's say that I have 1 file split into 2 blocks (HDFS blocks) and
> each block is stored in 1 rack.
> When reading this file, are both blocks read in parallel? Or is the first
> block read, and then once that is done the read of the second block begins?
> If the latter is right, then reading files in HDFS is sequential.
> Is that right or am I missing something?
>
> Thanks,
> Hassen
>
>
