Hassen, reads in HDFS are sequential, i.e. the client reads one block after another. For each block, the client connects to one datanode and reads that block, then connects to another (or the same) datanode to read the next block. The reason for this sequential design, I guess, is to avoid an explosion of network traffic in a heavy MapReduce job.
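
To make the block-by-block behavior concrete, here is a rough sketch in Python. It is a toy simulation, not the actual HDFS client code: `Block`, `read_file`, and the datanode names `dn-rack1` and `dn-rack2` are all hypothetical, and the replica choice is simplified to "take the first one".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Block:
    block_id: int
    replicas: List[str]  # hypothetical names of datanodes holding a copy


def read_file(blocks: List[Block],
              datanodes: Dict[str, Dict[int, bytes]],
              log: List[str]) -> bytes:
    """Read blocks strictly in order, using one datanode connection at a time."""
    data = bytearray()
    for block in blocks:
        # Assumption: pick the first replica; a real client would prefer a nearby one.
        node = block.replicas[0]
        log.append(f"connect {node} for block {block.block_id}")
        data += datanodes[node][block.block_id]
        # The next connection is only opened after this block has been fully read.
        log.append(f"done block {block.block_id}")
    return bytes(data)


# One file split into two blocks, each stored on a different rack,
# as in the original question.
datanodes = {
    "dn-rack1": {0: b"first block;"},
    "dn-rack2": {1: b"second block"},
}
blocks = [Block(0, ["dn-rack1"]), Block(1, ["dn-rack2"])]
log: List[str] = []
content = read_file(blocks, datanodes, log)
```

In this sketch the second datanode is contacted only after the first block is complete, so a single reader never has more than one block transfer in flight.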
-Elton

2011/5/8 <stanley....@emc.com>

> To my understanding, the reader reads file blocks in parallel.
>
> -----Original Message-----
> From: Hassen Riahi [mailto:hassen.ri...@cern.ch]
> Sent: May 7, 2011 23:50
> To: hdfs-user@hadoop.apache.org
> Subject: Read files from hdfs
>
> Hi all,
>
> is the read operation of 1 file stored in hdfs done in parallel?
>
> I mean let's say that I have 1 file split in 2 blocks (hdfs block) and
> each block is stored in 1 rack.
> When reading this file, are both blocks read in parallel? Or is the first
> block read and then, once done, the read of the second block begins?
> If the latter is right, the read of files in hdfs is then sequential.
> Is it right or am I missing something?
>
> Thanks,
> Hassen