Yes, it could get slower, because each block read now involves both a disk read on the datanode and a network transfer to the client (plus the other small overheads that come with it).
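
As a rough back-of-the-envelope sketch (not a measurement), the difference can be modelled like this. Every number and name below is an assumption picked for illustration: the block size, the disk and network throughputs, the per-block connection overhead, and the simplification that the disk read and the network transfer are streamed together, so the slower of the two bounds the throughput. None of it comes from Hadoop itself.

```python
from dataclasses import dataclass


@dataclass
class ReadModel:
    """Hypothetical parameters for a toy read-time estimate."""
    block_mb: float = 64.0     # assumed HDFS block size
    disk_mb_s: float = 100.0   # assumed disk throughput
    net_mb_s: float = 100.0    # assumed network throughput
    connect_s: float = 0.01    # assumed overhead to connect to a datanode


def local_read_seconds(file_mb: float, m: ReadModel) -> float:
    """Read the whole file from a local disk: disk throughput only."""
    return file_mb / m.disk_mb_s


def hdfs_read_seconds(file_mb: float, m: ReadModel) -> float:
    """Read the file block by block, one block after another.

    Each block costs one connection to a datanode plus a transfer that is
    limited by the slower of disk and network (assumed to be streamed).
    """
    total = 0.0
    remaining = file_mb
    while remaining > 0:
        chunk = min(m.block_mb, remaining)
        total += m.connect_s + chunk / min(m.disk_mb_s, m.net_mb_s)
        remaining -= chunk
    return total


if __name__ == "__main__":
    model = ReadModel()
    print("local:", local_read_seconds(128.0, model))
    print("hdfs :", hdfs_read_seconds(128.0, model))
```

In this model, with an idle network that is as fast as the disk, the only extra cost is the per-block connection overhead, so the degradation is small. It grows once the network becomes the bottleneck.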

2011/5/12 Hassen Riahi <hassen.ri...@cern.ch>:
> Thank you Elton and Stanley for your reply.
> Given that we are not running map reduce jobs (at least until now),
> assuming that the read is sequential, and in the case where the network
> is not heavily used, I would expect to see, in general, a degradation of
> performance when reading 1 file from hdfs (hdfs blocks will be read
> sequentially from different datanodes) compared to reading it from a
> usual filesystem (which stores the file without splitting it).
> Is that right?
> Thanks,
> Hassen
>
> Hassen,
> Read in hdfs is sequential, i.e. one block is read after another. Each
> time, the client connects to one data node to read a block, then connects
> to another (or the same) data node to read the next block.
> The reason for this sequential design, I guess, is to avoid a network
> traffic explosion in a heavy map reduce job.
> -Elton
>
> 2011/5/8 <stanley....@emc.com>
>>
>> To my understanding, the reader reads file blocks in parallel.
>>
>> -----Original Message-----
>> From: Hassen Riahi [mailto:hassen.ri...@cern.ch]
>> Sent: May 7, 2011 23:50
>> To: hdfs-user@hadoop.apache.org
>> Subject: Read files from hdfs
>>
>> Hi all,
>>
>> Is the read operation of 1 file stored in hdfs done in parallel?
>>
>> I mean, let's say that I have 1 file split in 2 blocks (hdfs blocks) and
>> each block is stored in 1 rack.
>> When reading this file, are both blocks read in parallel? Or is the first
>> block read and then, once that is done, the read of the second block begins?
>> If the latter is right, the read of files in hdfs is then sequential.
>> Is that right or am I missing something?
>>
>> Thanks,
>> Hassen
>>
>
>
>

-- 
Harsh J