Thank you, Elton and Stanley, for your replies.
Given that we are not running MapReduce jobs (at least for now), and
assuming that the read is sequential and the network is not heavily
used, I would then expect, in general, some degradation of performance
when reading one file from HDFS (its blocks will be read sequentially
from different datanodes) compared to reading it from a usual
filesystem (which stores the file without splitting it). Is that right?
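For concreteness, this is the kind of read I have in mind; a minimal
sketch using Hadoop's FileSystem API (the namenode URI, file path and
timing here are only illustrative placeholders, not anything from our
actual setup):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SequentialHdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder namenode address and file path.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        Path file = new Path("/user/hassen/somefile.dat");

        long start = System.currentTimeMillis();
        long total = 0;
        byte[] buffer = new byte[64 * 1024];
        FSDataInputStream in = fs.open(file);
        try {
            int n;
            // Behind this single sequential stream, the client library
            // fetches the file's blocks one after another from datanodes.
            while ((n = in.read(buffer)) > 0) {
                total += n;
            }
        } finally {
            in.close();
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Read " + total + " bytes in " + elapsed + " ms");
    }
}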
Thanks,
Hassen
Hassen,
Reading in HDFS is sequential, i.e. the client reads one block after
another. Each time, the client connects to one datanode to read a
block, then connects to another (or the same) datanode to read the
next block.
The reason for this sequential design, I guess, is to avoid a network
traffic explosion in a heavy MapReduce job.
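If you really want parallelism within a single file, you have to drive
it from the client yourself, for example with positioned reads. A rough
sketch of that idea (this is not what the normal sequential read does;
the URI and path are placeholders):

import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelRangeRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        final FileSystem fs =
            FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        final Path file = new Path("/user/hassen/somefile.dat");

        FileStatus status = fs.getFileStatus(file);
        long length = status.getLen();
        long blockSize = status.getBlockSize();

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> futures = new ArrayList<Future<Integer>>();

        // One task per block-sized range. Each task opens its own stream
        // and uses a positioned read, so the tasks do not interfere with
        // each other. Assumes one block fits in a byte[] buffer.
        for (long offset = 0; offset < length; offset += blockSize) {
            final long rangeStart = offset;
            final int rangeLen = (int) Math.min(blockSize, length - offset);
            futures.add(pool.submit(new Callable<Integer>() {
                public Integer call() throws Exception {
                    byte[] buf = new byte[rangeLen];
                    FSDataInputStream in = fs.open(file);
                    try {
                        in.readFully(rangeStart, buf, 0, rangeLen);
                    } finally {
                        in.close();
                    }
                    return rangeLen;
                }
            }));
        }

        long total = 0;
        for (Future<Integer> f : futures) {
            total += f.get();
        }
        pool.shutdown();
        System.out.println("Read " + total + " bytes across parallel ranges");
    }
}

Whether this actually helps depends on how busy the network and the
datanodes are, which is the trade-off mentioned above.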
-Elton
2011/5/8 <stanley....@emc.com>
To my understanding, the reader reads file blocks in parallel.
-----Original Message-----
From: Hassen Riahi [mailto:hassen.ri...@cern.ch]
Sent: May 7, 2011 23:50
To: hdfs-user@hadoop.apache.org
Subject: Read files from hdfs
Hi all,
Is the read operation of a file stored in HDFS done in parallel?
I mean, let's say that I have one file split into 2 blocks (HDFS
blocks), and each block is stored on one rack.
When reading this file, are both blocks read in parallel, or is the
first block read and, only once that is done, the second block?
If the latter is right, then the read of files in HDFS is sequential.
Is that right, or am I missing something?
Thanks,
Hassen