Thank you, Elton and Stanley, for your replies.
Given that we are not running MapReduce jobs (at least for now), and
assuming that the read is sequential and the network is not heavily
used, I would then expect, in general, some degradation of performance
when reading one file from HDFS (its blocks will be read sequentially
from different datanodes) compared to reading it from a usual
filesystem (which stores the file without splitting it). Is that right?
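For concreteness, this is the kind of read I have in mind; a minimal
sketch using Hadoop's FileSystem API (the namenode URI, file path and
timing here are only illustrative placeholders, not anything from our
actual setup):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SequentialHdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder namenode address and file path.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        Path file = new Path("/user/hassen/somefile.dat");

        long start = System.currentTimeMillis();
        long total = 0;
        byte[] buffer = new byte[64 * 1024];
        FSDataInputStream in = fs.open(file);
        try {
            int n;
            // Behind this single sequential stream, the client library
            // fetches the file's blocks one after another from datanodes.
            while ((n = in.read(buffer)) > 0) {
                total += n;
            }
        } finally {
            in.close();
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Read " + total + " bytes in " + elapsed + " ms");
    }
}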
Thanks,
Hassen
Hassen,
Reading in HDFS is sequential, i.e. the client reads one block after
another. Each time, the client connects to one datanode to read a
block, then connects to another (or the same) datanode to read the
next block.
The reason for this sequential design, I guess, is to avoid a network
traffic explosion in a heavy MapReduce job.
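If you really want parallelism within a single file, you have to drive
it from the client yourself, for example with positioned reads. A rough
sketch of that idea (this is not what the normal sequential read does;
the URI and path are placeholders):

import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelRangeRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        final FileSystem fs =
            FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        final Path file = new Path("/user/hassen/somefile.dat");

        FileStatus status = fs.getFileStatus(file);
        long length = status.getLen();
        long blockSize = status.getBlockSize();

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> futures = new ArrayList<Future<Integer>>();

        // One task per block-sized range. Each task opens its own stream
        // and uses a positioned read, so the tasks do not interfere with
        // each other. Assumes one block fits in a byte[] buffer.
        for (long offset = 0; offset < length; offset += blockSize) {
            final long rangeStart = offset;
            final int rangeLen = (int) Math.min(blockSize, length - offset);
            futures.add(pool.submit(new Callable<Integer>() {
                public Integer call() throws Exception {
                    byte[] buf = new byte[rangeLen];
                    FSDataInputStream in = fs.open(file);
                    try {
                        in.readFully(rangeStart, buf, 0, rangeLen);
                    } finally {
                        in.close();
                    }
                    return rangeLen;
                }
            }));
        }

        long total = 0;
        for (Future<Integer> f : futures) {
            total += f.get();
        }
        pool.shutdown();
        System.out.println("Read " + total + " bytes across parallel ranges");
    }
}

Whether this actually helps depends on how busy the network and the
datanodes are, which is the trade-off mentioned above.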
-Elton
2011/5/8 <stanley....@emc.com>
To my understanding, the reader reads file blocks in parallel.
-----Original Message-----
From: Hassen Riahi [mailto:hassen.ri...@cern.ch]
Sent: May 7, 2011 23:50
To: hdfs-user@hadoop.apache.org
Subject: Read files from hdfs
Hi all,
Is the read operation of a file stored in HDFS done in parallel?
I mean, let's say that I have one file split into 2 blocks (HDFS
blocks), and each block is stored on one rack.
When reading this file, are both blocks read in parallel, or is the
first block read and, only once that is done, the second block?
If the latter is right, then the read of files in HDFS is sequential.
Is that right, or am I missing something?
Thanks,
Hassen