Re: Read files from hdfs

Harsh J Wed, 11 May 2011 22:36:43 -0700

Yes, it could get slower, because the operation would now involve a disk
read AND a network transfer (along with the other small overheads that
come with it).
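
To make those two costs concrete, here is a rough back-of-the-envelope
sketch in Python. It is only a toy cost model, not how the HDFS client is
implemented. The bandwidth figures, the per-block connection overhead, the
datanode names, and the function names are all hypothetical values chosen
for illustration, and the model assumes that block data is streamed, so
that the slower of the disk and the network bounds the throughput.

```python
from dataclasses import dataclass


@dataclass
class Block:
    size_mb: float   # block size in MB (hypothetical)
    datanode: str    # name of the node holding the block (hypothetical)


def local_read_seconds(blocks, disk_mb_s):
    """Time to read the whole file from a single local disk."""
    return sum(b.size_mb for b in blocks) / disk_mb_s


def hdfs_read_seconds(blocks, disk_mb_s, net_mb_s, connect_s):
    """Time to read the blocks one after another, each from its datanode."""
    total = 0.0
    for b in blocks:
        # Assumption: the data is streamed, so the slower of the remote
        # disk and the network link limits the transfer rate.
        rate = min(disk_mb_s, net_mb_s)
        # Each block costs one connection setup plus the transfer itself.
        total += connect_s + b.size_mb / rate
    return total


# One file split into two blocks, as in the original question.
blocks = [Block(64, "dn1"), Block(64, "dn2")]

local = local_read_seconds(blocks, disk_mb_s=100)
remote = hdfs_read_seconds(blocks, disk_mb_s=100, net_mb_s=80, connect_s=0.05)
print(f"local: {local:.2f}s, hdfs: {remote:.2f}s")
```

In this model the HDFS read is slower only by the per-block connection
overhead plus whatever the network loses relative to the disk. If the
network is faster than the disk and the connection cost is negligible,
the two numbers converge.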


2011/5/12 Hassen Riahi <hassen.ri...@cern.ch>:
> Thank you, Elton and Stanley, for your replies.
> Given that we are not running MapReduce jobs (at least not yet), and
> assuming that the read is sequential and the network is not heavily
> used, should I expect, in general, a degradation in performance when
> reading one file from HDFS (its blocks will be read sequentially from
> different datanodes) compared to reading it from a conventional
> filesystem (which stores a file without splitting it)? Is that right?
> Thanks,
> Hassen
>
> Hassen,
> Reads in HDFS are sequential, i.e. one block is read after another. Each
> time, the client connects to one datanode to read a block, then connects
> to another (or the same) datanode to read the next block.
> The reason for this sequential design, I guess, is to avoid a network
> traffic explosion in a heavy MapReduce job.
> -Elton
>
> 2011/5/8 <stanley....@emc.com>
>>
>> To my understanding, the reader reads file blocks in parallel.
>>
>> -----Original Message-----
>> From: Hassen Riahi [mailto:hassen.ri...@cern.ch]
>> Sent: May 7, 2011 23:50
>> To: hdfs-user@hadoop.apache.org
>> Subject: Read files from hdfs
>>
>> Hi all,
>>
>> Is the read operation on one file stored in HDFS done in parallel?
>>
>> I mean, let's say that I have one file split into two blocks (HDFS
>> blocks) and each block is stored in one rack.
>> When reading this file, are both blocks read in parallel? Or is the first
>> block read, and the read of the second block begins only once that is done?
>> If the latter is right, then reading files in HDFS is sequential.
>> Is that right, or am I missing something?
>>
>> Thanks,
>> Hassen
>>
>
>
>



-- 
Harsh J
