Hey guys,

During the SC09 exercise, our data transfer tool was using the FUSE interface to HDFS. As Brian said, we were also reading 16 files in parallel. This seemed to be the optimal number, beyond which the aggregate read rate did not improve.
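
For anyone who wants to repeat the "how many parallel readers before the rate flattens" measurement, here is a minimal sketch using only the standard Java library. It is not our transfer tool; the class name ParallelReadProbe and its methods are made up for illustration, and each source is just something that can open an InputStream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Hadoop or of any existing transfer tool.
class ParallelReadProbe {

    /** Reads one stream to the end and returns the number of bytes seen. */
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[1 << 20]; // 1 MiB buffer, an arbitrary choice
        long total = 0;
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            total += n;
        }
        return total;
    }

    /** Opens every source, drains them on a pool of `parallelism` threads, returns total bytes. */
    static long readAll(List<Callable<InputStream>> sources, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (Callable<InputStream> source : sources) {
                pending.add(pool.submit(() -> {
                    try (InputStream in = source.call()) {
                        return drain(in);
                    }
                }));
            }
            long total = 0;
            for (Future<Long> f : pending) {
                total += f.get(); // waits for that reader to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Aggregate read rate in MB/s (10^6 bytes per second) for one run. */
    static double rateMBps(List<Callable<InputStream>> sources, int parallelism) {
        long start = System.nanoTime();
        long bytes = readAll(sources, parallelism);
        double seconds = (System.nanoTime() - start) / 1e9;
        return bytes / 1e6 / seconds;
    }
}
```

For files on a FUSE mount, each source could be something like `() -> Files.newInputStream(path)`. Calling rateMBps with parallelism 1, 2, 4, 8, 16, 32 over the same set of files should show where the aggregate rate stops improving. Note that repeated runs over the same files can be skewed by the page cache.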

We have work scheduled to modify our data transfer tool to use the native Hadoop Java APIs, and to run some additional tests offline to see whether the HDFS-FUSE interface is the bottleneck, as we suspect.

Regards,

--Mike

On 11/24/2009 03:01 PM, Brian Bockelman wrote:
Hey Raghu,

There are a few performance issues.  Last week during Supercomputing '09, 
Caltech was having issues with getting more than 2.6 Gbps per HDFS client 
process (I think they were pulling 16 files per process, but Mike knows the 
details).  I think they'd appreciate any advice you have about tuning HDFS 
performance.

We're starting early R&D for 100Gbps dataflows, and I believe improving our 
current HDFS performance is on the TODO list.

Brian

(PS - I'm not saying HDFS is at fault here - it always remains a possibility 
that we're using it in a sub-optimal manner.  If you have any favorite Java 
performance instrumentation to recommend, we'd also be interested in that.)

On Nov 24, 2009, at 12:35 PM, Raghu Angadi wrote:

Sequential read is the simplest case, and it is pretty hard to improve upon
the current raw performance (the HDFS client does take more CPU than one might
expect; Todd implemented an improvement to reduce the CPU consumed).

Just to reiterate what Todd said, there is an implicit read ahead for
sequential reads with TCP buffers and kernel read ahead on Datanodes.

If you extend the read ahead buffer to be more of a buffer cache for the
block, it could have a big impact for some read access patterns (e.g. binary
search).
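
A rough sketch of that idea, to make it concrete: keep recently fetched fixed-size chunks of a block in memory so that nearby probes (as in a binary search) are served without another round trip. This is not how the HDFS client works today; ChunkCacheReader and PositionedSource are hypothetical names, and the source interface is only a stand-in for a positioned read.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in Hadoop.
class ChunkCacheReader {

    /** Stand-in for a positioned read against one block. */
    interface PositionedSource {
        long length();
        /** Reads up to len bytes starting at pos and returns the number read. */
        int read(long pos, byte[] buf, int off, int len);
    }

    private final PositionedSource source;
    private final int chunkSize;
    private final Map<Long, byte[]> chunks;
    private long chunkLoads = 0;

    ChunkCacheReader(PositionedSource source, int chunkSize, final int maxChunks) {
        this.source = source;
        this.chunkSize = chunkSize;
        // An access-ordered LinkedHashMap gives simple LRU eviction.
        this.chunks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxChunks;
            }
        };
    }

    /** Returns the byte at pos, fetching the enclosing chunk on a cache miss. */
    byte byteAt(long pos) {
        if (pos < 0 || pos >= source.length()) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        long index = pos / chunkSize;
        byte[] chunk = chunks.get(index);
        if (chunk == null) {
            chunk = load(index);
            chunks.put(index, chunk);
        }
        return chunk[(int) (pos % chunkSize)];
    }

    /** Fetches one whole chunk from the source (the last chunk may be shorter). */
    private byte[] load(long index) {
        long start = index * chunkSize;
        int want = (int) Math.min(chunkSize, source.length() - start);
        byte[] chunk = new byte[want];
        int got = 0;
        while (got < want) {
            int n = source.read(start + got, chunk, got, want - got);
            if (n <= 0) {
                throw new IllegalStateException("short read at " + (start + got));
            }
            got += n;
        }
        chunkLoads++;
        return chunk;
    }

    /** Number of chunks fetched from the source so far (cache misses). */
    long chunkLoads() {
        return chunkLoads;
    }
}
```

The last few probes of a binary search usually land within one chunk, so they hit the cache instead of the source. The chunk size and the number of cached chunks are tuning knobs, and the right values would depend on the access pattern.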

Raghu.

On Mon, Nov 23, 2009 at 11:23 PM, Martin Mituzas <xietao1...@hotmail.com> wrote:


I read the code and found that the call
DFSInputStream.read(buf, off, len)
causes the DataNode to read len bytes (or fewer if it encounters the end of
the block). Why does HDFS not read ahead to improve performance for
sequential reads?
--
View this message in context:
http://old.nabble.com/why-does-not-hdfs-read-ahead---tp26491449p26491449.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.




