On Thu, 15 Oct 2009 11:32:35 +0200
Usman Waheed <usm...@opera.com> wrote:

> Hi Todd,
> 
> Some changes have been applied to the cluster based on the
> documentation (URL) you noted below,

I would also like to know what settings people are tuning at the
operating system level. The blog post referenced here says little
about that, except for the fileno changes.

We got about 3x the read performance when running TestDFSIO after
mounting our ext3 filesystems with the noatime option. I saw that
mentioned in the slides from a Cloudera presentation.

(For those who don't know, the noatime option turns off the recording
of access times on files. Recording atime is a horrible performance
killer, since every read of a file then forces the kernel to do a write
as well. These writes are probably queued up, but still, if you don't
need the atime (very few applications do), turn it off!)
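
To check which mounts on a node are still recording atime, something
like the following could be used. It is only a minimal sketch: it
assumes Linux and the usual /proc/mounts layout (device, mountpoint,
fstype, options), and the function name is made up for illustration.

```python
def mounts_without_noatime(mounts_text, fstypes=("ext3",)):
    """Return (device, mountpoint) pairs for matching filesystems
    whose mount options do not include noatime."""
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        # Options are a comma-separated list, e.g. "rw,noatime".
        if fstype in fstypes and "noatime" not in options.split(","):
            missing.append((device, mountpoint))
    return missing
```

Feed it the contents of /proc/mounts (for example
open("/proc/mounts").read()) and it lists the ext3 mounts that would
still need noatime added.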

Have people been experimenting with different filesystems, or are most
of us running on top of ext3? 

How about mounting ext3 with "data=writeback"? That's rumoured to give
the best throughput and could help with write performance. From
mount(8):

     writeback
            Data ordering is not preserved - data may be written into
            the main file system after its metadata has been committed
            to the journal.  This is rumoured to be the highest
            throughput option.  It guarantees internal file system
            integrity, however it can allow old data to appear in
            files after a crash and journal recovery.
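
For anyone wanting to see which journalling mode their ext3 mounts
currently report, here is a small sketch. It assumes Linux and that the
data= option, when set explicitly, shows up in the options column of
/proc/mounts; the function name is hypothetical. A value of None only
means no data= option was listed, not that any particular mode is in
effect.

```python
def ext3_data_modes(mounts_text):
    """Map each ext3 mountpoint to its data= mode, or None if the
    option is not listed for that mount."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext3":
            continue  # not an ext3 mount, or a malformed line
        mode = None
        for opt in fields[3].split(","):
            if opt.startswith("data="):
                mode = opt[len("data="):]
        modes[fields[1]] = mode
    return modes
```

As with any check of this kind, pass it the text of /proc/mounts and
compare the result across the datanodes before and after changing the
mount options.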

How would the HDFS consistency checks cope with old data appearing in
the underlying files after a system crash?

Cheers,
\EF
-- 
Erik Forsberg <forsb...@opera.com>
Developer, Opera Software - http://www.opera.com/
