Re: making file system block size bigger to improve hdfs performance ?

2011-10-10 Thread Steve Loughran

On 09/10/11 07:01, M. C. Srivas wrote:


If you insist on HDFS, try using XFS underneath; it does a much better job
than ext3 or ext4 for Hadoop in terms of how data is laid out on disk. But
its memory footprint is at least twice that of ext3, so it will gobble up
a lot more memory on your box.


How stable have you found XFS? I know people have worked a lot on ext4
and I am using it locally, even if something (VirtualBox) tells me off
for doing so. I know the Lustre people are using it underneath their DFS,
and with wide use it does tend to get debugged by others before you put
your own data on it.


Re: making file system block size bigger to improve hdfs performance ?

2011-10-10 Thread M. C. Srivas
XFS was created by Silicon Graphics in the early 1990s. It was designed for streaming.
The Linux port was in 2002 or so.

I've used it extensively for the past 8 years. It is very stable, and many
NAS companies have embedded it in their products. In particular, it works
well even when the disk starts getting full. ext4 tends to have problems
with multiple streams (it seeks too much), and ext3 has a fragmentation
problem.

(MapR's disk layout is even better than XFS ... couldn't resist)


On Mon, Oct 10, 2011 at 3:48 AM, Steve Loughran ste...@apache.org wrote:

 On 09/10/11 07:01, M. C. Srivas wrote:

 If you insist on HDFS, try using XFS underneath; it does a much better job
 than ext3 or ext4 for Hadoop in terms of how data is laid out on disk. But
 its memory footprint is at least twice that of ext3, so it will gobble up
 a lot more memory on your box.


 How stable have you found XFS? I know people have worked a lot on ext4 and
 I am using it locally, even if something (VirtualBox) tells me off for doing
 so. I know the Lustre people are using it underneath their DFS, and with wide
 use it does tend to get debugged by others before you put your own data on it.



Re: making file system block size bigger to improve hdfs performance ?

2011-10-10 Thread Brian Bockelman
I can provide another data point here: XFS works very well on modern Linux
kernels (in the 2.6.9 era it had many memory-management headaches, especially
around the switch to 4K stacks), and its advantage is significant when you run
file systems at over 95% occupancy.
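
If you want to check whether your DataNode volumes are already in that regime,
a quick sketch using plain java.io.File is enough (the /data1 and /data2 paths
below are just placeholders for whatever your dfs.data.dir entries are):

  import java.io.File;

  public class VolumeOccupancy {
      public static void main(String[] args) {
          // Placeholder data directories; substitute your dfs.data.dir entries.
          String[] dataDirs = args.length > 0
                  ? args : new String[] {"/data1", "/data2"};
          for (String dir : dataDirs) {
              File f = new File(dir);
              long total = f.getTotalSpace();   // volume size in bytes (0 if path is bad)
              if (total == 0) {
                  System.out.println(dir + ": not found");
                  continue;
              }
              long usable = f.getUsableSpace(); // bytes still writable by this process
              double pctUsed = 100.0 * (total - usable) / total;
              System.out.printf("%s: %.1f%% used%n", dir, pctUsed);
          }
      }
  }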

Brian

On Oct 10, 2011, at 8:51 AM, M. C. Srivas wrote:

 XFS was created by Silicon Graphics in the early 1990s. It was designed for streaming.
 The Linux port was in 2002 or so.
 
 I've used it extensively for the past 8 years. It is very stable, and many
 NAS companies have embedded it in their products. In particular, it works
 well even when the disk starts getting full. ext4 tends to have problems
 with multiple streams (it seeks too much), and ext3 has a fragmentation
 problem.
 
 (MapR's disk layout is even better than XFS ... couldn't resist)
 
 



Re: making file system block size bigger to improve hdfs performance ?

2011-10-03 Thread Niels Basjes
Have you tried it to see what difference it makes?
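
For a quick first number, a plain sequential-write micro-benchmark run against
an ext3-, ext4- and XFS-backed directory would already show whether the
underlying file system matters for large streaming writes. A minimal sketch
(the path, the 1 GB total and the 8 MB write size are just assumptions):

  import java.io.File;
  import java.io.FileOutputStream;
  import java.util.Arrays;

  public class SeqWriteBench {
      public static void main(String[] args) throws Exception {
          // Point this at a directory on the file system you want to measure.
          File target = new File(args.length > 0 ? args[0] : "/data1/bench.tmp");
          byte[] buf = new byte[8 * 1024 * 1024];   // 8 MB writes, HDFS-like
          Arrays.fill(buf, (byte) 1);
          long total = 1024L * 1024 * 1024;         // write 1 GB in total

          long start = System.currentTimeMillis();
          FileOutputStream out = new FileOutputStream(target);
          for (long written = 0; written < total; written += buf.length) {
              out.write(buf);
          }
          out.getFD().sync();                       // force the data to disk
          out.close();
          long ms = System.currentTimeMillis() - start;

          System.out.printf("%.1f MB/s%n", (total / 1e6) / (ms / 1000.0));
          target.delete();
      }
  }

TestDFSIO on a real cluster is the more representative test, but this at least
isolates the local file system.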

-- 
With kind regards,
Niels Basjes
(Sent from mobile)
On 3 Oct 2011 07:06, Jinsong Hu jinsong...@hotmail.com wrote the
following:
 Hi there:
 I just thought of an idea. When we format a disk, the block size is
 usually 1K to 4K. For HDFS, the block size is usually 64M.
 I wonder, if we change the raw file system's block size to something
 significantly bigger, say 1M or 8M, will that improve disk IO
 performance for Hadoop's HDFS?
 Currently, I noticed that the MapR distribution uses MFS, its own file
 system. That resulted in a 4x performance gain in terms of disk IO. I
 just wonder whether, if we tune the host OS parameters, we can achieve
 better disk IO performance with just the regular Apache Hadoop
 distribution.
 I understand that making the block size bigger can result in some
 wasted disk space for small files. However, for disks dedicated to
 HDFS, where most of the files are very big, I just wonder if it is a
 good idea. Does anybody have any comments?

 Jimmy



making file system block size bigger to improve hdfs performance ?

2011-10-02 Thread Jinsong Hu

Hi there:
 I just thought of an idea. When we format a disk, the block size is
usually 1K to 4K. For HDFS, the block size is usually 64M. I wonder, if
we change the raw file system's block size to something significantly
bigger, say 1M or 8M, will that improve disk IO performance for Hadoop's
HDFS?
 Currently, I noticed that the MapR distribution uses MFS, its own file
system. That resulted in a 4x performance gain in terms of disk IO. I
just wonder whether, if we tune the host OS parameters, we can achieve
better disk IO performance with just the regular Apache Hadoop
distribution.
 I understand that making the block size bigger can result in some
wasted disk space for small files. However, for disks dedicated to HDFS,
where most of the files are very big, I just wonder if it is a good
idea. Does anybody have any comments?


Jimmy
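
For reference, the 64M figure above is the HDFS block size, a per-file
property of HDFS itself; it is independent of the 1K-4K block size of the
underlying ext3/ext4/XFS volume and can be changed without reformatting
anything. A minimal sketch against the 0.20-era FileSystem API (it assumes
fs.default.name points at the cluster; the path and sizes are placeholders):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsBlockSizeDemo {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Default HDFS block size for new files; 0.20.x reads "dfs.block.size".
          conf.setLong("dfs.block.size", 128L * 1024 * 1024);

          FileSystem fs = FileSystem.get(conf);
          Path p = new Path("/tmp/blocksize-demo.dat");   // placeholder path

          // The block size can also be passed per file on create():
          // create(path, overwrite, bufferSize, replication, blockSize)
          FSDataOutputStream out =
                  fs.create(p, true, 64 * 1024, (short) 3, 256L * 1024 * 1024);
          out.write(new byte[1024 * 1024]);               // 1 MB of test data
          out.close();

          System.out.println(p + " block size = "
                  + fs.getFileStatus(p).getBlockSize());
          fs.close();
      }
  }

On the local file system side, ext3/ext4 cap the allocation block size at the
page size (typically 4K), so the knob that can actually be turned up is the
HDFS block size, not the on-disk one.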