Adam,

Thank you for the background on stuffed inodes and resource groups; it is much 
appreciated.

For this specific application most files are under 1k.  A few are larger 
(20-30k), but they are rare enough that we can accept a small performance hit 
for them.  Overall the file system may contain 500,000 or more of these small 
files at a time.
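
In case it helps anyone else doing similar sizing, this is roughly how we 
confirm that distribution (the mount point below is just a placeholder):

  # Rough file-size histogram in 1 KB buckets (mount point is illustrative)
  find /mnt/gfs01 -type f -printf '%s\n' \
      | awk '{ b = int($1 / 1024); n[b]++ } END { for (k in n) print k "KB", n[k] }' \
      | sort -n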

The improvement we measured is a bit more than "modest".  Our benchmark 
finishes about 30% faster with the 1k block size compared to 4k.  That's a nice 
win for a simple change.  Disk bandwidth to/from shared storage might be a 
factor--we have 12 nodes accessing this storage, so the aggregate bandwidth is 
considerable.
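
For the archives, the 1k test filesystem was recreated with something along 
these lines (the lock table, journal count, and device are placeholders for 
our setup; -r is the resource group size knob you mentioned, which we have so 
far left at its default):

  # WARNING: destroys any existing data on the device.
  gfs_mkfs -p lock_dlm -t mycluster:gfs01 -j 12 -b 1024 /dev/vg_shared/lv_gfs01
  # Resource group size could be adjusted with e.g. "-r 64" (MB); untried here.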

It has been suggested to me that NFS would yield better performance, but I 
have not attempted this.  RHCS has so far met our expectations for high 
availability.  Given that NFS is not a cluster file system, I'm nervous that 
such a setup could introduce new points of failure.  (I realize that NFS could 
be coupled with e.g. DRBD+Pacemaker for failover purposes.)

We applied the typical GFS1 tuneables long ago (noatime, noquota, 
statfs_fast).  Disabling SELinux also helped.  Checking the block size was 
truly an afterthought, and we had not given any consideration to resource 
group size either.
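
Concretely, those tuneables amount to roughly the following on our nodes 
(device and mount point are placeholders; statfs_fast is a runtime tuneable 
set after mount on each node):

  # /etc/fstab entry (device and mount point are illustrative)
  /dev/vg_shared/lv_gfs01  /mnt/gfs01  gfs  defaults,noatime,noquota  0 0

  # Enable the fast statfs behaviour on this mount (run on each node)
  gfs_tool settune /mnt/gfs01 statfs_fast 1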

I've learned a ton about disk storage by implementing shared storage and 
clustered filesystems over the past 3 years.  Block devices are a bit "magical" 
in general, and widely misunderstood by system administrators and software 
engineers.  (For example, I've heard some fantastic performance claims for 
ext3 file systems that turned out to demonstrate how effective Linux is at 
hiding disk latency.)  Thanks again to you and this list for providing 
continued insight.

-Jeff

> -----Original Message-----
> From: linux-cluster-boun...@redhat.com 
> [mailto:linux-cluster-boun...@redhat.com]
> On Behalf Of Adam Drew
> Sent: Tuesday, January 04, 2011 2:18 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] GFS block size
> 
> If your average file size is less than 1k, then using a block size of 1k may
> be a good option. If you can fit your data in a single block you get the
> minor performance boost of a stuffed inode: you never have to walk a list
> from your inode to your data block. The performance boost should be small,
> but could add up to larger gains over time with lots of transactions. If
> your average data payload is less than the default block size, however,
> you'll end up losing the delta (the unused remainder of each block). So,
> from a filesystem perspective, using a 1k block size to store mostly sub-1k
> files may be a good idea.
> 
> You may additionally want to experiment with reducing your resource group
> size. Blocks are organized into resource groups. If you are using 1k blocks
> and sub-1k files, you'll end up with tons of stuffed inodes per resource
> group. Some operations in GFS (such as deletes) require locking the resource
> group metadata, so you may start to experience performance bottlenecks
> depending on usage patterns and disk layout.
> 
> All in all, I'd be skeptical of claims of large performance gains over time
> from changing rg size and block size, but modest gains may be had. Still,
> some access patterns and filesystem layouts may see greater gains from such
> tweaking. However, I would expect the most significant gains (in GFS1 at
> least) to come from mount options and tuneables.
> 
> Regards,
> Adam Drew
> 
> ----- Original Message -----
> From: "juncheol park" <nuke...@gmail.com>
> To: "linux clustering" <linux-cluster@redhat.com>
> Sent: Tuesday, January 4, 2011 1:42:45 PM
> Subject: Re: [Linux-cluster] GFS block size
> 
> I also experimented with a 1k block size on GFS1. Although a smaller block
> size can improve disk usage, it is typically recommended to use a block size
> equal to the page size, which is 4k on Linux.
> 
> I don't remember all the details of the results, but for large files the
> overall performance of read/write operations with a 1k block size was much
> worse than with a 4k block size. That is to be expected, though. If you
> don't mind the performance degradation for large files, 1k should be fine
> for you.
> 
> Just my two cents,
> 
> -Jun
> 
> 
> On Fri, Dec 17, 2010 at 3:53 PM, Jeff Sturm <jeff.st...@eprize.com> wrote:
> > One of our GFS filesystems tends to have a large number of very small
> > files, on average about 1000 bytes each.
> >
> > I realized this week we'd created our filesystems with default
> > options.  As an experiment on a test system, I've recreated a GFS
> > filesystem with "-b 1024" to reduce overall disk usage and disk bandwidth.
> >
> > Initially, tests look very good—single file creates are less than one
> > millisecond on average (down from about 5ms each).  Before I go very
> > far with this, I wanted to ask:  Has anyone else experimented with the
> > block size option, and are there any tricks or gotchas to report?
> >
> > (This is with CentOS 5.5, GFS 1.)
> >
> > -Jeff

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
