----- Original Message -----
> Hi,
>
>
> On 09/06/15 15:45, Bob Peterson wrote:
> > ----- Original Message -----
> >> Hi,
> >>
> >>
> >> On 05/06/15 15:49, Bob Peterson wrote:
> >>> Hi,
> >>>
> >>> This patch allows the block allocation code to retain the buffers
> >>> for the resource groups so they don't need to be re-read from
> >>> buffer cache with every request. This is a performance improvement
> >>> that's especially noticeable when resource groups are very large.
> >>> For example, with 2GB resource groups and 4K blocks, there can be
> >>> 33 blocks for every resource group. This patch allows those 33
> >>> buffers to be kept around and not read in and thrown away with
> >>> every operation. The buffers are released when the resource group
> >>> is either synced or invalidated.
> >> The blocks should be cached between operations, so this should only
> >> be resulting in a skip of the look up of the cached block, and no
> >> changes to the actual I/O. Does that mean that grab_cache_page() is
> >> slow, I wonder? Or is this an issue of going around the retry loop
> >> due to lack of memory at some stage?
> >>
> >> How does this interact with the rgrplvb support? I'd guess that
> >> with that turned on, this is no longer an issue, because we'd only
> >> read in the blocks for the rgrps that we are actually going to use?
> >>
> >>
> >> Steve.
> > Hi,
> >
> > If you compare the two vmstat outputs in bugzilla #1154782, you'll
> > see no significant difference in memory usage or cpu usage. So I
> > assume the page lookup is the "slow" part; not because it's such a
> > slow thing but because it's done 33 times per
> > read-reference-invalidate (33 pages to look up per rgrp).
> >
> > Regards,
> >
> > Bob Peterson
> > Red Hat File Systems
> That's true; however, as I understand the problem here, the issue is
> not reading in the blocks for the rgrp that is eventually selected
> for use, but the reading in of those blocks for the rgrps that we
> reject, for whatever reason (full, or congested, or whatever). So
> with rgrplvb enabled, we don't read those rgrps in off disk at all in
> most cases - so I was wondering whether that solves the problem
> without needing this change?
>
> Ideally I'd like to make the rgrplvb setting the default, since it is
> much more efficient. The question is how we can do that and still
> remain backward compatible? Not an easy one to answer :(
>
> Also, if the page lookup is the slow thing, then we should look at
> using pagevec_lookup() to get the pages in chunks rather than doing
> it individually (and indeed, multiple times per page, in case of
> block size less than page size). We know that the blocks will always
> be contiguous on disk, so we should be able to send down large I/Os,
> rather than relying on the block stack to merge them as we do at the
> moment, which should be a further improvement too.
>
> Steve.
Hi,

The rgrplvb mount option only helps if the file system is using
lock_dlm. For lock_nolock, it's still just as slow, because
lock_nolock has no knowledge of lvbs. Granted, that's an unusual case,
since GFS2 is normally used with lock_dlm.

I like the idea of making rgrplvb the default mount option, and I
don't see a problem with doing that. I believe the rgrplvb option is
compatible with this patch, but I'll set up a test environment to
verify that they work together harmoniously.

I also like the idea of using a pagevec to read in multiple pages for
the rgrps, but that's an improvement for another day. If there isn't
already a bugzilla record open for that, perhaps we should open one.

Regards,

Bob Peterson
Red Hat File Systems
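
P.S. Just to capture the general shape of the pagevec_lookup() idea
Steve mentions above, here's a rough, untested sketch. It isn't
against any particular tree; "mapping", "start" and "nr_pages" are
placeholder names rather than the actual gfs2 variables, and the
read-submission step is only hinted at in a comment:

    #include <linux/kernel.h>
    #include <linux/pagemap.h>
    #include <linux/pagevec.h>

    /*
     * Rough sketch only: look up the pages backing an rgrp's bitmap
     * blocks in batches of PAGEVEC_SIZE instead of one lookup per
     * block.  Pages absent from the cache (pagevec_lookup() skips
     * over holes) would still have to go through the existing
     * per-block read path.
     */
    static void rgrp_lookup_pages_sketch(struct address_space *mapping,
                                         pgoff_t start,
                                         unsigned long nr_pages)
    {
            struct pagevec pvec;
            pgoff_t index = start;
            unsigned long remaining = nr_pages;

            pagevec_init(&pvec, 0);
            while (remaining) {
                    unsigned got, i;

                    got = pagevec_lookup(&pvec, mapping, index,
                                         min_t(unsigned long, remaining,
                                               PAGEVEC_SIZE));
                    if (!got)
                            break;  /* nothing cached at or beyond index */

                    for (i = 0; i < got; i++) {
                            struct page *page = pvec.pages[i];

                            /* map buffer heads / build one large read here */
                            index = page->index + 1;
                    }
                    remaining -= got;
                    pagevec_release(&pvec);
            }
    }

Whether batching the lookups buys much on top of simply keeping the
buffers referenced is something we'd have to measure, of course.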