On Tuesday 16 October 2007 14:57, Eric W. Biederman wrote: > Nick Piggin <[EMAIL PROTECTED]> writes: > >> make_page_uptodate() is most hideous part I have run into. > >> It has to know details about other layers to now what not > >> to stomp. I think my incorrect simplification of this is what messed > >> things up, last round. > > > > Not really, it's just named funny. That's just a minor utility > > function that more or less does what it says it should do. > > > > The main problem is really that it's implementing a block device > > who's data comes from its own buffercache :P. I think. > > Well to put it another way, mark_page_uptodate() is the only > place where we really need to know about the upper layers. > Given that you can kill ramdisks by coding it as: > > static void make_page_uptodate(struct page *page) > { > clear_highpage(page); > flush_dcache_page(page); > SetPageUptodate(page); > } > > Something is seriously non-intuitive about that function if > you understand the usual rules for how to use the page cache.
You're overwriting some buffers that were uptodate and dirty. That would be expected to cause problems. > The problem is that we support a case in the buffer cache > where pages are partially uptodate and only the buffer_heads > remember which parts are valid. Assuming we are using them > correctly. > > Having to walk through all of the buffer heads in make_page_uptodate > seems to me to be a nasty layering violation in rd.c Sure, but it's not just about the buffers. It's the pagecache in general. It is supposed to be invisible to the device driver and sitting above it, and yet it is taking the buffercache and using it to pull its data out of. > > I think it's worthwhile, given that we'd have a "real" looking > > block device and minus these bugs. > > For testing purposes I think I can agree with that. What non-testing uses does it have? > >> Having a separate store would > >> solve some of the problems, and probably remove the need > >> for carefully specifying the ramdisk block size. We would > >> still need the magic restictions on page allocations though > >> and it we would use them more often as the initial write to the > >> ramdisk would not populate the pages we need. > > > > What magic restrictions on page allocations? Actually we have > > fewer restrictions on page allocations because we can use > > highmem! > > With the proposed rewrite yes. > > > And the lowmem buffercache pages that we currently pin > > (unsuccessfully, in the case of this bug) are now completely > > reclaimable. And all your buffer heads are now reclaimable. > > Hmm. Good point. So in net it should save memory even if > it consumes a little more in the worst case. Highmem systems would definitely like it. For others, yes, all the duplicated pages should be able to get reclaimed if memory gets tight, along with the buffer heads, so yeah footprint may be a tad smaller. > > If you mean GFP_NOIO... I don't see any problem. Block device > > drivers have to allocate memory with GFP_NOIO; this may have > > been considered magic or deep badness back when the code was > > written, but it's pretty simple and accepted now. > > Well I always figured it was a bit rude allocating large amounts > of memory GFP_NOIO but whatever. You'd rather not, of course, but with dirty data limits now, it doesn't matter much. (and I doubt anybody outside testing is going to be hammering like crazy on rd). Note that the buffercache based ramdisk driver is going to also be allocating with GFP_NOFS if you're talking about a filesystem writing to its metadata. In most systems, GFP_NOFS isn't much different to GFP_NOIO. We could introduce a mode which allocates pages up front quite easily if it were a problem (which I doubt it ever would be). - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/