Re: [PATCH] allocation looping + kswapd CPU cycles
On Tue, 8 May 2001, David S. Miller wrote:

> So instead, you could test for the condition that prevents any
> possible forward progress, no?

	if (!order || free_shortage() > 0)
		goto try_again;

(which was the experimental patch I discussed with Marcelo)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)
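For concreteness, a sketch of how Rik's test could sit in the allocator's
retry path (sketch only, not 2.4 source; try_direct_allocation() is a
made-up stand-in for the zone scanning the real __alloc_pages() does
inline):

	/*
	 * Sketch, not 2.4 source: retry order-0 allocations forever, but
	 * retry higher orders only while free_shortage() says reclaim can
	 * still make forward progress.  Once it returns 0 with order > 0,
	 * another loop cannot help, so fail instead of spinning.
	 */
	static struct page *alloc_pages_sketch(unsigned int gfp_mask,
					       unsigned int order)
	{
		struct page *page;

	try_again:
		page = try_direct_allocation(gfp_mask, order); /* made up */
		if (page)
			return page;

		if (gfp_mask & __GFP_WAIT) {
			try_to_free_pages(gfp_mask);
			if (!order || free_shortage() > 0)
				goto try_again;
		}
		return NULL;
	}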
Re: [PATCH] allocation looping + kswapd CPU cycles
Hi,

On Thu, May 10, 2001 at 03:49:05PM -0300, Marcelo Tosatti wrote:
> Back to the main discussion --- I guess we could make __GFP_FAIL (with
> __GFP_WAIT set :)) allocations actually fail if "try_to_free_pages()" does
> not make any progress (ie returns zero). But maybe that's a bit too
> extreme.

That would seem to be a reasonable interpretation of __GFP_FAIL +
__GFP_WAIT, yes.

--Stephen
Re: [PATCH] allocation looping + kswapd CPU cycles
On Thu, 10 May 2001, Stephen C. Tweedie wrote:
> Hi,
>
> On Thu, May 10, 2001 at 03:22:57PM -0300, Marcelo Tosatti wrote:
>
> > Initially I thought about __GFP_FAIL being used by writeout routines which
> > want to cluster pages until they can allocate memory without causing any
> > pressure to the system. Something like this:
> >
> > 	while ((page = alloc_page(__GFP_FAIL)))
> > 		add_page_to_cluster(page);
> > 	write_cluster();
>
> Isn't that an orthogonal decision?  You can use __GFP_FAIL with or
> without __GFP_WAIT or __GFP_IO, whichever is appropriate.

Correct.

Back to the main discussion --- I guess we could make __GFP_FAIL (with
__GFP_WAIT set :)) allocations actually fail if "try_to_free_pages()" does
not make any progress (ie returns zero). But maybe that's a bit too
extreme.

What do you think?
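A sketch of the semantics Marcelo proposes (schematic, not a patch against
any real tree; this fragment would live in the __GFP_WAIT branch of
__alloc_pages()):

	/*
	 * Sketch: a __GFP_FAIL | __GFP_WAIT allocation gives up as soon as
	 * try_to_free_pages() reports that it reclaimed nothing, instead
	 * of looping back for another pass.
	 */
	if (gfp_mask & __GFP_WAIT) {
		int progress = try_to_free_pages(gfp_mask);

		if (!progress && (gfp_mask & __GFP_FAIL))
			return NULL;	/* no reclaim progress: fail */
		goto try_again;
	}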
Re: [PATCH] allocation looping + kswapd CPU cycles
Hi,

On Thu, May 10, 2001 at 03:22:57PM -0300, Marcelo Tosatti wrote:
> Initially I thought about __GFP_FAIL being used by writeout routines which
> want to cluster pages until they can allocate memory without causing any
> pressure to the system. Something like this:
>
> 	while ((page = alloc_page(__GFP_FAIL)))
> 		add_page_to_cluster(page);
> 	write_cluster();

Isn't that an orthogonal decision?  You can use __GFP_FAIL with or
without __GFP_WAIT or __GFP_IO, whichever is appropriate.

Cheers,
 Stephen
Re: [PATCH] allocation looping + kswapd CPU cycles
On Thu, 10 May 2001, Stephen C. Tweedie wrote:
> Hi,
>
> On Thu, May 10, 2001 at 01:43:46PM -0300, Marcelo Tosatti wrote:
>
> > No. __GFP_FAIL can try to reclaim pages from inactive clean.
> >
> > We just want to avoid __GFP_FAIL allocations from going to
> > try_to_free_pages().
>
> Why?  __GFP_FAIL is only useful as an indication that the caller has
> some magic mechanism for coping with failure.

Hum, not _only_.

Initially I thought about __GFP_FAIL being used by writeout routines which
want to cluster pages until they can allocate memory without causing any
pressure to the system. Something like this:

	while ((page = alloc_page(__GFP_FAIL)))
		add_page_to_cluster(page);
	write_cluster();

See?

> There's no other information passed, so a brief call to
> try_to_free_pages is quite appropriate.

This obviously depends on what we decide __GFP_FAIL will be used for.
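Filled out slightly, the clustering pattern reads like this (a sketch;
add_page_to_cluster() and write_cluster() are Marcelo's placeholder names
from the pseudocode above, not real kernel interfaces):

	/*
	 * Sketch of the write-clustering idea: take pages only for as long
	 * as the allocator can hand them over without reclaim pressure
	 * (__GFP_FAIL without __GFP_WAIT), then write out whatever was
	 * gathered.
	 */
	static void flush_cluster(void)
	{
		struct page *page;
		int count = 0;

		while ((page = alloc_page(__GFP_FAIL)) != NULL) {
			add_page_to_cluster(page);	/* placeholder */
			count++;
		}
		if (count)
			write_cluster();		/* placeholder */
	}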
Re: [PATCH] allocation looping + kswapd CPU cycles
Hi,

On Thu, May 10, 2001 at 01:43:46PM -0300, Marcelo Tosatti wrote:
> No. __GFP_FAIL can try to reclaim pages from inactive clean.
>
> We just want to avoid __GFP_FAIL allocations from going to
> try_to_free_pages().

Why?  __GFP_FAIL is only useful as an indication that the caller has
some magic mechanism for coping with failure.  There's no other
information passed, so a brief call to try_to_free_pages is quite
appropriate.

--Stephen
Re: [PATCH] allocation looping + kswapd CPU cycles
On Thu, 10 May 2001, Mark Hemment wrote:
> On Wed, 9 May 2001, Marcelo Tosatti wrote:
> > On Wed, 9 May 2001, Mark Hemment wrote:
> > > Could introduce another allocation flag (__GFP_FAIL?) which is or'ed
> > > with a __GFP_WAIT to limit the looping?
> >
> > __GFP_FAIL is in the -ac tree already and it is being used by the bounce
> > buffer allocation code.
>
> Thanks for the pointer.
>
> For non-zero order allocations, the test against __GFP_FAIL is a little
> too soon; it would be better after we've tried to reclaim pages from the
> inactive-clean list.  Any nasty side effects to this?

No. __GFP_FAIL can try to reclaim pages from inactive clean.

We just want to avoid __GFP_FAIL allocations from going to
try_to_free_pages().

> Plus, the code still prevents PF_MEMALLOC processes from using the
> inactive-clean list for non-zero order allocations.  As the trend seems to
> be to make zero and non-zero allocations 'equivalent', shouldn't this
> restriction be lifted?

I don't see any problem with making non-zero allocations able to
directly reclaim pages.
Re: [PATCH] allocation looping + kswapd CPU cycles
On Wed, 9 May 2001, Marcelo Tosatti wrote:
> On Wed, 9 May 2001, Mark Hemment wrote:
> > Could introduce another allocation flag (__GFP_FAIL?) which is or'ed
> > with a __GFP_WAIT to limit the looping?
>
> __GFP_FAIL is in the -ac tree already and it is being used by the bounce
> buffer allocation code.

Thanks for the pointer.

For non-zero order allocations, the test against __GFP_FAIL is a little
too soon; it would be better after we've tried to reclaim pages from the
inactive-clean list.  Any nasty side effects to this?

Plus, the code still prevents PF_MEMALLOC processes from using the
inactive-clean list for non-zero order allocations.  As the trend seems to
be to make zero and non-zero allocations 'equivalent', shouldn't this
restriction be lifted?

Mark
Re: [PATCH] allocation looping + kswapd CPU cycles
On Wed, 9 May 2001, Mark Hemment wrote:
> On Tue, 8 May 2001, David S. Miller wrote:
> > Actually, the change was made because it is illogical to try only
> > once on multi-order pages.  Especially because we depend upon order
> > 1 pages so much (every task struct allocated).  We depend upon them
> > even more so on sparc64 (certain kinds of page tables need to be
> > allocated as 1 order pages).
> >
> > The old code failed _far_ too easily, it was unacceptable.
> >
> > Why put some strange limit in there?  Whatever number you pick
> > is arbitrary, and I can probably piece together an allocation
> > state where the chosen limit is too small.
>
> Agreed, but some allocations of non-zero orders can fall back to other
> schemes (such as an emergency buffer, or using vmalloc for a temp
> buffer) and don't want to be trapped in __alloc_pages() for too long.
>
> Could introduce another allocation flag (__GFP_FAIL?) which is or'ed
> with a __GFP_WAIT to limit the looping?

__GFP_FAIL is in the -ac tree already and it is being used by the bounce
buffer allocation code.
Re: [PATCH] allocation looping + kswapd CPU cycles
On Tue, 8 May 2001, David S. Miller wrote:
> Actually, the change was made because it is illogical to try only
> once on multi-order pages.  Especially because we depend upon order
> 1 pages so much (every task struct allocated).  We depend upon them
> even more so on sparc64 (certain kinds of page tables need to be
> allocated as 1 order pages).
>
> The old code failed _far_ too easily, it was unacceptable.
>
> Why put some strange limit in there?  Whatever number you pick
> is arbitrary, and I can probably piece together an allocation
> state where the chosen limit is too small.

Agreed, but some allocations of non-zero orders can fall back to other
schemes (such as an emergency buffer, or using vmalloc for a temp
buffer) and don't want to be trapped in __alloc_pages() for too long.

Could introduce another allocation flag (__GFP_FAIL?) which is or'ed
with a __GFP_WAIT to limit the looping?

> So instead, you could test for the condition that prevents any
> possible forward progress, no?

Yes, it is possible to trap when kswapd might not make any useful
progress for a failing non-zero ordered allocation, and to set a global
"force" flag (kswapd_force) to ensure it does something useful.

For order-1 allocations, that would work.  For order-2 (and above) it
becomes much more difficult, as the page 'reap' routines release/process
pages based upon age and do not factor in whether a page may/will buddy
(now or in the near future).  This 'blind' processing of pages can wipe
a significant percentage of the page cache when trying to build a buddy
at a high order.

Of course, no one should be doing really large order allocations and
expecting them to succeed.  But, if they are doing this, the allocation
should at least fail.

Mark
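In outline, the force-flag idea could look like this (kswapd_force is
Mark's proposed name; both fragments are schematic sketches, not actual
2.4 source):

	int kswapd_force = 0;	/* set by failing non-zero order callers */

	/* Allocator side, after a failed high-order attempt: */
	kswapd_force = 1;
	wake_up_interruptible(&kswapd_wait);

	/* kswapd side, in its main loop: do a full pass even when
	 * free_shortage() reports nothing to do. */
	if (free_shortage() || kswapd_force) {
		kswapd_force = 0;
		do_try_to_free_pages(GFP_KSWAPD);
	}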
Re: [PATCH] allocation looping + kswapd CPU cycles
Marcelo Tosatti writes:
> On Tue, 8 May 2001, Mark Hemment wrote:
> > Does anyone know why the 2.4.3pre6 change was made?
>
> Because wakeup_bdflush(0) can wakeup bdflush _even_ if it does not have
> any job to do (ie less than 30% dirty buffers in the default config).

Actually, the change was made because it is illogical to try only
once on multi-order pages.  Especially because we depend upon order
1 pages so much (every task struct allocated).  We depend upon them
even more so on sparc64 (certain kinds of page tables need to be
allocated as 1 order pages).

The old code failed _far_ too easily, it was unacceptable.

Why put some strange limit in there?  Whatever number you pick
is arbitrary, and I can probably piece together an allocation
state where the chosen limit is too small.

So instead, you could test for the condition that prevents any
possible forward progress, no?

Later,
David S. Miller
[EMAIL PROTECTED]
Re: [PATCH] allocation looping + kswapd CPU cycles
On Tue, May 08 2001, Marcelo Tosatti wrote:
> > The attached patch (against 2.4.5-pre1) fixes the looping symptom, by
> > adding a counter and looping only twice for non-zero order allocations.
>
> Looks good. (actually Rik had a patch similar to this which fixed a real
> case with cdda2wav just like you described)

Not cdda2wav, I presume, but the optimization discussed here before that
wasn't really doable because of the vm behaviour when doing

	do
		try to alloc some amount of contiguous pages
		if (ok)
			break
		lower number of pages wanted
	while true

CDROMREADAUDIO stopped doing this and fell back to single cdda frame size
allocations because of these failures, even though it meant a huge
decrease in speed.  cdda2wav will ask for iirc 16 frames at a time; the
current driver will try to do 8 first and then fall back to slower
extraction if allocations fail.

--
Jens Axboe
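In C, that back-off reads roughly as follows (a sketch, not the actual
CDROMREADAUDIO code; CD_FRAMESIZE_RAW is the real 2352-byte cdda frame
size from <linux/cdrom.h>, everything else is illustrative):

	/*
	 * Sketch of the order back-off: ask for one large physically
	 * contiguous buffer and halve the request each time the
	 * allocation fails.
	 */
	static char *alloc_cdda_buffer(int *nframes)
	{
		int frames = *nframes;		/* e.g. start at 16 */
		char *buf = NULL;

		while (frames > 0) {
			buf = kmalloc(frames * CD_FRAMESIZE_RAW, GFP_KERNEL);
			if (buf)
				break;		/* got a contiguous buffer */
			frames >>= 1;		/* lower the request */
		}
		*nframes = frames;
		return buf;	/* NULL if even a single frame failed */
	}

The pattern only works if oversized attempts fail promptly but not
spuriously: before 2.4.3pre6 they failed far too easily (which is why the
driver gave up on the optimization), while with the looping behaviour
discussed in this thread the first oversized attempt may simply never
return.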
Re: [PATCH] allocation looping + kswapd CPU cycles
On Tue, 8 May 2001, Mark Hemment wrote:
>
> In 2.4.3pre6, code in page_alloc.c:__alloc_pages() changed from:
>
> 	try_to_free_pages(gfp_mask);
> 	wakeup_bdflush();
> 	if (!order)
> 		goto try_again;
>
> to:
>
> 	try_to_free_pages(gfp_mask);
> 	wakeup_bdflush();
> 	goto try_again;
>
> This introduced the effect of a non-zero order, __GFP_WAIT allocation
> (without PF_MEMALLOC set) never returning failure.  The allocation keeps
> looping in __alloc_pages(), kicking kswapd, until the allocation succeeds.
>
> If there is plenty of memory in the free-pools and inactive-lists,
> free_shortage() will return false, causing the state of these
> free-pools/inactive-lists not to be 'improved' by kswapd.
>
> If there is nothing else changing/improving the free-pools or
> inactive-lists, the allocation loops forever (kicking kswapd).
>
> Does anyone know why the 2.4.3pre6 change was made?

Because wakeup_bdflush(0) can wakeup bdflush _even_ if it does not have
any job to do (ie less than 30% dirty buffers in the default config).

> The attached patch (against 2.4.5-pre1) fixes the looping symptom, by
> adding a counter and looping only twice for non-zero order allocations.

Looks good.  (actually Rik had a patch similar to this which fixed a real
case with cdda2wav just like you described)
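In outline, the counter fix under discussion looks like this (a sketch of
the idea against the 2.4 structure, not the patch as posted; rmqueue() is
the real 2.4 buddy-list helper):

	/*
	 * Sketch: order-0 allocations keep their retry-forever behaviour,
	 * while non-zero order allocations go through the reclaim path at
	 * most twice before failing.
	 */
	int tries = 2;

 try_again:
	page = rmqueue(zone, order);
	if (page)
		return page;

	if (gfp_mask & __GFP_WAIT) {
		try_to_free_pages(gfp_mask);
		wakeup_bdflush(0);
		if (!order || --tries > 0)
			goto try_again;
	}
	return NULL;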
Re: [PATCH] allocation looping + kswapd CPU cycles
> The real fix is to measure fragmentation and the progress of kswapd, but
> that is too drastic for 2.4.x.

I suspect the real fix might, in general, be

a) to reduce use of kmalloc() etc., which gives physically contiguous
   memory, where virtually contiguous memory will do (and is, presumably,
   far easier to come by); or perhaps add some flag to kmalloc to allocate
   out of virtual rather than physical memory.

b) to bias flush or swap-out routines to create physically contiguous
   higher-order blocks.  Many heuristics will give you that ability.

Disclaimer: I haven't looked at this issue for years, but Linux seems to
fail on >4k allocations now, and fragment memory far more, than it did on
much smaller systems doing lots of nasty (8k, thus 3 pages including
header) NFS stuff back in '94.

--
Alex Bligh
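A minimal sketch of the fallback in (a), with made-up helper names
(big_buffer_alloc()/big_buffer_free() are illustrative; callers must track
which allocator succeeded so the matching free routine is used):

	#include <linux/slab.h>
	#include <linux/vmalloc.h>

	/*
	 * Sketch: try for physically contiguous memory first, and fall
	 * back to vmalloc() when only virtual contiguity is required.
	 */
	static void *big_buffer_alloc(size_t size, int *vmalloced)
	{
		void *buf = kmalloc(size, GFP_KERNEL);

		*vmalloced = 0;
		if (!buf) {
			buf = vmalloc(size);	/* virtually contiguous */
			if (buf)
				*vmalloced = 1;
		}
		return buf;
	}

	static void big_buffer_free(void *buf, int vmalloced)
	{
		if (vmalloced)
			vfree(buf);
		else
			kfree(buf);
	}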