Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
On Sat, 29 Sep 2007 06:19:33 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote:

> On Saturday 29 September 2007 19:27, Andrew Morton wrote:
> > On Sat, 29 Sep 2007 11:14:02 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > > oom-killings, or page allocation failures? The latter, one hopes.
> > >
> > > Linux version 2.6.23-rc4-mm1-dirty ([EMAIL PROTECTED]) (gcc version 4.1.2
> > > (Ubuntu 4.1.2-0ubuntu4)) #27 Tue Sep 18 15:40:35 CEST 2007
> > >
> > > ...
> > >
> > > mm_tester invoked oom-killer: gfp_mask=0x40d0, order=2, oomkilladj=0
> > > Call Trace:
> > > 611b3878: [<6002dd28>] printk_ratelimit+0x15/0x17
> > > 611b3888: [<60052ed4>] out_of_memory+0x80/0x100
> > > 611b38c8: [<60054b0c>] __alloc_pages+0x1ed/0x280
> > > 611b3948: [<6006c608>] allocate_slab+0x5b/0xb0
> > > 611b3968: [<6006c705>] new_slab+0x7e/0x183
> > > 611b39a8: [<6006cbae>] __slab_alloc+0xc9/0x14b
> > > 611b39b0: [<6011f89f>] radix_tree_preload+0x70/0xbf
> > > 611b39b8: [<600980f2>] do_mpage_readpage+0x3b3/0x472
> > > 611b39e0: [<6011f89f>] radix_tree_preload+0x70/0xbf
> > > 611b39f8: [<6006cc81>] kmem_cache_alloc+0x51/0x98
> > > 611b3a38: [<6011f89f>] radix_tree_preload+0x70/0xbf
> > > 611b3a58: [<6004f8e2>] add_to_page_cache+0x22/0xf7
> > > 611b3a98: [<6004f9c6>] add_to_page_cache_lru+0xf/0x24
> > > 611b3ab8: [<6009821e>] mpage_readpages+0x6d/0x109
> > > 611b3ac0: [<600d59f0>] ext3_get_block+0x0/0xf2
> > > 611b3b08: [<6005483d>] get_page_from_freelist+0x8d/0xc1
> > > 611b3b88: [<600d6937>] ext3_readpages+0x18/0x1a
> > > 611b3b98: [<60056f00>] read_pages+0x37/0x9b
> > > 611b3bd8: [<60057064>] __do_page_cache_readahead+0x100/0x157
> > > 611b3c48: [<60057196>] do_page_cache_readahead+0x52/0x5f
> > > 611b3c78: [<60050ab4>] filemap_fault+0x145/0x278
> > > 611b3ca8: [<60022b61>] run_syscall_stub+0xd1/0xdd
> > > 611b3ce8: [<6005eae3>] __do_fault+0x7e/0x3ca
> > > 611b3d68: [<6005ee60>] do_linear_fault+0x31/0x33
> > > 611b3d88: [<6005f149>] handle_mm_fault+0x14e/0x246
> > > 611b3da8: [<60120a7b>] __up_read+0x73/0x7b
> > > 611b3de8: [<60013177>] handle_page_fault+0x11f/0x23b
> > > 611b3e48: [<60013419>] segv+0xac/0x297
> > > 611b3f28: [<60013367>] segv_handler+0x68/0x6e
> > > 611b3f48: [<600232ad>] get_skas_faultinfo+0x9c/0xa1
> > > 611b3f68: [<60023853>] userspace+0x13a/0x19d
> > > 611b3fc8: [<60010d58>] fork_handler+0x86/0x8d
> >
> > OK, that's different. Someone broke the vm - order-2 GFP_KERNEL
> > allocations aren't supposed to fail.
> >
> > I'm suspecting that did_some_progress thing.
>
> The allocation didn't fail -- it invoked the OOM killer because the
> kernel ran out of unfragmented memory.

We can't "run out of unfragmented memory" for an order-2 GFP_KERNEL
allocation in this workload. We go and synchronously free stuff up to make
it work. How did this get broken?

> Probably because higher order allocations are the new vogue in -mm at
> the moment ;)

That's a different bug.

bug 1: We shouldn't be doing higher-order allocations in slub because of
the considerable damage this does to atomic allocations.

bug 2: order-2 GFP_KERNEL allocations shouldn't fail like this.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Versioning file system
Interesting that you mention the multitude of file systems, because I was
very surprised to see NILFS being promoted in the latest Linux Magazine
while none of the other, more important file systems currently in the
works, like UnionFS, ChunkFS or ext4, got so publicized. I can say I was
disappointed by the article. I still haven't seen any real proof that
NILFS is the best file system since sliced bread. Nor have I seen any
comments on nilfs from Andrew and others, and yet this is the best new
file system coming to Linux. Maybe I missed something that happened in
Ottawa.

/Sorin

On Mon, 18 Jun 2007 05:45:24 -0400, Andreas Dilger <[EMAIL PROTECTED]> wrote:

On Jun 16, 2007 16:53 +0200, Jörn Engel wrote:
On Fri, 15 June 2007 15:51:07 -0700, alan wrote:
> >Thus, in the end it turns out that this stuff is better handled by
> >explicit version-control systems (which require explicit operations to
> >manage revisions) and atomic snapshots (for backup.)
>
> ZFS is the cool new thing in that space. Too bad the license makes it
> hard to incorporate it into the kernel.

It may be the coolest, but there are others as well. Btrfs looks good,
nilfs finally has a cleaner and may be worth a try, logfs will get
snapshots sooner or later. Heck, even my crusty old cowlinks can be
viewed as snapshots. If one has spare cycles to waste, working on one of
those makes more sense than implementing file versioning.

Too bad everyone is spending time on 10 similar-but-slightly-different
filesystems. This will likely end up with a bunch of filesystems that
implement some easy subset of features, but will not get polished for
users or have a full set of features implemented (e.g. ACL, quota, fsck,
etc). While I don't think there is a single answer to every question, it
does seem that the number of filesystem projects has climbed lately.

Maybe there should be a BOF at OLS to merge these filesystem projects
(btrfs, chunkfs, tilefs, logfs, etc) into a single project with multiple
people working on getting it solid, scalable (parallel readers/writers on
lots of CPUs), robust (checksums, failure localization), recoverable,
etc. I thought Val's FS summits were designed to get developers to
collaborate, but it seems everyone has gone back to their corners to work
on their own filesystem?

Working on getting hooks into DM/MD so that the filesystem and RAID
layers can move beyond "ignorance is bliss" when talking to each other
would be great. Not rebuilding empty parts of the fs, limiting parity
resync to parts of the fs that were in the previous transaction, using
fs-supplied checksums to verify on-disk data is correct, using RAID
geometry when doing allocations, etc.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

--
Best Regards
Sorin Faibish
Senior Technologist
Senior Consulting Software Engineer
Network Storage Group
EMC² where information lives
Phone: 508-435-1000 x 48545
Cellphone: 617-510-0422
Email : [EMAIL PROTECTED]
Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
On Saturday 29 September 2007 04:41, Christoph Lameter wrote:
> On Fri, 28 Sep 2007, Peter Zijlstra wrote:
> > memory got massively fragmented, as anti-frag gets easily defeated.
> > setting min_free_kbytes to 12M does seem to solve it - it forces 2 max
> > order blocks to stay available, so we don't mix types. however 12M on
> > 128M is rather a lot.
>
> Yes, strict ordering would be much better. On NUMA it may be possible to
> completely forbid merging. We can fall back to other nodes if necessary.
> 12M is not much on a NUMA system.
>
> But this shows that (unsurprisingly) we may have issues on systems with
> small amounts of memory, and we may not want to use higher orders on
> such systems.
>
> The case you got may be good to use as a testcase for the virtual
> fallback. Hmm... Maybe it is possible to allocate the stack as a virtual
> compound page. Got some script/code to produce that problem?

Yeah, you could do that, but we generally don't have big problems
allocating stacks in mainline, because we have very few users of higher
order pages, and the few that are there don't seem to be a problem.
Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
On Saturday 29 September 2007 19:27, Andrew Morton wrote:
> On Sat, 29 Sep 2007 11:14:02 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > oom-killings, or page allocation failures? The latter, one hopes.
> >
> > Linux version 2.6.23-rc4-mm1-dirty ([EMAIL PROTECTED]) (gcc version 4.1.2
> > (Ubuntu 4.1.2-0ubuntu4)) #27 Tue Sep 18 15:40:35 CEST 2007
> >
> > ...
> >
> > mm_tester invoked oom-killer: gfp_mask=0x40d0, order=2, oomkilladj=0
> > Call Trace:
> > 611b3878: [<6002dd28>] printk_ratelimit+0x15/0x17
> > 611b3888: [<60052ed4>] out_of_memory+0x80/0x100
> > 611b38c8: [<60054b0c>] __alloc_pages+0x1ed/0x280
> > 611b3948: [<6006c608>] allocate_slab+0x5b/0xb0
> > 611b3968: [<6006c705>] new_slab+0x7e/0x183
> > 611b39a8: [<6006cbae>] __slab_alloc+0xc9/0x14b
> > 611b39b0: [<6011f89f>] radix_tree_preload+0x70/0xbf
> > 611b39b8: [<600980f2>] do_mpage_readpage+0x3b3/0x472
> > 611b39e0: [<6011f89f>] radix_tree_preload+0x70/0xbf
> > 611b39f8: [<6006cc81>] kmem_cache_alloc+0x51/0x98
> > 611b3a38: [<6011f89f>] radix_tree_preload+0x70/0xbf
> > 611b3a58: [<6004f8e2>] add_to_page_cache+0x22/0xf7
> > 611b3a98: [<6004f9c6>] add_to_page_cache_lru+0xf/0x24
> > 611b3ab8: [<6009821e>] mpage_readpages+0x6d/0x109
> > 611b3ac0: [<600d59f0>] ext3_get_block+0x0/0xf2
> > 611b3b08: [<6005483d>] get_page_from_freelist+0x8d/0xc1
> > 611b3b88: [<600d6937>] ext3_readpages+0x18/0x1a
> > 611b3b98: [<60056f00>] read_pages+0x37/0x9b
> > 611b3bd8: [<60057064>] __do_page_cache_readahead+0x100/0x157
> > 611b3c48: [<60057196>] do_page_cache_readahead+0x52/0x5f
> > 611b3c78: [<60050ab4>] filemap_fault+0x145/0x278
> > 611b3ca8: [<60022b61>] run_syscall_stub+0xd1/0xdd
> > 611b3ce8: [<6005eae3>] __do_fault+0x7e/0x3ca
> > 611b3d68: [<6005ee60>] do_linear_fault+0x31/0x33
> > 611b3d88: [<6005f149>] handle_mm_fault+0x14e/0x246
> > 611b3da8: [<60120a7b>] __up_read+0x73/0x7b
> > 611b3de8: [<60013177>] handle_page_fault+0x11f/0x23b
> > 611b3e48: [<60013419>] segv+0xac/0x297
> > 611b3f28: [<60013367>] segv_handler+0x68/0x6e
> > 611b3f48: [<600232ad>] get_skas_faultinfo+0x9c/0xa1
> > 611b3f68: [<60023853>] userspace+0x13a/0x19d
> > 611b3fc8: [<60010d58>] fork_handler+0x86/0x8d
>
> OK, that's different. Someone broke the vm - order-2 GFP_KERNEL
> allocations aren't supposed to fail.
>
> I'm suspecting that did_some_progress thing.

The allocation didn't fail -- it invoked the OOM killer because the kernel
ran out of unfragmented memory.

Probably because higher order allocations are the new vogue in -mm at the
moment ;)
Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
On Sat, 29 Sep 2007 11:14:02 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > oom-killings, or page allocation failures? The latter, one hopes.
>
> Linux version 2.6.23-rc4-mm1-dirty ([EMAIL PROTECTED]) (gcc version 4.1.2
> (Ubuntu 4.1.2-0ubuntu4)) #27 Tue Sep 18 15:40:35 CEST 2007
>
> ...
>
> mm_tester invoked oom-killer: gfp_mask=0x40d0, order=2, oomkilladj=0
> Call Trace:
> 611b3878: [<6002dd28>] printk_ratelimit+0x15/0x17
> 611b3888: [<60052ed4>] out_of_memory+0x80/0x100
> 611b38c8: [<60054b0c>] __alloc_pages+0x1ed/0x280
> 611b3948: [<6006c608>] allocate_slab+0x5b/0xb0
> 611b3968: [<6006c705>] new_slab+0x7e/0x183
> 611b39a8: [<6006cbae>] __slab_alloc+0xc9/0x14b
> 611b39b0: [<6011f89f>] radix_tree_preload+0x70/0xbf
> 611b39b8: [<600980f2>] do_mpage_readpage+0x3b3/0x472
> 611b39e0: [<6011f89f>] radix_tree_preload+0x70/0xbf
> 611b39f8: [<6006cc81>] kmem_cache_alloc+0x51/0x98
> 611b3a38: [<6011f89f>] radix_tree_preload+0x70/0xbf
> 611b3a58: [<6004f8e2>] add_to_page_cache+0x22/0xf7
> 611b3a98: [<6004f9c6>] add_to_page_cache_lru+0xf/0x24
> 611b3ab8: [<6009821e>] mpage_readpages+0x6d/0x109
> 611b3ac0: [<600d59f0>] ext3_get_block+0x0/0xf2
> 611b3b08: [<6005483d>] get_page_from_freelist+0x8d/0xc1
> 611b3b88: [<600d6937>] ext3_readpages+0x18/0x1a
> 611b3b98: [<60056f00>] read_pages+0x37/0x9b
> 611b3bd8: [<60057064>] __do_page_cache_readahead+0x100/0x157
> 611b3c48: [<60057196>] do_page_cache_readahead+0x52/0x5f
> 611b3c78: [<60050ab4>] filemap_fault+0x145/0x278
> 611b3ca8: [<60022b61>] run_syscall_stub+0xd1/0xdd
> 611b3ce8: [<6005eae3>] __do_fault+0x7e/0x3ca
> 611b3d68: [<6005ee60>] do_linear_fault+0x31/0x33
> 611b3d88: [<6005f149>] handle_mm_fault+0x14e/0x246
> 611b3da8: [<60120a7b>] __up_read+0x73/0x7b
> 611b3de8: [<60013177>] handle_page_fault+0x11f/0x23b
> 611b3e48: [<60013419>] segv+0xac/0x297
> 611b3f28: [<60013367>] segv_handler+0x68/0x6e
> 611b3f48: [<600232ad>] get_skas_faultinfo+0x9c/0xa1
> 611b3f68: [<60023853>] userspace+0x13a/0x19d
> 611b3fc8: [<60010d58>] fork_handler+0x86/0x8d

OK, that's different. Someone broke the vm - order-2 GFP_KERNEL
allocations aren't supposed to fail.

I'm suspecting that did_some_progress thing.
Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
On Sat, 2007-09-29 at 02:01 -0700, Andrew Morton wrote:
> On Sat, 29 Sep 2007 10:53:41 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > On Sat, 2007-09-29 at 10:47 +0200, Peter Zijlstra wrote:
> > > Ah, right, that was the detail... all this lumpy reclaim is useless
> > > for atomic allocations. And with SLUB using higher order pages,
> > > atomic !0 order allocations will be very very common.
> > >
> > > One I can remember was:
> > >
> > >   add_to_page_cache()
> > >     radix_tree_insert()
> > >       radix_tree_node_alloc()
> > >         kmem_cache_alloc()
> > >
> > > which is an atomic callsite.
> > >
> > > Which leaves us in a situation where we can load pages, because
> > > there is free memory, but can't manage to allocate memory to track
> > > them..
> >
> > Ah, I found a boot log of one of these sessions, it's also full of
> > order-2 OOMs.. :-/
>
> oom-killings, or page allocation failures? The latter, one hopes.

Linux version 2.6.23-rc4-mm1-dirty ([EMAIL PROTECTED]) (gcc version 4.1.2
(Ubuntu 4.1.2-0ubuntu4)) #27 Tue Sep 18 15:40:35 CEST 2007

...

mm_tester invoked oom-killer: gfp_mask=0x40d0, order=2, oomkilladj=0
Call Trace:
611b3878: [<6002dd28>] printk_ratelimit+0x15/0x17
611b3888: [<60052ed4>] out_of_memory+0x80/0x100
611b38c8: [<60054b0c>] __alloc_pages+0x1ed/0x280
611b3948: [<6006c608>] allocate_slab+0x5b/0xb0
611b3968: [<6006c705>] new_slab+0x7e/0x183
611b39a8: [<6006cbae>] __slab_alloc+0xc9/0x14b
611b39b0: [<6011f89f>] radix_tree_preload+0x70/0xbf
611b39b8: [<600980f2>] do_mpage_readpage+0x3b3/0x472
611b39e0: [<6011f89f>] radix_tree_preload+0x70/0xbf
611b39f8: [<6006cc81>] kmem_cache_alloc+0x51/0x98
611b3a38: [<6011f89f>] radix_tree_preload+0x70/0xbf
611b3a58: [<6004f8e2>] add_to_page_cache+0x22/0xf7
611b3a98: [<6004f9c6>] add_to_page_cache_lru+0xf/0x24
611b3ab8: [<6009821e>] mpage_readpages+0x6d/0x109
611b3ac0: [<600d59f0>] ext3_get_block+0x0/0xf2
611b3b08: [<6005483d>] get_page_from_freelist+0x8d/0xc1
611b3b88: [<600d6937>] ext3_readpages+0x18/0x1a
611b3b98: [<60056f00>] read_pages+0x37/0x9b
611b3bd8: [<60057064>] __do_page_cache_readahead+0x100/0x157
611b3c48: [<60057196>] do_page_cache_readahead+0x52/0x5f
611b3c78: [<60050ab4>] filemap_fault+0x145/0x278
611b3ca8: [<60022b61>] run_syscall_stub+0xd1/0xdd
611b3ce8: [<6005eae3>] __do_fault+0x7e/0x3ca
611b3d68: [<6005ee60>] do_linear_fault+0x31/0x33
611b3d88: [<6005f149>] handle_mm_fault+0x14e/0x246
611b3da8: [<60120a7b>] __up_read+0x73/0x7b
611b3de8: [<60013177>] handle_page_fault+0x11f/0x23b
611b3e48: [<60013419>] segv+0xac/0x297
611b3f28: [<60013367>] segv_handler+0x68/0x6e
611b3f48: [<600232ad>] get_skas_faultinfo+0x9c/0xa1
611b3f68: [<60023853>] userspace+0x13a/0x19d
611b3fc8: [<60010d58>] fork_handler+0x86/0x8d
Mem-info:
Normal per-cpu:
CPU0: Hot: hi: 42, btch: 7 usd: 0   Cold: hi: 14, btch: 3 usd: 0
Active:11 inactive:9 dirty:0 writeback:1 unstable:0 free:19533 slab:10587
 mapped:0 pagetables:260 bounce:0
Normal free:78132kB min:4096kB low:5120kB high:6144kB active:44kB
 inactive:36kB present:129280kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0
Normal: 7503*4kB 5977*8kB 19*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
 0*1024kB 0*2048kB 0*4096kB = 78132kB
Swap cache: add 1192822, delete 1192790, find 491441/626861, race 0+1
Free swap  = 455300kB
Total swap = 524280kB
Free swap:       455300kB
32768 pages of RAM
0 pages of HIGHMEM
1948 reserved pages
11 pages shared
32 pages swap cached
Out of memory: kill process 2647 (portmap) score 2233 or a child
Killed process 2647 (portmap)
Re: Upgrading datastructures between different filesystem versions
On Fri, Sep 28, 2007 at 03:47:24PM -0400, Theodore Tso wrote:
> Ext3 does something similar, zapping space at the beginning AND the
> end of the partition (because the MD superblocks are at the end).
> It's just a misfeature of reiserfs's mkfs that it doesn't do this.

mkfs.xfs of course also wipes at the end. I just wanted to show how easy
this is to fix.
Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
On Sat, 29 Sep 2007 10:47:12 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> On Sat, 2007-09-29 at 01:13 -0700, Andrew Morton wrote:
> > On Fri, 28 Sep 2007 20:25:50 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > On Fri, 2007-09-28 at 11:20 -0700, Christoph Lameter wrote:
> > > > > start 2 processes that each mmap a separate 64M file, and do
> > > > > sequential writes on them. start a 3rd process that does the
> > > > > same with 64M anonymous.
> > > > >
> > > > > wait for a while, and you'll see order=1 failures.
> > > >
> > > > Really? That means we can no longer even allocate stacks for
> > > > forking.
> > > >
> > > > Its surprising that neither lumpy reclaim nor the mobility patches
> > > > can deal with it? Lumpy reclaim should be able to free neighboring
> > > > pages to avoid the order 1 failure unless there are lots of pinned
> > > > pages.
> > > >
> > > > I guess then that lots of pages are pinned through I/O?
> > >
> > > memory got massively fragmented, as anti-frag gets easily defeated.
> > > setting min_free_kbytes to 12M does seem to solve it - it forces 2
> > > max order blocks to stay available, so we don't mix types. however
> > > 12M on 128M is rather a lot.
> > >
> > > its still on my todo list to look at it further..
> >
> > That would be really really bad (as in: patch-dropping time) if those
> > order-1 allocations are not atomic.
> >
> > What's the callsite?
>
> Ah, right, that was the detail... all this lumpy reclaim is useless for
> atomic allocations. And with SLUB using higher order pages, atomic !0
> order allocations will be very very common.

Oh OK. I thought we'd already fixed slub so that it didn't do that.
Maybe that fix is in -mm but I don't think so.

Trying to do atomic order-1 allocations on behalf of arbitrary slab
caches just won't fly - this is a significant degradation in kernel
reliability, as you've very easily demonstrated.

> One I can remember was:
>
>   add_to_page_cache()
>     radix_tree_insert()
>       radix_tree_node_alloc()
>         kmem_cache_alloc()
>
> which is an atomic callsite.
>
> Which leaves us in a situation where we can load pages, because there is
> free memory, but can't manage to allocate memory to track them..

Right. Leading to application failure which for many is equivalent to a
complete system outage.
Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
On Sat, 29 Sep 2007 10:53:41 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> On Sat, 2007-09-29 at 10:47 +0200, Peter Zijlstra wrote:
> > Ah, right, that was the detail... all this lumpy reclaim is useless
> > for atomic allocations. And with SLUB using higher order pages, atomic
> > !0 order allocations will be very very common.
> >
> > One I can remember was:
> >
> >   add_to_page_cache()
> >     radix_tree_insert()
> >       radix_tree_node_alloc()
> >         kmem_cache_alloc()
> >
> > which is an atomic callsite.
> >
> > Which leaves us in a situation where we can load pages, because there
> > is free memory, but can't manage to allocate memory to track them..
>
> Ah, I found a boot log of one of these sessions, its also full of
> order-2 OOMs.. :-/

oom-killings, or page allocation failures? The latter, one hopes.
Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
On Sat, 2007-09-29 at 10:47 +0200, Peter Zijlstra wrote:
> Ah, right, that was the detail... all this lumpy reclaim is useless for
> atomic allocations. And with SLUB using higher order pages, atomic !0
> order allocations will be very very common.
>
> One I can remember was:
>
>   add_to_page_cache()
>     radix_tree_insert()
>       radix_tree_node_alloc()
>         kmem_cache_alloc()
>
> which is an atomic callsite.
>
> Which leaves us in a situation where we can load pages, because there is
> free memory, but can't manage to allocate memory to track them..

Ah, I found a boot log of one of these sessions, its also full of
order-2 OOMs.. :-/
Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
On Sat, 2007-09-29 at 01:13 -0700, Andrew Morton wrote:
> On Fri, 28 Sep 2007 20:25:50 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > On Fri, 2007-09-28 at 11:20 -0700, Christoph Lameter wrote:
> > > > start 2 processes that each mmap a separate 64M file, and do
> > > > sequential writes on them. start a 3rd process that does the same
> > > > with 64M anonymous.
> > > >
> > > > wait for a while, and you'll see order=1 failures.
> > >
> > > Really? That means we can no longer even allocate stacks for
> > > forking.
> > >
> > > Its surprising that neither lumpy reclaim nor the mobility patches
> > > can deal with it? Lumpy reclaim should be able to free neighboring
> > > pages to avoid the order 1 failure unless there are lots of pinned
> > > pages.
> > >
> > > I guess then that lots of pages are pinned through I/O?
> >
> > memory got massively fragmented, as anti-frag gets easily defeated.
> > setting min_free_kbytes to 12M does seem to solve it - it forces 2 max
> > order blocks to stay available, so we don't mix types. however 12M on
> > 128M is rather a lot.
> >
> > its still on my todo list to look at it further..
>
> That would be really really bad (as in: patch-dropping time) if those
> order-1 allocations are not atomic.
>
> What's the callsite?

Ah, right, that was the detail... all this lumpy reclaim is useless for
atomic allocations. And with SLUB using higher order pages, atomic !0
order allocations will be very very common.

One I can remember was:

  add_to_page_cache()
    radix_tree_insert()
      radix_tree_node_alloc()
        kmem_cache_alloc()

which is an atomic callsite.

Which leaves us in a situation where we can load pages, because there is
free memory, but can't manage to allocate memory to track them..
Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
On Fri, 2007-09-28 at 11:20 -0700, Christoph Lameter wrote:
> Really? That means we can no longer even allocate stacks for forking.

I think I'm running with 4k stacks...
Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK
On Fri, 28 Sep 2007 20:25:50 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> On Fri, 2007-09-28 at 11:20 -0700, Christoph Lameter wrote:
> > > start 2 processes that each mmap a separate 64M file, and do
> > > sequential writes on them. start a 3rd process that does the same
> > > with 64M anonymous.
> > >
> > > wait for a while, and you'll see order=1 failures.
> >
> > Really? That means we can no longer even allocate stacks for forking.
> >
> > Its surprising that neither lumpy reclaim nor the mobility patches can
> > deal with it? Lumpy reclaim should be able to free neighboring pages
> > to avoid the order 1 failure unless there are lots of pinned pages.
> >
> > I guess then that lots of pages are pinned through I/O?
>
> memory got massively fragmented, as anti-frag gets easily defeated.
> setting min_free_kbytes to 12M does seem to solve it - it forces 2 max
> order blocks to stay available, so we don't mix types. however 12M on
> 128M is rather a lot.
>
> its still on my todo list to look at it further..

That would be really really bad (as in: patch-dropping time) if those
order-1 allocations are not atomic.

What's the callsite?