Re: kernel OOPS in MM(?)
Hello, On 2016-03-10 12:31, Evgenii Lepikhin wrote: > We need help to understand the source of the problem and may be to create a > bugreport. Here is crash report: > > Mar 10 04:03:51 l28 kernel: [2075560.434445] BUG: unable to handle kernel > paging request at 40008021 > Mar 10 04:03:51 l28 kernel: [2075560.434669] IP: [] > __kmalloc+0x69/0x100 > Mar 10 04:03:51 l28 kernel: [2075560.434800] PGD b7e462067 PUD 0 > Mar 10 04:03:51 l28 kernel: [2075560.434913] Oops: [#1] SMP > Mar 10 04:03:51 l28 kernel: [2075560.435044] Modules linked in: > tcm_loop iscsi_target_mod target_core_pscsi target_core_file > target_core_iblock target_core_mod ipt_NETFLOW(O) configfs iscsi_tcp > libis > csi_tcp libiscsi scsi_transport_iscsi fuse [last unloaded: ipfw_mod] > Mar 10 04:03:51 l28 kernel: [2075560.435539] CPU: 4 PID: 27141 Comm: rm > Tainted: G O 3.12.51-jl-2015-12-25 #1 > Mar 10 04:03:51 l28 kernel: [2075560.435734] Hardware name: Intel Corporation > S2600IP ../S2600IP, BIOS SE5C600.86B.01.08.0003.022620131521 > 02/26/2013 > Mar 10 04:03:51 l28 kernel: [2075560.435939] task: 880e622ccba0 ti: > 880eeb008000 task.ti: 880eeb008000 > Mar 10 04:03:51 l28 kernel: [2075560.436131] RIP: 0010:[] > [] __kmalloc+0x69/0x100 > Mar 10 04:03:51 l28 kernel: [2075560.436333] RSP: 0018:880eeb009b38 > EFLAGS: 00010282 > Mar 10 04:03:51 l28 kernel: [2075560.436439] RAX: RBX: > RCX: a8a73dc2 > Mar 10 04:03:51 l28 kernel: [2075560.436632] RDX: a8a73dc1 RSI: > RDI: 00013500 > Mar 10 04:03:51 l28 kernel: [2075560.438248] RBP: 880eeb009b58 R08: > 88103fc13500 R09: 811a0267 > Mar 10 04:03:51 l28 kernel: [2075560.438446] R10: 880eeb009d84 R11: > R12: 88081f803a00 > Mar 10 04:03:51 l28 kernel: [2075560.438656] R13: 40008021 R14: > 0250 R15: 880250e833b0 > Mar 10 04:03:51 l28 kernel: [2075560.438851] FS: 7fe2316dd700() > GS:88103fc0() knlGS: > Mar 10 04:03:51 l28 kernel: [2075560.439045] CS: 0010 DS: ES: CR0: > 80050033 > Mar 10 04:03:51 l28 kernel: [2075560.439152] CR2: 40008021 CR3: > 000a20736000 CR4: 
000407e0 > Mar 10 04:03:51 l28 kernel: [2075560.439343] Stack: > Mar 10 04:03:51 l28 kernel: [2075560.439439] > 0250 0060 > Mar 10 04:03:51 l28 kernel: [2075560.439663] 880eeb009b88 > 811a0267 881015fb7fe0 0060 > Mar 10 04:03:51 l28 kernel: [2075560.439898] 880250e83490 > 880eeb009ba8 811a02f8 > Mar 10 04:03:51 l28 kernel: [2075560.440153] Call Trace: > Mar 10 04:03:51 l28 kernel: [2075560.440257] [] > kmem_alloc+0x67/0xe0 > Mar 10 04:03:51 l28 kernel: [2075560.440365] [] > kmem_zalloc+0x18/0x40 > Mar 10 04:03:51 l28 kernel: [2075560.440473] [] > xfs_log_commit_cil+0x373/0x4c0 > Mar 10 04:03:51 l28 kernel: [2075560.440585] [] ? > xfs_bmap_search_multi_extents+0xe0/0x110 > Mar 10 04:03:51 l28 kernel: [2075560.440783] [] > xfs_trans_commit+0x6c/0x250 > Mar 10 04:03:51 l28 kernel: [2075560.440899] [] > xfs_bmap_finish+0xb7/0x1a0 Another issue on the same server, same instruction pointer: Mar 16 04:53:54 l28 kernel: [521052.387878] BUG: unable to handle kernel paging request at 40008021 Mar 16 04:53:54 l28 kernel: [521052.388022] IP: [] __kmalloc+0x69/0x100 Mar 16 04:53:54 l28 kernel: [521052.388171] PGD 0 Mar 16 04:53:54 l28 kernel: [521052.388289] Oops: [#1] SMP Mar 16 04:53:54 l28 kernel: [521052.388410] Modules linked in: tcm_loop iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod ipt_NETFLOW(O) configfs iscsi_tcp libis csi_tcp libiscsi scsi_transport_iscsi fuse Mar 16 04:53:54 l28 kernel: [521052.388913] CPU: 6 PID: 5947 Comm: iscsi_trx Tainted: G O 3.12.51-jl-2015-12-25 #1 Mar 16 04:53:54 l28 kernel: [521052.389125] Hardware name: Intel Corporation S2600IP ../S2600IP, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Mar 16 04:53:54 l28 kernel: [521052.389351] task: 88081a3a6720 ti: 8808162de000 task.ti: 8808162de000 Mar 16 04:53:54 l28 kernel: [521052.389566] RIP: 0010:[] [] __kmalloc+0x69/0x100 Mar 16 04:53:54 l28 kernel: [521052.389782] RSP: 0018:8808162dfd18 EFLAGS: 00010286 Mar 16 04:53:54 l28 kernel: [521052.389899] 
RAX: RBX: 880819a51800 RCX: 03b305d3 Mar 16 04:53:54 l28 kernel: [521052.390112] RDX: 03b305d2 RSI: RDI: 00013500 Mar 16 04:53:54 l28 kernel: [521052.390309] RBP: 8808162dfd38 R08: 88103fd13500 R09: a00e7072 Mar 16 04:53:54 l28 kernel: [521052.390503] R10: 0010 R11: 0030 R12: 88081f803a00 Mar 16 04:53:54 l28 kernel: [521052.390694] R13: 40008021 R14: 80d0 R15: 8808162dfdd0 Mar 16 0
kernel OOPS in MM(?)
Hi,

We need help understanding the source of the problem, and possibly filing a bug report. Here is the crash report:

Mar 10 04:03:51 l28 kernel: [2075560.434445] BUG: unable to handle kernel paging request at 40008021
Mar 10 04:03:51 l28 kernel: [2075560.434669] IP: [] __kmalloc+0x69/0x100
Mar 10 04:03:51 l28 kernel: [2075560.434800] PGD b7e462067 PUD 0
Mar 10 04:03:51 l28 kernel: [2075560.434913] Oops: [#1] SMP
Mar 10 04:03:51 l28 kernel: [2075560.435044] Modules linked in: tcm_loop iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod ipt_NETFLOW(O) configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse [last unloaded: ipfw_mod]
Mar 10 04:03:51 l28 kernel: [2075560.435539] CPU: 4 PID: 27141 Comm: rm Tainted: G O 3.12.51-jl-2015-12-25 #1
Mar 10 04:03:51 l28 kernel: [2075560.435734] Hardware name: Intel Corporation S2600IP ../S2600IP, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Mar 10 04:03:51 l28 kernel: [2075560.435939] task: 880e622ccba0 ti: 880eeb008000 task.ti: 880eeb008000
Mar 10 04:03:51 l28 kernel: [2075560.436131] RIP: 0010:[] [] __kmalloc+0x69/0x100
Mar 10 04:03:51 l28 kernel: [2075560.436333] RSP: 0018:880eeb009b38 EFLAGS: 00010282
Mar 10 04:03:51 l28 kernel: [2075560.436439] RAX: RBX: RCX: a8a73dc2
Mar 10 04:03:51 l28 kernel: [2075560.436632] RDX: a8a73dc1 RSI: RDI: 00013500
Mar 10 04:03:51 l28 kernel: [2075560.438248] RBP: 880eeb009b58 R08: 88103fc13500 R09: 811a0267
Mar 10 04:03:51 l28 kernel: [2075560.438446] R10: 880eeb009d84 R11: R12: 88081f803a00
Mar 10 04:03:51 l28 kernel: [2075560.438656] R13: 40008021 R14: 0250 R15: 880250e833b0
Mar 10 04:03:51 l28 kernel: [2075560.438851] FS: 7fe2316dd700() GS:88103fc0() knlGS:
Mar 10 04:03:51 l28 kernel: [2075560.439045] CS: 0010 DS: ES: CR0: 80050033
Mar 10 04:03:51 l28 kernel: [2075560.439152] CR2: 40008021 CR3: 000a20736000 CR4: 000407e0
Mar 10 04:03:51 l28 kernel: [2075560.439343] Stack:
Mar 10 04:03:51 l28 kernel: [2075560.439439] 0250 0060
Mar 10 04:03:51 l28 kernel: [2075560.439663] 880eeb009b88 811a0267 881015fb7fe0 0060
Mar 10 04:03:51 l28 kernel: [2075560.439898] 880250e83490 880eeb009ba8 811a02f8
Mar 10 04:03:51 l28 kernel: [2075560.440153] Call Trace:
Mar 10 04:03:51 l28 kernel: [2075560.440257] [] kmem_alloc+0x67/0xe0
Mar 10 04:03:51 l28 kernel: [2075560.440365] [] kmem_zalloc+0x18/0x40
Mar 10 04:03:51 l28 kernel: [2075560.440473] [] xfs_log_commit_cil+0x373/0x4c0
Mar 10 04:03:51 l28 kernel: [2075560.440585] [] ? xfs_bmap_search_multi_extents+0xe0/0x110
Mar 10 04:03:51 l28 kernel: [2075560.440783] [] xfs_trans_commit+0x6c/0x250
Mar 10 04:03:51 l28 kernel: [2075560.440899] [] xfs_bmap_finish+0xb7/0x1a0
Mar 10 04:03:51 l28 kernel: [2075560.441017] [] xfs_itruncate_extents+0xe3/0x200
Mar 10 04:03:51 l28 kernel: [2075560.441131] [] xfs_inactive+0x27c/0x3a0
Mar 10 04:03:51 l28 kernel: [2075560.441275] [] ? wake_atomic_t_function+0x40/0x40
Mar 10 04:03:51 l28 kernel: [2075560.441386] [] xfs_fs_evict_inode+0x73/0x80
Mar 10 04:03:51 l28 kernel: [2075560.441498] [] evict+0xaa/0x1b0
Mar 10 04:03:51 l28 kernel: [2075560.441604] [] iput+0x103/0x1a0
Mar 10 04:03:51 l28 kernel: [2075560.441713] [] do_unlinkat+0x1cf/0x240
Mar 10 04:03:51 l28 kernel: [2075560.441823] [] ? SyS_newfstatat+0x25/0x30
Mar 10 04:03:51 l28 kernel: [2075560.441932] [] SyS_unlinkat+0x1d/0x40
Mar 10 04:03:51 l28 kernel: [2075560.442044] [] system_call_fastpath+0x16/0x1b
Mar 10 04:03:51 l28 kernel: [2075560.442155] Code: 65 4c 03 04 25 48 bc 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 6b 48 85 c0 74 66 49 63 44 24 20 48 8d 4a 01 49 8b 3c 24 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 bd 49
Mar 10 04:03:51 l28 kernel: [2075560.442904] RIP [] __kmalloc+0x69/0x100
Mar 10 04:03:51 l28 kernel: [2075560.443020] RSP
Mar 10 04:03:51 l28 kernel: [2075560.443122] CR2: 40008021
Mar 10 04:03:51 l28 kernel: [2075560.443809] ---[ end trace 92cd2d4bad1896f4 ]---

Kernel 3.12.51.
Gdb listing:

(gdb) list *(__kmalloc+0x69)
0x810ee519 is in __kmalloc (mm/slub.c:260).
[...]
258 static inline void *get_freepointer(struct kmem_cache *s, void *object)
259 {
260     return *(void **)(object + s->offset);
261 }

What would be the next step?

Thank you.

--
UNIX/Ocaml engineer at 1Gb.ru. Telegram: johnlepikhin
RE: Kernel oops in mm/rmap.c in 2.6.11rc4-bk9
On Thu, 24 Feb 2005, Colin Harrison wrote: > Fresher trace of my oops captured over a serial tty link (kernel now > 2.6.11rc4-bk11) > > kernel BUG at mm/rmap.c:482! > EIP is at page_remove_rmap+0x38/0x50 > Process cc1 (pid: 10543, threadinfo=d10e task=ced11a80) Welcome to the exclusive club of those who've seen this - check the archives for other sightings. > > -Original Message- > > From: Colin Harrison [mailto:[EMAIL PROTECTED] > > Sent: 21 February 2005 18:45 > > To: 'linux-kernel@vger.kernel.org' > > Subject: Kernel oops in mm/rmap.c in 2.6.11rc4-bk9 Sorry, that message never reached the list - marc hasn't got it. [ I've snipped out a lot of info you were right to supply, thank you, though it'll probably turn out to be irrelevant. ] > > More info can be supplied if required and patches compiled in > > to trace etc. Thank you: please apply the patch below, and mail me back any interesting messages you see - should be captured by dmesg, or better /var/log/messages which will show the times too, to help group them. Not just the "Bad rmap:" errors this patch adds, there might be relevant "Bad page state" errors or "swap_free:" errors. The system should stay up. Perhaps you'll be swamped with messages (don't spam the list with them if so, just me). > > Note that I have run a memory checker (memtest-86 v3.2 boot > > CD) for ~ 3hours with no errors. Right thing to try first, though may not have been long enough. Since you've already tried that, let's try the patch below next. > > I can usually repeat the crash while compiling a > > kernel..which after a reboot recovers and allows 'make > > bzImage' 'make modules' to finish! If you can reproduce this fairly easily, that would be _wonderful_! It's dragged on for months, but nobody sees it often enough to get an understanding of it. It's possible that most sightings are in fact due to bad memory, and the rmap.c BUG a good (but tiresome) checker itself - I do hope your info will give us more of a clue. Thanks, Hugh p.s. 
Ammar, I've cc'ed you for interest as the most recent member of the club; but this patch below is unlikely to apply cleanly to your Fedora kernel - if you're used to moving patches from one release to another, you might want to try it, but probably not. --- 2.6.11-rc5/include/linux/rmap.h 2004-12-24 21:36:18.0 + +++ linux/include/linux/rmap.h 2005-02-24 20:52:17.0 + @@ -72,7 +72,7 @@ void __anon_vma_link(struct vm_area_stru */ void page_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long); void page_add_file_rmap(struct page *); -void page_remove_rmap(struct page *); +void page_remove_rmap(struct page *, struct vm_area_struct *, unsigned long); /** * page_dup_rmap - duplicate pte mapping to a page --- 2.6.11-rc5/mm/fremap.c 2005-02-24 20:11:11.0 + +++ linux/mm/fremap.c 2005-02-24 20:52:17.0 + @@ -37,7 +37,7 @@ static inline void zap_pte(struct mm_str if (!PageReserved(page)) { if (pte_dirty(pte)) set_page_dirty(page); - page_remove_rmap(page); + page_remove_rmap(page, vma, addr); page_cache_release(page); mm->rss--; } --- 2.6.11-rc5/mm/memory.c 2005-02-24 20:11:11.0 + +++ linux/mm/memory.c 2005-02-24 20:52:17.0 + @@ -452,6 +452,7 @@ next_pgd: } static void zap_pte_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, unsigned long size, struct zap_details *details) { @@ -517,7 +518,7 @@ static void zap_pte_range(struct mmu_gat else if (pte_young(pte)) mark_page_accessed(page); tlb->freed++; - page_remove_rmap(page); + page_remove_rmap(page, vma, address+offset); tlb_remove_page(tlb, page); continue; } @@ -535,6 +536,7 @@ static void zap_pte_range(struct mmu_gat } static void zap_pmd_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, pud_t *pud, unsigned long address, unsigned long size, struct zap_details *details) { @@ -553,13 +555,14 @@ static void zap_pmd_range(struct mmu_gat if (end > ((address + PUD_SIZE) & PUD_MASK)) end = ((address + PUD_SIZE) & PUD_MASK); do { - zap_pte_range(tlb, pmd, 
address, end - address, details); +
RE: Kernel oops in mm/rmap.c in 2.6.11rc4-bk9
Hi,

Fresher trace of my oops captured over a serial tty link (kernel now 2.6.11rc4-bk11)

Kernel 2.6.11colin on an i686 / ttyS0
verbier.straightrunning.com login: [ cut here ]
kernel BUG at mm/rmap.c:482!
invalid operand: [#1]
Modules linked in: parport_pc lp parport ipt_LOG ipt_REJECT ipt_state ipt_limite
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010296 (2.6.11colin)
EIP is at page_remove_rmap+0x38/0x50
eax: f000 ebx: 00064000 ecx: c039878c edx: c123ca00
esi: d2389f30 edi: 00098000 ebp: d10e0e4c esp: d10e0e4c
ds: 007b es: 007b ss: 0068
Process cc1 (pid: 10543, threadinfo=d10e task=ced11a80)
Stack: d10e0e80 c013661b c12335c0 cef90904 0001 11e50067 c123ca00 b7b68000 c039878c b7f68000 d1f10b7c b7c0 d10e0ea8 c01367a1 00098000 c039878c b8241000 c039878c b7b68000 d1f10b7c b7c0 d10e0ed0
Call Trace:
 [] show_stack+0xa6/0xb0
 [] show_registers+0x149/0x1c0
 [] die+0xbb/0x150
 [] do_trap+0x7e/0xc0
 [] do_invalid_op+0xa5/0xb0
 [] error_code+0x2b/0x30
 [] zap_pte_range+0x13b/0x270
 [] zap_pmd_range+0x51/0x70
 [] zap_pud_range+0x3d/0x70
 [] unmap_page_range+0x67/0x80
 [] unmap_vmas+0xf4/0x1e0
 [] exit_mmap+0x68/0x130
 [] mmput+0x21/0x70
 [] exit_mm+0xbe/0xe0
 [] do_exit+0x9d/0x2c0
 [] do_group_exit+0x32/0x70
 [] sys_exit_group+0xf/0x20
 [] sysenter_past_esp+0x52/0x75
Code: 42 08 ff 0f 98 c0 84 c0 74 15 8b 42 08 40 78 1b ba ff ff ff ff b8 10 00 0

Entering kdb (current=0xced11a80, pid 10543) Oops: invalid operand
due to oops @ 0xc013cc28
eax = 0xf000 ebx = 0x00064000 ecx = 0xc039878c edx = 0xc123ca00
esi = 0xd2389f30 edi = 0x00098000 esp = 0xd10e0e4c eip = 0xc013cc28
ebp = 0xd10e0e4c xss = 0xc0210068 xcs = 0x0060 eflags = 0x00010296
xds = 0xc122007b xes = 0x007b origeax = 0x &regs = 0xd10e0e18
kdb>

Thanks
Colin Harrison

> -Original Message-
> From: Colin Harrison [mailto:[EMAIL PROTECTED]
> Sent: 21 February 2005 18:45
> To: 'linux-kernel@vger.kernel.org'
> Subject: Kernel oops in mm/rmap.c in 2.6.11rc4-bk9
>
> Hi
>
> I've been getting a hang on my firewall machine for some
> weeks when pushed.
> > Kernel freezes when heavily loaded either compiling or under > lots of web traffic. > Machine is a Pentium III 450MHz i440BX Intel Mobo with 384MHz > memory running a firewall (iptables-1.3.0-200050220) via two > natsemi network cards. > (old machine but should be good as firewall only...been solid > for years!) Kernel 2.6.11rc4-bk9 + supermount patch. > > I have another machine which I am going to connect via serial > cable to help debug. > (this has also been known to freeze on 2.6.11...a common > denominator is that I use Netgear natsemi net cards in > both...probably not significant?) > > I've today built in KDB and get the following trace:- > > In this case while compiling a kernel (make bzImage...running cc1) > > Typed by hand:- > > kernel BUG at mm/rmap.c:482! > invalid operand: [#1] > Modules linked in: parport_pc lp parport ipt_LOG ipt_REJECT > ipt_state ipt_limit iptable_mangle iptable_filter ip_nat_ftp > iptable_nat ip_conntrack_ftp floppy natsemi supermount > intel_agp agpgart cs4243 ad1848 uart401 sound soundcore > CPU:0 > EIP:0060:[] Not tainted VLI > EFLAGS: 00010296 (2.6.11colin) > EIP is at page_remove_rmap+0x38/0x50 > eax: f000 ebx: 0002f000 ecx: c038d78c edx: c123sd20 > esi: d0e7e44c edi: 0009 ebp: d2a6ae4c esp: d2a6ae4c > ds: 007b es: 007b ss: 0068 > Process cc1 (pid: 10835, threadinfo=d2a6a000 task=cf57aaa0) > Stack: d2a6ae80 > etc.. > > > > Entered kdb (current=0xcf57aaa0, pid 10835) Oops: invalid > operand due to oops @ 0xc013cc28 etc... > > More info can be supplied if required and patches compiled in > to trace etc. > Note that I have run a memory checker (memtest-86 v3.2 boot > CD) for ~ 3hours with no errors. > I can usually repeat the crash while compiling a > kernel..which after a reboot recovers and allows 'make > bzImage' 'make modules' to finish! 
> > Thanks > Colin Harrison > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Kernel oops in mm/slab.c [ kmem_cache_grow() ] with test4-8
> -Original Message-
> From: Andrew Morton [mailto:[EMAIL PROTECTED]]
> Sent: Friday, September 15, 2000 11:47 PM
> To: Earle, Jonathan [KAN:1A31:EXCH]
> Cc: Linux MPLS List (E-mail); Linux Kernel List (E-mail)
> Subject: Re: Kernel oops in mm/slab.c [ kmem_cache_grow() ] with test4-8
>
> Jonathan Earle wrote:
> >
> > Hi,
> >
> > I've been having kernel oopses with the 2.4.0-test series and am
> > including ksymoops processed output from both test4 and test5
> > kernels. The same oops happens in later kernels too (Tested with
> > test6, test7 and test8).
> >
>
> Presumably mpls_output() is doing a kmalloc(..., GFP_KERNEL) from within
> a softirq. Hunt that down and turn it into GFP_ATOMIC.

Okay... Did that (turned all the GFP_KERNEL references in net/mpls into GFP_ATOMIC), and the problem seems to have gone away. I'll post a more confident summary when I'm more sure that things are working properly.

Now, what did I do (aside from fixing the problem) by changing that reference?

Many thanks for the hint!!

Jon
Re: Kernel oops in mm/slab.c [ kmem_cache_grow() ] with test4-8
Jonathan Earle wrote:
>
> Hi,
>
> I've been having kernel oopses with the 2.4.0-test series and am
> including ksymoops processed output from both test4 and test5
> kernels. The same oops happens in later kernels too (Tested with
> test6, test7 and test8).
>

Presumably mpls_output() is doing a kmalloc(..., GFP_KERNEL) from within a softirq. Hunt that down and turn it into GFP_ATOMIC.
Kernel oops in mm/slab.c [ kmem_cache_grow() ] with test4-8
Hi,

I've been having kernel oopses with the 2.4.0-test series and am including ksymoops processed output from both test4 and test5 kernels. The same oops happens in later kernels too (Tested with test6, test7 and test8).

The scenario is this:

I have an incoming UDP stream at 1mbit. The router marks packets in this stream, according to port ranges, with 3 (or any # of) marks (via iptables v1.1.1). iproute2 builds new routing tables based on these marks, and mplsadm, with the tc patch, is called to build LSPs using these routing tables. Finally, the 3 egress LSPs are rate limited using tc (employing cbq classes) to a value less than the ingress rate (ie: I limited each LSP to 200kbit, for an aggregate egress output rate of 600kbit).

When I start the traffic flowing from our generator, the box panics and freezes quite solidly. Policing via filters also crashes the box. If I move the egress rate limiting function to another box, it works okay.

I've also noted that the crash only occurs if I throttle the traffic flow to an egress rate which is less than the ingress rate (ie: ingress flow at 1mbit and egress flow at 1mbit works fine. If the egress rate is reduced, boom!)

I copied down the oopses and ran 'ksymoops < oops.txt > oops_proc.txt' and pasted them here. The first is from kernel 2.4.0-test4 and the second from 2.4.0-test5.

NEW: Here's the funny part. In mm/slab.c, the function kmem_cache_grow() contains a check as follows:

        /*
         * The test for missing atomic flag is performed here, rather than
         * the more obvious place, simply to reduce the critical path length
         * in kmem_cache_alloc(). If a caller is seriously mis-behaving they
         * will eventually be caught here (where it matters).
         */
        /* Commented out Sep 15 since it was crashing my router. */
        /*
        if (in_interrupt() && (flags & SLAB_LEVEL_MASK) != SLAB_ATOMIC)
                BUG();
        */

This is the check that fails and causes the oops. Not understanding what is actually being checked, and not knowing the repercussions of tampering with it, I commented out the check, recompiled and reran the test. I understand that this is not really a fix (it's more akin to just turning my head and pretending that the problem doesn't exist), but... it seems to work.

The result: Great joy and much celebration! I'm throwing 7.2mbps at the box, limiting the rate to 900kbit aggregate throughput and it's working! The numbers I'm getting also seem to jibe with anticipated results.

Cheers!
Jon

ksymoops 0.7c on i686 2.4.0-test4. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.0-test4/ (default)
     -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options.
invalid operand: CPU: 0 EIP: 0010:[] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010286 eax: 001b ebx: c7ffd0c0 ecx: edx: 0082 esi: 0246 edi: c7ffd0c0 ebp: 0007 esp: c024fe70 ds: 0018 es: 0018 ss: 0018 Process swapper (pid:0, stackpage=c024f000) Stack: c01fb794 c01fb834 0412 c7ffd0c0 0247 0007 c024fed4 c7d1602e c0127aaf c7ffd0c0 0007 c7d170e0 c7d1602e c01eb196 0008 0007 c7d170e0 c7d1602e c7f8be00 c01b6aaf c7d170e0 Call trace: [][][][][][][] [][][][][][][][] [][][][][][][][] Code: 0f 0b 83 c4 0c c7 44 24 10 01 00 00 00 89 ee 83 e6 07 b8 03 >>EIP; c01277fd <= Trace; c01fb794 Trace; c01fb834 Trace; c0127aaf Trace; c01eb196 Trace; c01b6aaf Trace; c01b6c6f Trace; c01b6a84 Trace; c019b1c4 Trace; c01b6936 Trace; c01b6a84 Trace; c019efe3 Trace; c011b17f Trace; c010b8ee Trace; c01087e0 Trace; c01087e0 Trace; c010a518 Trace; c01087e0 Trace; c01087e0 Trace; c0100018 Trace; c0108803 Trace; c0108864 Trace; c0105000 Trace; c0100192 Code; c01277fd <_EIP>: Code; c01277fd <= 0: 0f 0b ud2a <= Code; c01277ff 2: 83 c4 0c add $0xc,%esp Code; c0127802 5: c7 44 24 10 01 00 00 movl $0x1,0x10(%esp,1) Code; c0127809 c: 00 Code; c012780a d: 89 ee mov %ebp,%esi Code; c012780c f: 83 e6 07 and $0x7,%esi Code; c012780f 12: b8 03 00 00 00