Re: kernel OOPS in MM(?)

2016-03-15 Thread Evgenii Lepikhin
Hello,

On 2016-03-10 12:31, Evgenii Lepikhin wrote:

> We need help to understand the source of the problem and maybe to file a 
> bug report. Here is the crash report:
>
> Mar 10 04:03:51 l28 kernel: [2075560.434445] BUG: unable to handle kernel 
> paging request at 40008021
> Mar 10 04:03:51 l28 kernel: [2075560.434669] IP: [] 
> __kmalloc+0x69/0x100
> Mar 10 04:03:51 l28 kernel: [2075560.434800] PGD b7e462067 PUD 0 
> Mar 10 04:03:51 l28 kernel: [2075560.434913] Oops:  [#1] SMP 
> Mar 10 04:03:51 l28 kernel: [2075560.435044] Modules linked in:
> tcm_loop iscsi_target_mod target_core_pscsi target_core_file
> target_core_iblock target_core_mod ipt_NETFLOW(O) configfs iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi fuse [last unloaded: ipfw_mod]
> Mar 10 04:03:51 l28 kernel: [2075560.435539] CPU: 4 PID: 27141 Comm: rm 
> Tainted: G   O 3.12.51-jl-2015-12-25 #1
> Mar 10 04:03:51 l28 kernel: [2075560.435734] Hardware name: Intel Corporation 
> S2600IP ../S2600IP, BIOS SE5C600.86B.01.08.0003.022620131521 
> 02/26/2013
> Mar 10 04:03:51 l28 kernel: [2075560.435939] task: 880e622ccba0 ti: 
> 880eeb008000 task.ti: 880eeb008000
> Mar 10 04:03:51 l28 kernel: [2075560.436131] RIP: 0010:[]  
> [] __kmalloc+0x69/0x100
> Mar 10 04:03:51 l28 kernel: [2075560.436333] RSP: 0018:880eeb009b38  
> EFLAGS: 00010282
> Mar 10 04:03:51 l28 kernel: [2075560.436439] RAX:  RBX: 
>  RCX: a8a73dc2
> Mar 10 04:03:51 l28 kernel: [2075560.436632] RDX: a8a73dc1 RSI: 
>  RDI: 00013500
> Mar 10 04:03:51 l28 kernel: [2075560.438248] RBP: 880eeb009b58 R08: 
> 88103fc13500 R09: 811a0267
> Mar 10 04:03:51 l28 kernel: [2075560.438446] R10: 880eeb009d84 R11: 
>  R12: 88081f803a00
> Mar 10 04:03:51 l28 kernel: [2075560.438656] R13: 40008021 R14: 
> 0250 R15: 880250e833b0
> Mar 10 04:03:51 l28 kernel: [2075560.438851] FS:  7fe2316dd700() 
> GS:88103fc0() knlGS:
> Mar 10 04:03:51 l28 kernel: [2075560.439045] CS:  0010 DS:  ES:  CR0: 
> 80050033
> Mar 10 04:03:51 l28 kernel: [2075560.439152] CR2: 40008021 CR3: 
> 000a20736000 CR4: 000407e0
> Mar 10 04:03:51 l28 kernel: [2075560.439343] Stack:
> Mar 10 04:03:51 l28 kernel: [2075560.439439]   
> 0250 0060 
> Mar 10 04:03:51 l28 kernel: [2075560.439663]  880eeb009b88 
> 811a0267 881015fb7fe0 0060
> Mar 10 04:03:51 l28 kernel: [2075560.439898]  880250e83490 
>  880eeb009ba8 811a02f8
> Mar 10 04:03:51 l28 kernel: [2075560.440153] Call Trace:
> Mar 10 04:03:51 l28 kernel: [2075560.440257]  [] 
> kmem_alloc+0x67/0xe0
> Mar 10 04:03:51 l28 kernel: [2075560.440365]  [] 
> kmem_zalloc+0x18/0x40
> Mar 10 04:03:51 l28 kernel: [2075560.440473]  [] 
> xfs_log_commit_cil+0x373/0x4c0
> Mar 10 04:03:51 l28 kernel: [2075560.440585]  [] ? 
> xfs_bmap_search_multi_extents+0xe0/0x110
> Mar 10 04:03:51 l28 kernel: [2075560.440783]  [] 
> xfs_trans_commit+0x6c/0x250
> Mar 10 04:03:51 l28 kernel: [2075560.440899]  [] 
> xfs_bmap_finish+0xb7/0x1a0

Another issue on the same server, same instruction pointer:

Mar 16 04:53:54 l28 kernel: [521052.387878] BUG: unable to handle kernel paging 
request at 40008021
Mar 16 04:53:54 l28 kernel: [521052.388022] IP: [] 
__kmalloc+0x69/0x100
Mar 16 04:53:54 l28 kernel: [521052.388171] PGD 0 
Mar 16 04:53:54 l28 kernel: [521052.388289] Oops:  [#1] SMP 
Mar 16 04:53:54 l28 kernel: [521052.388410] Modules linked in: tcm_loop 
iscsi_target_mod target_core_pscsi target_core_file target_core_iblock 
target_core_mod ipt_NETFLOW(O) configfs iscsi_tcp libiscsi_tcp 
libiscsi scsi_transport_iscsi fuse
Mar 16 04:53:54 l28 kernel: [521052.388913] CPU: 6 PID: 5947 Comm: iscsi_trx 
Tainted: G   O 3.12.51-jl-2015-12-25 #1
Mar 16 04:53:54 l28 kernel: [521052.389125] Hardware name: Intel Corporation 
S2600IP ../S2600IP, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Mar 16 04:53:54 l28 kernel: [521052.389351] task: 88081a3a6720 ti: 
8808162de000 task.ti: 8808162de000
Mar 16 04:53:54 l28 kernel: [521052.389566] RIP: 0010:[]  
[] __kmalloc+0x69/0x100
Mar 16 04:53:54 l28 kernel: [521052.389782] RSP: 0018:8808162dfd18  EFLAGS: 
00010286
Mar 16 04:53:54 l28 kernel: [521052.389899] RAX:  RBX: 
880819a51800 RCX: 03b305d3
Mar 16 04:53:54 l28 kernel: [521052.390112] RDX: 03b305d2 RSI: 
 RDI: 00013500
Mar 16 04:53:54 l28 kernel: [521052.390309] RBP: 8808162dfd38 R08: 
88103fd13500 R09: a00e7072
Mar 16 04:53:54 l28 kernel: [521052.390503] R10: 0010 R11: 
0030 R12: 88081f803a00
Mar 16 04:53:54 l28 kernel: [521052.390694] R13: 40008021 R14: 
80d0 R15: 8808162dfdd0
Mar 16 0

kernel OOPS in MM(?)

2016-03-10 Thread Evgenii Lepikhin
Hi,

We need help to understand the source of the problem and maybe to file a 
bug report. Here is the crash report:

Mar 10 04:03:51 l28 kernel: [2075560.434445] BUG: unable to handle kernel 
paging request at 40008021
Mar 10 04:03:51 l28 kernel: [2075560.434669] IP: [] 
__kmalloc+0x69/0x100
Mar 10 04:03:51 l28 kernel: [2075560.434800] PGD b7e462067 PUD 0 
Mar 10 04:03:51 l28 kernel: [2075560.434913] Oops:  [#1] SMP 
Mar 10 04:03:51 l28 kernel: [2075560.435044] Modules linked in: tcm_loop 
iscsi_target_mod target_core_pscsi target_core_file target_core_iblock 
target_core_mod ipt_NETFLOW(O) configfs iscsi_tcp libiscsi_tcp 
libiscsi scsi_transport_iscsi fuse [last unloaded: ipfw_mod]
Mar 10 04:03:51 l28 kernel: [2075560.435539] CPU: 4 PID: 27141 Comm: rm 
Tainted: G   O 3.12.51-jl-2015-12-25 #1
Mar 10 04:03:51 l28 kernel: [2075560.435734] Hardware name: Intel Corporation 
S2600IP ../S2600IP, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Mar 10 04:03:51 l28 kernel: [2075560.435939] task: 880e622ccba0 ti: 
880eeb008000 task.ti: 880eeb008000
Mar 10 04:03:51 l28 kernel: [2075560.436131] RIP: 0010:[]  
[] __kmalloc+0x69/0x100
Mar 10 04:03:51 l28 kernel: [2075560.436333] RSP: 0018:880eeb009b38  
EFLAGS: 00010282
Mar 10 04:03:51 l28 kernel: [2075560.436439] RAX:  RBX: 
 RCX: a8a73dc2
Mar 10 04:03:51 l28 kernel: [2075560.436632] RDX: a8a73dc1 RSI: 
 RDI: 00013500
Mar 10 04:03:51 l28 kernel: [2075560.438248] RBP: 880eeb009b58 R08: 
88103fc13500 R09: 811a0267
Mar 10 04:03:51 l28 kernel: [2075560.438446] R10: 880eeb009d84 R11: 
 R12: 88081f803a00
Mar 10 04:03:51 l28 kernel: [2075560.438656] R13: 40008021 R14: 
0250 R15: 880250e833b0
Mar 10 04:03:51 l28 kernel: [2075560.438851] FS:  7fe2316dd700() 
GS:88103fc0() knlGS:
Mar 10 04:03:51 l28 kernel: [2075560.439045] CS:  0010 DS:  ES:  CR0: 
80050033
Mar 10 04:03:51 l28 kernel: [2075560.439152] CR2: 40008021 CR3: 
000a20736000 CR4: 000407e0
Mar 10 04:03:51 l28 kernel: [2075560.439343] Stack:
Mar 10 04:03:51 l28 kernel: [2075560.439439]   0250 
0060 
Mar 10 04:03:51 l28 kernel: [2075560.439663]  880eeb009b88 811a0267 
881015fb7fe0 0060
Mar 10 04:03:51 l28 kernel: [2075560.439898]  880250e83490  
880eeb009ba8 811a02f8
Mar 10 04:03:51 l28 kernel: [2075560.440153] Call Trace:
Mar 10 04:03:51 l28 kernel: [2075560.440257]  [] 
kmem_alloc+0x67/0xe0
Mar 10 04:03:51 l28 kernel: [2075560.440365]  [] 
kmem_zalloc+0x18/0x40
Mar 10 04:03:51 l28 kernel: [2075560.440473]  [] 
xfs_log_commit_cil+0x373/0x4c0
Mar 10 04:03:51 l28 kernel: [2075560.440585]  [] ? 
xfs_bmap_search_multi_extents+0xe0/0x110
Mar 10 04:03:51 l28 kernel: [2075560.440783]  [] 
xfs_trans_commit+0x6c/0x250
Mar 10 04:03:51 l28 kernel: [2075560.440899]  [] 
xfs_bmap_finish+0xb7/0x1a0
Mar 10 04:03:51 l28 kernel: [2075560.441017]  [] 
xfs_itruncate_extents+0xe3/0x200
Mar 10 04:03:51 l28 kernel: [2075560.441131]  [] 
xfs_inactive+0x27c/0x3a0
Mar 10 04:03:51 l28 kernel: [2075560.441275]  [] ? 
wake_atomic_t_function+0x40/0x40
Mar 10 04:03:51 l28 kernel: [2075560.441386]  [] 
xfs_fs_evict_inode+0x73/0x80
Mar 10 04:03:51 l28 kernel: [2075560.441498]  [] 
evict+0xaa/0x1b0
Mar 10 04:03:51 l28 kernel: [2075560.441604]  [] 
iput+0x103/0x1a0
Mar 10 04:03:51 l28 kernel: [2075560.441713]  [] 
do_unlinkat+0x1cf/0x240
Mar 10 04:03:51 l28 kernel: [2075560.441823]  [] ? 
SyS_newfstatat+0x25/0x30
Mar 10 04:03:51 l28 kernel: [2075560.441932]  [] 
SyS_unlinkat+0x1d/0x40
Mar 10 04:03:51 l28 kernel: [2075560.442044]  [] 
system_call_fastpath+0x16/0x1b
Mar 10 04:03:51 l28 kernel: [2075560.442155] Code: 65 4c 03 04 25 48 bc 00 00 
49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 6b 48 85 c0 74 66 49 63 44 24 20 
48 8d 4a 01 49 8b 3c 24 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 
74 bd 49 
Mar 10 04:03:51 l28 kernel: [2075560.442904] RIP  [] 
__kmalloc+0x69/0x100
Mar 10 04:03:51 l28 kernel: [2075560.443020]  RSP 
Mar 10 04:03:51 l28 kernel: [2075560.443122] CR2: 40008021
Mar 10 04:03:51 l28 kernel: [2075560.443809] ---[ end trace 92cd2d4bad1896f4 
]---

Kernel 3.12.51. Gdb listing:

(gdb) list *(__kmalloc+0x69)
0x810ee519 is in __kmalloc (mm/slub.c:260).
[...]
258     static inline void *get_freepointer(struct kmem_cache *s, void *object)
259     {
260             return *(void **)(object + s->offset);
261     }

What would be the next step? Thank you.
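
For readers following the gdb listing, here is a minimal userspace sketch of what get_freepointer() computes. It is a model under stated assumptions, not the kernel's code: `struct cache_model`, `freepointer_address` and `object_looks_valid` are hypothetical names, and the address range passed to the validity check is whatever range the caller considers plausible. In the trace above, CR2 and R13 both read 40008021, which is consistent with `object` being held in R13 and an `s->offset` of zero, so the bad value appears to be the object pointer itself rather than the offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct kmem_cache: only the field used here. */
struct cache_model {
    size_t offset;  /* where the "next free" pointer sits inside a free object */
};

/* Address that get_freepointer() would dereference for a given object. */
uintptr_t freepointer_address(const struct cache_model *s, uintptr_t object)
{
    return object + s->offset;
}

/*
 * Heuristic sanity check (an assumption of this sketch, not a kernel API):
 * a slab object should be pointer-aligned and lie inside the range [lo, hi).
 */
int object_looks_valid(uintptr_t object, uintptr_t lo, uintptr_t hi)
{
    if (object % sizeof(void *) != 0)
        return 0;
    return object >= lo && object < hi;
}
```

With an offset of zero, `freepointer_address()` returns the object pointer unchanged. A value such as 0x40008021 is odd, so it fails the alignment test regardless of the range chosen, which points towards a corrupted freelist entry rather than a legitimate object.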


-- 
UNIX/Ocaml engineer at 1Gb.ru. Telegram: johnlepikhin


RE: Kernel oops in mm/rmap.c in 2.6.11rc4-bk9

2005-02-24 Thread Hugh Dickins
On Thu, 24 Feb 2005, Colin Harrison wrote:
> Fresher trace of my oops captured over a serial tty link (kernel now
> 2.6.11rc4-bk11) 
> 
> kernel BUG at mm/rmap.c:482!
> EIP is at page_remove_rmap+0x38/0x50
> Process cc1 (pid: 10543, threadinfo=d10e task=ced11a80)

Welcome to the exclusive club of those who've seen this -
check the archives for other sightings.

> > -----Original Message-----
> > From: Colin Harrison [mailto:[EMAIL PROTECTED] 
> > Sent: 21 February 2005 18:45
> > To: 'linux-kernel@vger.kernel.org'
> > Subject: Kernel oops in mm/rmap.c in 2.6.11rc4-bk9

Sorry, that message never reached the list - marc hasn't got it.

[ I've snipped out a lot of info you were right to supply,
  thank you, though it'll probably turn out to be irrelevant. ]

> > More info can be supplied if required and patches compiled in 
> > to trace etc.

Thank you: please apply the patch below, and mail me back any
interesting messages you see - should be captured by dmesg, or
better /var/log/messages which will show the times too, to help
group them.  Not just the "Bad rmap:" errors this patch adds,
there might be relevant "Bad page state" errors or "swap_free:"
errors.  The system should stay up.  Perhaps you'll be swamped
with messages (don't spam the list with them if so, just me).
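
A minimal sketch of how one might pick those messages out of a saved copy of dmesg or /var/log/messages. The three marker strings are the ones named above; the function name `is_interesting_line` is hypothetical, and nothing here is part of the patch.

```c
#include <stddef.h>
#include <string.h>

/* Marker strings taken from the request above. */
static const char *const markers[] = {
    "Bad rmap:",
    "Bad page state",
    "swap_free:",
};

/* Returns 1 if the log line contains any of the markers, 0 otherwise. */
int is_interesting_line(const char *line)
{
    size_t i;

    for (i = 0; i < sizeof(markers) / sizeof(markers[0]); i++) {
        if (strstr(line, markers[i]) != NULL)
            return 1;
    }
    return 0;
}
```

Calling this on each line read with fgets() and printing the matches unchanged keeps the syslog timestamps, which is what makes grouping the messages possible.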

> > Note that I have run a memory checker (memtest-86 v3.2 boot 
> > CD) for ~ 3hours with no errors.

Right thing to try first, though may not have been long enough.
Since you've already tried that, let's try the patch below next.

> > I can usually repeat the crash while compiling a 
> > kernel..which after a reboot recovers and allows 'make 
> > bzImage' 'make modules' to finish!

If you can reproduce this fairly easily, that would be _wonderful_!
It's dragged on for months, but nobody sees it often enough to get
an understanding of it.  It's possible that most sightings are in
fact due to bad memory, and the rmap.c BUG a good (but tiresome)
checker itself - I do hope your info will give us more of a clue.

Thanks,
Hugh

p.s. Ammar, I've cc'ed you for interest as the most recent member
of the club; but this patch below is unlikely to apply cleanly to
your Fedora kernel - if you're used to moving patches from one
release to another, you might want to try it, but probably not.

--- 2.6.11-rc5/include/linux/rmap.h 2004-12-24 21:36:18.0 +
+++ linux/include/linux/rmap.h  2005-02-24 20:52:17.0 +
@@ -72,7 +72,7 @@ void __anon_vma_link(struct vm_area_stru
  */
 void page_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long);
 void page_add_file_rmap(struct page *);
-void page_remove_rmap(struct page *);
+void page_remove_rmap(struct page *, struct vm_area_struct *, unsigned long);
 
 /**
  * page_dup_rmap - duplicate pte mapping to a page
--- 2.6.11-rc5/mm/fremap.c  2005-02-24 20:11:11.0 +
+++ linux/mm/fremap.c   2005-02-24 20:52:17.0 +
@@ -37,7 +37,7 @@ static inline void zap_pte(struct mm_str
if (!PageReserved(page)) {
if (pte_dirty(pte))
set_page_dirty(page);
-   page_remove_rmap(page);
+   page_remove_rmap(page, vma, addr);
page_cache_release(page);
mm->rss--;
}
--- 2.6.11-rc5/mm/memory.c  2005-02-24 20:11:11.0 +
+++ linux/mm/memory.c   2005-02-24 20:52:17.0 +
@@ -452,6 +452,7 @@ next_pgd:
 }
 
 static void zap_pte_range(struct mmu_gather *tlb,
+   struct vm_area_struct *vma,
pmd_t *pmd, unsigned long address,
unsigned long size, struct zap_details *details)
 {
@@ -517,7 +518,7 @@ static void zap_pte_range(struct mmu_gat
else if (pte_young(pte))
mark_page_accessed(page);
tlb->freed++;
-   page_remove_rmap(page);
+   page_remove_rmap(page, vma, address+offset);
tlb_remove_page(tlb, page);
continue;
}
@@ -535,6 +536,7 @@ static void zap_pte_range(struct mmu_gat
 }
 
 static void zap_pmd_range(struct mmu_gather *tlb,
+   struct vm_area_struct *vma,
pud_t *pud, unsigned long address,
unsigned long size, struct zap_details *details)
 {
@@ -553,13 +555,14 @@ static void zap_pmd_range(struct mmu_gat
if (end > ((address + PUD_SIZE) & PUD_MASK))
end = ((address + PUD_SIZE) & PUD_MASK);
do {
-   zap_pte_range(tlb, pmd, address, end - address, details);
+ 

RE: Kernel oops in mm/rmap.c in 2.6.11rc4-bk9

2005-02-24 Thread Colin Harrison
Hi,
Fresher trace of my oops captured over a serial tty link (kernel now
2.6.11rc4-bk11) 

Kernel 2.6.11colin on an i686 / ttyS0
verbier.straightrunning.com login: [ cut here ]
kernel BUG at mm/rmap.c:482!
invalid operand:  [#1]
Modules linked in: parport_pc lp parport ipt_LOG ipt_REJECT ipt_state
ipt_limite
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 00010296   (2.6.11colin)
EIP is at page_remove_rmap+0x38/0x50
eax: f000   ebx: 00064000   ecx: c039878c   edx: c123ca00
esi: d2389f30   edi: 00098000   ebp: d10e0e4c   esp: d10e0e4c
ds: 007b   es: 007b   ss: 0068
Process cc1 (pid: 10543, threadinfo=d10e task=ced11a80)
Stack: d10e0e80 c013661b c12335c0 cef90904 0001  11e50067
c123ca00
   b7b68000 c039878c b7f68000 d1f10b7c b7c0 d10e0ea8 c01367a1
00098000
    c039878c b8241000 c039878c b7b68000 d1f10b7c b7c0
d10e0ed0
Call Trace:
 [] show_stack+0xa6/0xb0
 [] show_registers+0x149/0x1c0
 [] die+0xbb/0x150
 [] do_trap+0x7e/0xc0
 [] do_invalid_op+0xa5/0xb0
 [] error_code+0x2b/0x30
 [] zap_pte_range+0x13b/0x270
 [] zap_pmd_range+0x51/0x70
 [] zap_pud_range+0x3d/0x70
 [] unmap_page_range+0x67/0x80
 [] unmap_vmas+0xf4/0x1e0
 [] exit_mmap+0x68/0x130
 [] mmput+0x21/0x70
 [] exit_mm+0xbe/0xe0
 [] do_exit+0x9d/0x2c0
 [] do_group_exit+0x32/0x70
 [] sys_exit_group+0xf/0x20
 [] sysenter_past_esp+0x52/0x75
Code: 42 08 ff 0f 98 c0 84 c0 74 15 8b 42 08 40 78 1b ba ff ff ff ff b8 10
00 0

Entering kdb (current=0xced11a80, pid 10543) Oops: invalid operand
due to oops @ 0xc013cc28
eax = 0xf000 ebx = 0x00064000 ecx = 0xc039878c edx = 0xc123ca00
esi = 0xd2389f30 edi = 0x00098000 esp = 0xd10e0e4c eip = 0xc013cc28
ebp = 0xd10e0e4c xss = 0xc0210068 xcs = 0x0060 eflags = 0x00010296
xds = 0xc122007b xes = 0x007b origeax = 0x &regs = 0xd10e0e18
kdb>

Thanks

Colin Harrison

> -----Original Message-----
> From: Colin Harrison [mailto:[EMAIL PROTECTED] 
> Sent: 21 February 2005 18:45
> To: 'linux-kernel@vger.kernel.org'
> Subject: Kernel oops in mm/rmap.c in 2.6.11rc4-bk9
> 
> Hi
> 
> I've been getting a hang on my firewall machine for some 
> weeks when pushed.
> 
> Kernel freezes when heavily loaded either compiling or under 
> lots of web traffic.
> Machine is a Pentium III 450MHz i440BX Intel Mobo with 384MB 
> memory running a firewall (iptables-1.3.0-200050220) via two 
> natsemi network cards.
> (old machine but should be good as firewall only...been solid 
> for years!) Kernel 2.6.11rc4-bk9 + supermount patch.
> 
> I have another machine which I am going to connect via serial 
> cable to help debug.
> (this has also been known to freeze on 2.6.11...a common 
> denominator is that I use Netgear natsemi net cards in 
> both...probably not significant?)
> 
> I've today built in KDB and get the following trace:-
> 
> In this case while compiling a kernel (make bzImage...running cc1)
> 
> Typed by hand:-
> 
> kernel BUG at mm/rmap.c:482!
> invalid operand:  [#1]
> Modules linked in: parport_pc lp parport ipt_LOG ipt_REJECT 
> ipt_state ipt_limit iptable_mangle iptable_filter ip_nat_ftp 
> iptable_nat ip_conntrack_ftp floppy natsemi supermount 
> intel_agp agpgart cs4243 ad1848 uart401 sound soundcore
> CPU:0
> EIP:0060:[]   Not tainted VLI
> EFLAGS: 00010296   (2.6.11colin)
> EIP is at page_remove_rmap+0x38/0x50
> eax: f000   ebx: 0002f000   ecx: c038d78c   edx: c123sd20
> esi: d0e7e44c   edi: 0009   ebp: d2a6ae4c   esp: d2a6ae4c
> ds: 007b   es: 007b   ss: 0068
> Process cc1 (pid: 10835, threadinfo=d2a6a000 task=cf57aaa0)
> Stack: d2a6ae80
> etc..
> 
> 
> 
> Entered kdb (current=0xcf57aaa0, pid 10835) Oops: invalid 
> operand due to oops @ 0xc013cc28 etc...
> 
> More info can be supplied if required and patches compiled in 
> to trace etc.
> Note that I have run a memory checker (memtest-86 v3.2 boot 
> CD) for ~ 3hours with no errors.
> I can usually repeat the crash while compiling a 
> kernel..which after a reboot recovers and allows 'make 
> bzImage' 'make modules' to finish!
> 
> Thanks
> Colin Harrison
> 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Kernel oops in mm/slab.c [ kmem_cache_grow() ] with test4-8

2000-09-18 Thread Jonathan Earle
> -----Original Message-----
> From: Andrew Morton [mailto:[EMAIL PROTECTED]]
> Sent: Friday, September 15, 2000 11:47 PM
> To: Earle, Jonathan [KAN:1A31:EXCH]
> Cc: Linux MPLS List (E-mail); Linux Kernel List (E-mail)
> Subject: Re: Kernel oops in mm/slab.c [ kmem_cache_grow() ] 
> with test4-8
> 
> 
> > Jonathan Earle wrote:
> > 
> > Hi,
> > 
> > I've been having kernel oopses with the 2.4.0-test series and am
> > including ksymoops processed output from both test4 and test5
> > kernels.  The same oops happens in later kernels too (Tested with
> > test6, test7 and test8).
> > 
> 
> Presumably mpls_output() is doing a kmalloc(..., GFP_KERNEL) 
> from within
> a softirq.  Hunt that down and turn it into GFP_ATOMIC.


Okay... Did that (turned all the GFP_KERNEL references in net/mpls into GFP_ATOMIC), and the problem seems to have gone away.  I'll post a more confident summary when I'm more sure that things are working properly.

Now, what did I do (aside from fixing the problem) by changing that reference?
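
As a rough answer to that question: GFP_KERNEL tells the allocator it may sleep while memory is reclaimed, which is not allowed in softirq context, whereas GFP_ATOMIC tells it never to sleep and to fail instead. The userspace sketch below models only that difference. The enum names and `model_alloc` are hypothetical stand-ins, not kernel identifiers or values.

```c
#include <stddef.h>

/* Stand-ins for the two allocation modes; not the kernel's flag values. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

/* Possible outcomes of one allocation attempt in this model. */
enum model_result { MODEL_OK, MODEL_SLEPT, MODEL_FAILED };

/*
 * Decide what happens when 'want' bytes are requested and only
 * 'free_now' bytes are available without waiting.
 */
enum model_result model_alloc(enum model_gfp flags, size_t want, size_t free_now)
{
    if (want <= free_now)
        return MODEL_OK;        /* memory is at hand: same for both modes */
    if (flags == MODEL_GFP_KERNEL)
        return MODEL_SLEPT;     /* may block until memory is reclaimed */
    return MODEL_FAILED;        /* atomic: never blocks, caller gets NULL */
}
```

The practical consequence of the change is that the net/mpls code no longer risks sleeping in interrupt context, but each converted call site now has to cope with the allocation failing.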


Many thanks for the hint!! 


Jon





Re: Kernel oops in mm/slab.c [ kmem_cache_grow() ] with test4-8

2000-09-15 Thread Andrew Morton

> Jonathan Earle wrote:
> 
> Hi,
> 
> I've been having kernel oopses with the 2.4.0-test series and am
> including ksymoops processed output from both test4 and test5
> kernels.  The same oops happens in later kernels too (Tested with
> test6, test7 and test8).
> 

Presumably mpls_output() is doing a kmalloc(..., GFP_KERNEL) from within
a softirq.  Hunt that down and turn it into GFP_ATOMIC.



Kernel oops in mm/slab.c [ kmem_cache_grow() ] with test4-8

2000-09-15 Thread Jonathan Earle
Hi, 


I've been having kernel oopses with the 2.4.0-test series and am including ksymoops processed output from both test4 and test5 kernels.  The same oops happens in later kernels too (Tested with test6, test7 and test8).

The scenario is this:


I have an incoming UDP stream at 1mbit.  The router marks packets in this stream, according to port ranges, with 3 (or any # of) marks (via iptables v1.1.1). iproute2 builds new routing tables based on these marks, and mplsadm, with the tc patch, is called to build LSPs using these routing tables.  Finally, the 3 egress LSPs are rate limited using tc (employing cbq classes) to a value less than the ingress rate (ie: I limited each LSP to 200kbit, for an aggregate egress output rate of 600kbit).  When I start the traffic flowing from our generator, the box panics and freezes quite solidly.  Policing via filters also crashes the box.  If I move the egress rate limiting function to another box, it works okay.

I've also noted that the crash only occurs if I throttle the traffic flow to an egress rate which is less than the ingress rate (ie: ingress flow at 1mbit and egress flow at 1mbit works fine.  If the egress rate is reduced, boom!)

I copied down the oopses and ran 'ksymoops < oops.txt > oops_proc.txt' and pasted them here.  The first is from kernel 2.4.0-test4 and the second from 2.4.0-test5.

NEW: Here's the funny part.  In mm/slab.c, the function kmem_cache_grow() contains a check as follows:


    /*
     * The test for missing atomic flag is performed here, rather than
     * the more obvious place, simply to reduce the critical path length
     * in kmem_cache_alloc(). If a caller is seriously mis-behaving they
     * will eventually be caught here (where it matters).
     */
    /* Commented out Sep 15 since it was crashing my router. */
    /* if (in_interrupt() && (flags & SLAB_LEVEL_MASK) != SLAB_ATOMIC)
            BUG(); */


This is the check that fails and causes the oops.  Not understanding what is actually being checked, and not knowing the repercussions of tampering with it, I commented out the check, recompiled and reran the test.  I understand that this is not really a fix (it's more akin to just turning my head and pretending that the problem doesn't exist), but... it seems to work.  The result:  Great joy and much celebration!  I'm throwing 7.2mbps at the box, limiting the rate to 900kbit aggregate throughput, and it's working!  The numbers I'm getting also seem to jibe with anticipated results.
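
The condition in that check can be modelled in userspace to see when it fires. This is only a sketch: the mask and flag values below are placeholders chosen for illustration, not the real SLAB_* constants, and `grow_check_trips` is a hypothetical name.

```c
/* Placeholder values for illustration only; not the kernel's constants. */
#define MODEL_LEVEL_MASK 0x3u
#define MODEL_ATOMIC     0x1u
#define MODEL_KERNEL     0x2u

/*
 * Mirrors the shape of the quoted condition: the check trips when the
 * caller is in interrupt context and did not ask for an atomic allocation.
 */
int grow_check_trips(int in_interrupt, unsigned int flags)
{
    return in_interrupt && (flags & MODEL_LEVEL_MASK) != MODEL_ATOMIC;
}
```

In this model the check stays quiet for every caller outside interrupt context, and for callers in interrupt context that pass the atomic flag. Removing the check therefore does not change what the non-atomic caller does; it only stops the caller from being reported.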

Cheers! 
Jon 


ksymoops 0.7c on i686 2.4.0-test4.  Options used 
 -V (default) 
 -k /proc/ksyms (default) 
 -l /proc/modules (default) 
 -o /lib/modules/2.4.0-test4/ (default) 
 -m /usr/src/linux/System.map (default) 


Warning: You did not tell me where to find symbol information.  I will 
assume that the log matches the kernel and modules that are running 
right now and I'll use the default options above for symbol resolution. 
If the current kernel and/or modules do not match the log, you can get 
more accurate output by telling me the kernel version and where to find 
map, modules, ksyms etc.  ksymoops -h explains the options. 


invalid operand:  
CPU: 0 
EIP: 0010:[] 
Using defaults from ksymoops -t elf32-i386 -a i386 
EFLAGS: 00010286 
eax: 001b ebx: c7ffd0c0 ecx:  edx: 0082 
esi: 0246 edi: c7ffd0c0 ebp: 0007 esp: c024fe70 
ds: 0018 es: 0018 ss: 0018 
Process swapper (pid:0, stackpage=c024f000) 
Stack: c01fb794 c01fb834 0412 c7ffd0c0 0247 0007 c024fed4 c7d1602e 
   c0127aaf c7ffd0c0 0007  c7d170e0 c7d1602e c01eb196 0008 
   0007  c7d170e0 c7d1602e c7f8be00  c01b6aaf c7d170e0 
Call trace: [][][][][][][] 
    [][][][][][][][] 
    [][][][][][][][] 
Code: 0f 0b 83 c4 0c c7 44 24 10 01 00 00 00 89 ee 83 e6 07 b8 03 


>>EIP; c01277fd    <= 
Trace; c01fb794  
Trace; c01fb834  
Trace; c0127aaf  
Trace; c01eb196  
Trace; c01b6aaf  
Trace; c01b6c6f  
Trace; c01b6a84  
Trace; c019b1c4  
Trace; c01b6936  
Trace; c01b6a84  
Trace; c019efe3  
Trace; c011b17f  
Trace; c010b8ee  
Trace; c01087e0  
Trace; c01087e0  
Trace; c010a518  
Trace; c01087e0  
Trace; c01087e0  
Trace; c0100018  
Trace; c0108803  
Trace; c0108864  
Trace; c0105000  
Trace; c0100192  
Code;  c01277fd  
 <_EIP>: 
Code;  c01277fd    <= 
   0:   0f 0b ud2a  <= 
Code;  c01277ff  
   2:   83 c4 0c  add    $0xc,%esp 
Code;  c0127802  
   5:   c7 44 24 10 01 00 00  movl   $0x1,0x10(%esp,1) 
Code;  c0127809  
   c:   00 
Code;  c012780a  
   d:   89 ee mov    %ebp,%esi 
Code;  c012780c  
   f:   83 e6 07  and    $0x7,%esi 
Code;  c012780f  
  12:   b8 03 00 00 00