Re: reproductible kernel oops with kernel 3.2 inside kvm

2012-05-17 Thread Josh Durgin

Hi Yann,

Sorry for the late response.

On 05/03/2012 07:05 AM, Yann Dupont wrote:

> Hello. I've been stress testing ceph for some time now, with quite good
> results. I really like ceph and will probably use it in some
> pre-production services.
>
> Anyway I've seen some bugs.
>
> One of them is instability when the kernel is running inside KVM, leading
> to a very fast (and reproducible) kernel oops. On bare metal this
> particular oops doesn't happen.
>
> The kernel oops itself involves ceph, but it could be a real bug in kvm too.
>
> The host machine is running 3.2.2
> kvm is quite old (0.14)
> guest OS is ubuntu 12.04 with its standard kernel. Retried with a custom
> 3.2 kernel, with the same problem.


I'm not sure how many people are using the kernel client within kvm,
but I haven't seen this problem before. Since it's in d_prune, it's
probably Ceph related, but perhaps kvm makes a race condition trigger
more often in your environment.

I filed http://tracker.newdream.net/issues/2444 to track this.


reproductible kernel oops with kernel 3.2 inside kvm

2012-05-03 Thread Yann Dupont
Hello. I've been stress testing ceph for some time now, with quite good
results. I really like ceph and will probably use it in some
pre-production services.


Anyway I've seen some bugs.

One of them is instability when the kernel is running inside KVM, leading
to a very fast (and reproducible) kernel oops. On bare metal this
particular oops doesn't happen.


The kernel oops itself involves ceph, but it could be a real bug in kvm too.

The host machine is running 3.2.2
kvm is quite old (0.14)
guest OS is ubuntu 12.04 with its standard kernel. Retried with a custom
3.2 kernel, with the same problem.



I mount ceph with: mount -t ceph mon_adress:/ /mnt/temp

A simple recursive copy of /home leads to this kernel oops:

May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675559] BUG: 
unable to handle kernel NULL pointer dereference at   (null)
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675569] IP: 
[f8379d8d] ceph_d_prune+0x1d/0x30 [ceph]
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675579] *pde = 

May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675583] Oops: 
0002 [#1] SMP
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675587] 
Modules linked in: ceph libceph libcrc32c zram(C) parport_pc rfcomm 
ppdev bnep lp bluetooth parport dm_crypt binfmt_misc psmouse mac_hid 
virtio_balloon serio_raw i2c_piix4 nf_conntrack_ipv6 nf_conntrack 
nf_defrag_ipv6 floppy

May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675605]
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675609] Pid: 
27, comm: kswapd0 Tainted: G S  WC   3.2.0-24-generic #37-Ubuntu 
Bochs Bochs
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675614] EIP: 
0060:[f8379d8d] EFLAGS: 00010282 CPU: 0
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675618] EIP is 
at ceph_d_prune+0x1d/0x30 [ceph]
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675621] EAX: 
 EBX: ed311480 ECX: cdf35a4c EDX: cdf35a00
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675623] ESI: 
ed3114e0 EDI: c8de4ccc EBP: f3e4bdec ESP: f3e4bdec
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675625]  DS: 
007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675628] 
Process kswapd0 (pid: 27, ti=f3e4a000 task=f3d45860 task.ti=f3e4a000)

May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675630] Stack:
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675632] 
f3e4bdfc c114503e ed311480 cdf35a00 f3e4be28 c1146bdf c8de5764 ed31164c
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675638] 
cdf35a4c f3e4be44 ed3114cc ed17dbe0 f1b5ac00 f1b5ac80 eafb38e0 f3e4be58
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675644] 
c114763e eafb38cc  f3e4be3c f3e4be3c f3e4be3c c91a5e60 ed3114e0
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675685] Call 
Trace:
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675769] 
[c114503e] dentry_lru_prune+0x6e/0x70
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675774] 
[c1146bdf] shrink_dentry_list+0x14f/0x270
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675777] 
[c114763e] prune_dcache_sb+0x10e/0x130
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675786] 
[c113584a] prune_super+0xfa/0x160
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675790] 
[c10f6056] shrink_slab+0x166/0x2e0
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675793] 
[c10f7c47] ? shrink_zone+0x137/0x190
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675796] 
[c10f8074] balance_pgdat+0x3d4/0x540
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675800] 
[c10f82d1] kswapd+0xf1/0x1b0
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675803] 
[c10f81e0] ? balance_pgdat+0x540/0x540
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675812] 
[c1069b8d] kthread+0x6d/0x80
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675815] 
[c1069b20] ? flush_kthread_worker+0x80/0x80
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675828] 
[c157e37e] kernel_thread_helper+0x6/0x10
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675830] Code: 
e5 3e 8d 74 26 00 b8 01 00 00 00 5d c3 90 55 89 e5 3e 8d 74 26 00 8b 50 
10 85 d2 74 12 39 d0 74 0e 8b 40 0c 85 c0 74 07 8b 42 5c f0 80 20 fd 
5d c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675861] EIP: 
[f8379d8d] ceph_d_prune+0x1d/0x30 [ceph] SS:ESP 0068:f3e4bdec
May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675867] CR2: 

May  3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675872] ---[ 
end trace a7919e7f17c0a727 ]---
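
The trace above was line-wrapped by the mail client, which makes it awkward to grep. Here is a minimal sketch that rejoins the wrapped lines and lists the call-trace symbols. It assumes the syslog prefix format seen in this message; the names rejoin and call_trace are made up for illustration and are not part of any existing tool.

```python
import re

# Syslog prefix as seen in this thread, e.g.
# "May  3 14:26:26 host kernel: [222696.675559] "
PREFIX = re.compile(r"^\w{3}\s+\d+ \d\d:\d\d:\d\d \S+ kernel: \[\s*\d+\.\d+\] ?")
# A rejoined call-trace entry, e.g. "[c114503e] dentry_lru_prune+0x6e/0x70"
# (angle brackets around the address are accepted if present).
FRAME = re.compile(r"^\[<?[0-9a-f]+>?\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def rejoin(lines):
    """Merge mail-wrapped continuation lines back onto their syslog record."""
    records = []
    for line in lines:
        m = PREFIX.match(line)
        if m:
            # New record: keep only the message part after the prefix.
            records.append(line[m.end():].strip())
        elif records and line.strip():
            # Continuation of the previous record.
            records[-1] = (records[-1] + " " + line.strip()).strip()
    return records


def call_trace(records):
    """Return (symbol, is_uncertain) for each frame after 'Call Trace:'."""
    frames = []
    in_trace = False
    for rec in records:
        if rec.startswith("Call Trace:"):
            in_trace = True
            continue
        if in_trace:
            m = FRAME.match(rec)
            if not m:
                break  # first non-frame record ends the trace
            frames.append((m.group(2), m.group(1) is not None))
    return frames


if __name__ == "__main__":
    import sys
    for symbol, uncertain in call_trace(rejoin(sys.stdin)):
        print(("? " if uncertain else "  ") + symbol)
```

Saved under any file name and fed the log on standard input, it prints one symbol per frame, with "?" marking the frames the kernel itself flagged as unreliable.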




Retried on another machine with kvm 1.0:

May  3 15:59:43 xs1.u13.univ-nantes.prive kernel: [  178.962997] BUG: 
unable to handle kernel NULL pointer dereference at   (null)
May  3 15:59:43