Re: reproductible kernel oops with kernel 3.2 inside kvm
Hi Yann,

Sorry for the late response.

On 05/03/2012 07:05 AM, Yann Dupont wrote:
> Hello.
> I've been stress testing ceph for some time now, with quite good results. I really like ceph and will probably use it in some pre-production services.
>
> Anyway, I've seen some bugs. One of them is instability when the kernel is running inside KVM, leading to a very fast (and reproducible) kernel oops. On bare metal this particular oops doesn't happen. The kernel oops itself involves ceph, but it could be a real bug in kvm too.
>
> The host machine is running 3.2.2.
> kvm is quite old (0.14).
> The guest OS is Ubuntu 12.04 with its standard kernel. Retried with a custom 3.2 kernel, with the same problem.

I'm not sure how many people are using the kernel client within kvm, but I haven't seen this problem before. Since it's in d_prune, it's probably Ceph related, but perhaps kvm makes a race condition trigger more often in your environment.

I filed http://tracker.newdream.net/issues/2444 to track this.
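As a rough cross-check of the "it's in d_prune" reading, the Code: bytes from the first oops in the original report can be lined up against the reported offset ceph_d_prune+0x1d. The sketch below is only illustrative. It assumes the usual 32-bit x86 oops layout in which 43 of the 64 dumped bytes precede the faulting instruction (the archive has stripped the marker that normally flags that byte), and the helper names locate_fault and decode_error_code are hypothetical, not part of any kernel tool. It also decodes the "Oops: 0002" value using the standard x86 page-fault error code bits.

```python
# Bytes from the "Code:" line of the first oops, copied verbatim.
CODE = (
    "e5 3e 8d 74 26 00 b8 01 00 00 00 5d c3 90 55 89 e5 3e 8d 74 26 00 "
    "8b 50 10 85 d2 74 12 39 d0 74 0e 8b 40 0c 85 c0 74 07 8b 42 5c f0 "
    "80 20 fd 5d c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89"
)

# ASSUMPTION: 43 of the 64 dumped bytes come before the faulting instruction.
PROLOGUE_BYTES = 43
EIP_OFFSET = 0x1D      # from "ceph_d_prune+0x1d/0x30"
ERROR_CODE = 0x0002    # from "Oops: 0002"


def locate_fault(code_hex, prologue=PROLOGUE_BYTES, eip_offset=EIP_OFFSET):
    """Return (all bytes, index of function start, 4 bytes at the fault)."""
    data = bytes.fromhex(code_hex)
    func_start = prologue - eip_offset
    return data, func_start, data[prologue:prologue + 4]


def decode_error_code(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(code & 0x1),  # 0 means the page was not present
        "write": bool(code & 0x2),         # 1 means the access was a write
        "user_mode": bool(code & 0x4),     # 0 means the fault was in kernel mode
    }


if __name__ == "__main__":
    data, func_start, fault = locate_fault(CODE)
    print("function starts at dump index", func_start,
          "with bytes", data[func_start:func_start + 3].hex(" "))
    print("bytes at faulting EIP:", fault.hex(" "))
    print("error code:", decode_error_code(ERROR_CODE))
```

Under that assumption the function start lands on 55 89 e5 (push %ebp; mov %esp,%ebp) and the faulting bytes are f0 80 20 fd, which encode a locked byte AND of 0xfd into the memory at (%eax). The three bytes just before it, 8b 42 5c, load EAX from offset 0x5c of whatever EDX points to. EAX prints blank in the archived log, presumably a zero that was stripped, so this is consistent with a kernel-mode write through a NULL pointer. Mapping offset 0x5c to a structure field would need the matching kernel sources.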
reproductible kernel oops with kernel 3.2 inside kvm
Hello.
I've been stress testing ceph for some time now, with quite good results. I really like ceph and will probably use it in some pre-production services.

Anyway, I've seen some bugs. One of them is instability when the kernel is running inside KVM, leading to a very fast (and reproducible) kernel oops. On bare metal this particular oops doesn't happen. The kernel oops itself involves ceph, but it could be a real bug in kvm too.

The host machine is running 3.2.2.
kvm is quite old (0.14).
The guest OS is Ubuntu 12.04 with its standard kernel. Retried with a custom 3.2 kernel, with the same problem.

I'm mounting ceph with:

mount -t ceph mon_adress:/ /mnt/temp

A simple recursive copy of /home leads to this kernel oops:

May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675559] BUG: unable to handle kernel NULL pointer dereference at (null)
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675569] IP: [f8379d8d] ceph_d_prune+0x1d/0x30 [ceph]
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675579] *pde =
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675583] Oops: 0002 [#1] SMP
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675587] Modules linked in: ceph libceph libcrc32c zram(C) parport_pc rfcomm ppdev bnep lp bluetooth parport dm_crypt binfmt_misc psmouse mac_hid virtio_balloon serio_raw i2c_piix4 nf_conntrack_ipv6 nf_conntrack nf_defrag_ipv6 floppy
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675605]
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675609] Pid: 27, comm: kswapd0 Tainted: G S WC 3.2.0-24-generic #37-Ubuntu Bochs Bochs
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675614] EIP: 0060:[f8379d8d] EFLAGS: 00010282 CPU: 0
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675618] EIP is at ceph_d_prune+0x1d/0x30 [ceph]
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675621] EAX: EBX: ed311480 ECX: cdf35a4c EDX: cdf35a00
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675623] ESI: ed3114e0 EDI: c8de4ccc EBP: f3e4bdec ESP: f3e4bdec
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675625] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675628] Process kswapd0 (pid: 27, ti=f3e4a000 task=f3d45860 task.ti=f3e4a000)
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675630] Stack:
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675632] f3e4bdfc c114503e ed311480 cdf35a00 f3e4be28 c1146bdf c8de5764 ed31164c
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675638] cdf35a4c f3e4be44 ed3114cc ed17dbe0 f1b5ac00 f1b5ac80 eafb38e0 f3e4be58
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675644] c114763e eafb38cc f3e4be3c f3e4be3c f3e4be3c c91a5e60 ed3114e0
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675685] Call Trace:
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675769] [c114503e] dentry_lru_prune+0x6e/0x70
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675774] [c1146bdf] shrink_dentry_list+0x14f/0x270
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675777] [c114763e] prune_dcache_sb+0x10e/0x130
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675786] [c113584a] prune_super+0xfa/0x160
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675790] [c10f6056] shrink_slab+0x166/0x2e0
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675793] [c10f7c47] ? shrink_zone+0x137/0x190
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675796] [c10f8074] balance_pgdat+0x3d4/0x540
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675800] [c10f82d1] kswapd+0xf1/0x1b0
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675803] [c10f81e0] ? balance_pgdat+0x540/0x540
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675812] [c1069b8d] kthread+0x6d/0x80
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675815] [c1069b20] ? flush_kthread_worker+0x80/0x80
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675828] [c157e37e] kernel_thread_helper+0x6/0x10
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675830] Code: e5 3e 8d 74 26 00 b8 01 00 00 00 5d c3 90 55 89 e5 3e 8d 74 26 00 8b 50 10 85 d2 74 12 39 d0 74 0e 8b 40 0c 85 c0 74 07 8b 42 5c f0 80 20 fd 5d c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675861] EIP: [f8379d8d] ceph_d_prune+0x1d/0x30 [ceph] SS:ESP 0068:f3e4bdec
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675867] CR2:
May 3 14:26:26 xs1.u13.univ-nantes.prive kernel: [222696.675872] ---[ end trace a7919e7f17c0a727 ]---

Retried on another machine with kvm 1.0:

May 3 15:59:43 xs1.u13.univ-nantes.prive kernel: [ 178.962997] BUG: unable to handle kernel NULL pointer dereference at (null)
May 3 15:59:43