Re: Ceph kernel client - kernel crashes

2012-05-17 Thread Josh Durgin

Sorry your mail fell through the cracks before. I filed
http://tracker.newdream.net/issues/2445 to track the ceph-related
crashes. Alex, do you think the first crash is related to ceph at all?

Josh


Re: Ceph kernel client - kernel crashes

2012-05-10 Thread Giorgos Kappes
Sorry for my late response. I reproduced the above bug with the Linux
kernel 3.3.4 and without using XEN:

uname -a
Linux node33 3.3.4 #1 SMP Wed May 9 13:00:07 EEST 2012 x86_64 GNU/Linux

The trace is shown below:


[  763.984023] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[  763.984177] BUG: unable to handle kernel paging request at 880037bd0800
[  763.984402] IP: [880037bd0800] 0x880037bd07ff
[  763.984568] PGD 1806063 PUD 180a063 PMD 800037a001e3
[  763.984845] Oops: 0011 [#1] SMP
[  763.985058] CPU 3
[  763.985124] Modules linked in: cbc netconsole loop snd_pcm
snd_timer snd soundcore snd_page_alloc processor tpm_tis i5400_edac
tpm edac_core tpm_bios evdev pcspkr i5k_amb rng_core thermal_sys
button shpchp pci_hotplug sd_mod crc_t10dif usbhid hid ide_cd_mod
cdrom ata_generic uhci_hcd ehci_hcd ata_piix libata piix ide_core
usbcore usb_common tg3 libphy mptsas mptscsih mptbase
scsi_transport_sas scsi_mod [last unloaded: scsi_wait_scan]
[  763.988002]
[  763.988002] Pid: 0, comm: swapper/3 Not tainted 3.3.4 #1 HP ProLiant DL160 G5
[  763.988002] RIP: 0010:[880037bd0800]  [880037bd0800] 0x880037bd07ff
[  763.988002] RSP: 0018:8800bfcc3e78  EFLAGS: 00010292
[  763.988002] RAX: 8800b97745b0 RBX: 8800bfcce770 RCX: 880037bd0800
[  763.988002] RDX: 880037bd1600 RSI: b9b6a040 RDI: 880037bd1600
[  763.988002] RBP: 81820080 R08: 8800b9dd0b00 R09: 00018020001c
[  763.988002] R10: 8020001c R11: 816075c0 R12: 8800bfcce7a0
[  763.988002] R13: 8800b97745b0 R14: 0003 R15: 000a
[  763.988002] FS:  () GS:8800bfcc() knlGS:
[  763.988002] CS:  0010 DS:  ES:  CR0: 8005003b
[  763.988002] CR2: 880037bd0800 CR3: b895b000 CR4: 06e0
[  763.988002] DR0:  DR1:  DR2: 
[  763.988002] DR3:  DR6: 0ff0 DR7: 0400
[  763.988002] Process swapper/3 (pid: 0, threadinfo 8800bbae, task 8800bbad8000)
[  763.988002] Stack:
[  763.988002]  8109b44d 8800bbacd820 8800b97745b0 8800bbae0010
[  763.988002]  8800bbad8000 8800bfcc3ea0 0048 8800bbae1fd8
[  763.988002]  0100 0001 0009 8800bbae1fd8
[  763.988002] Call Trace:
[  763.988002]  IRQ
[  763.988002]  [8109b44d] ? __rcu_process_callbacks+0x1e9/0x335
[  763.988002]  [8109b8fb] ? rcu_process_callbacks+0x2c/0x56
[  763.988002]  [8103e3b1] ? __do_softirq+0xc4/0x1a0
[  763.988002]  [8102515b] ? lapic_next_event+0x18/0x1d
[  763.988002]  [815d3b1c] ? call_softirq+0x1c/0x30
[  763.988002]  [8100fba3] ? do_softirq+0x3f/0x79
[  763.988002]  [8103e186] ? irq_exit+0x44/0xb1
[  763.988002]  [81025c61] ? smp_apic_timer_interrupt+0x85/0x93
[  763.988002]  [815d311e] ? apic_timer_interrupt+0x6e/0x80
[  763.988002]  EOI
[  763.988002]  [810145e1] ? native_sched_clock+0x28/0x33
[  763.988002]  [810152f6] ? mwait_idle+0x8c/0xbc
[  763.988002]  [810152ae] ? mwait_idle+0x44/0xbc
[  763.988002]  [8100de94] ? cpu_idle+0xb9/0xf7
[  763.988002]  [815c43c6] ? start_secondary+0x270/0x275
[  763.988002] Code: 00 00 00 00 04 8a b8 00 88 ff ff 00 04 8a b8 00
88 ff ff 00 03 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 16 bd 37 00 88 ff ff 40 ab cd bf 00 88 ff ff 20 15 42
b9 00
[  763.988002] RIP  [880037bd0800] 0x880037bd07ff
[  763.988002]  RSP 8800bfcc3e78
[  763.988002] CR2: 880037bd0800
[  763.988002] ---[ end trace 614049dc850267ac ]---
[  763.988002] Kernel panic - not syncing: Fatal exception in interrupt
[  763.997833] [ cut here ]
[  763.997936] WARNING: at arch/x86/kernel/smp.c:120 update_process_times+0x57/0x63()
[  763.998072] Hardware name: ProLiant DL160 G5
[  763.998171] Modules linked in: cbc netconsole loop snd_pcm
snd_timer snd soundcore snd_page_alloc processor tpm_tis i5400_edac
tpm edac_core tpm_bios evdev pcspkr i5k_amb rng_core thermal_sys
button shpchp pci_hotplug sd_mod crc_t10dif usbhid hid ide_cd_mod
cdrom ata_generic uhci_hcd ehci_hcd ata_piix libata piix ide_core
usbcore usb_common tg3 libphy mptsas mptscsih mptbase
scsi_transport_sas scsi_mod [last unloaded: scsi_wait_scan]
[  764.001205] Pid: 0, comm: swapper/3 Tainted: G      D      3.3.4 #1
[  764.001311] Call Trace:
[  764.001404]  IRQ  [81038bb0] ? warn_slowpath_common+0x78/0x8c
[  764.001573]  [81044937] ? update_process_times+0x57/0x63
[  764.001681]  [81075dbe] ? tick_sched_timer+0x65/0x8b
[  764.001788]  [810561bd] ? __run_hrtimer+0xb2/0x13d
[  764.001832]  [81013ca9] ? read_tsc+0x5/0x16
[  764.001832]  [81056482] ? hrtimer_interrupt+0xd8/0x1a7
[  
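
To compare call traces like the one above, it can help to pull out just the function names. The following is a minimal Python sketch, not part of the original report; the names `FRAME_RE` and `parse_frames` are made up for illustration, and the sample lines are copied from the 3.3.4 trace. Both the 3.3.4 trace and the original 3.2.11 trace in this thread pass through __rcu_process_callbacks, which such a listing makes easy to see, although that alone does not say whether Ceph is involved.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" part of a call-trace line.
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(trace):
    """Return (symbol, offset, size) tuples in the order they appear."""
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((m.group(1), int(m.group(2), 16), int(m.group(3), 16)))
    return frames


# Three lines copied from the 3.3.4 trace in this message.
SAMPLE = """\
[  763.988002]  [8109b44d] ? __rcu_process_callbacks+0x1e9/0x335
[  763.988002]  [8109b8fb] ? rcu_process_callbacks+0x2c/0x56
[  763.988002]  [8103e3b1] ? __do_softirq+0xc4/0x1a0
"""

if __name__ == "__main__":
    for symbol, offset, size in parse_frames(SAMPLE):
        print(f"{symbol} +{offset:#x} of {size:#x}")
```

Feeding the full text of each trace to `parse_frames` and comparing the resulting symbol lists shows which frames the two crashes share.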

Ceph kernel client - kernel crashes

2012-05-08 Thread Giorgos Kappes
Hi,

When I run debootstrap to install a base Debian Squeeze system in a
Ceph directory, the client's kernel crashes with the following
message:

I: Retrieving Release
I: Validating Packages
I: Resolving dependencies of required packages...
I: Resolving dependencies of base packages...
I: Found additional required dependencies: insserv libbz2-1.0 libdb4.8 libslang2
I: Found additional base dependencies: libnfnetlink0 libsqlite3-0
I: Checking component main on http://ftp.us.debian.org/debian...
I: Validating libacl1
...
I: Extracting xz-utils...
I: Extracting zlib1g...
W: Failure trying to run: chroot /mnt/debian mount -t proc proc /proc
[  759.776151] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[  759.776169] BUG: unable to handle kernel paging request at e8fe4ab0
[  759.776182] IP: [e8fe4ab0] 0xe8fe4aaf
[  759.776195] PGD c42b067 PUD c42c067 PMD c42d067 PTE 80100c445067
[  759.776209] Oops: 0011 [#1] SMP
[  759.776219] CPU 0
[  759.776224] Modules linked in: pcspkr [last unloaded: scsi_wait_scan]
[  759.776237]
[  759.776244] Pid: 0, comm: swapper/0 Tainted: G        W    3.2.11 #2
[  759.776255] RIP: e030:[e8fe4ab0]  [e8fe4ab0] 0xe8fe4aaf
[  759.776267] RSP: e02b:88001ffaae98  EFLAGS: 00010296
[  759.776274] RAX: 880012d7a900 RBX: 88001ffb5960 RCX: e8fe4ab0
[  759.776302] RDX: 88000d1a9b00 RSI: 000f RDI: 88000d1a9b00
[  759.776309] RBP: 81c1fa80 R08: 88001eb74000 R09: 0001801f
[  759.776317] R10: 801f R11: 818055f5 R12: 88001ffb5990
[  759.776324] R13: 88000c5ea880 R14: 0001 R15: 000a
[  759.776334] FS:  7f21095a4740() GS:88001ffa7000() knlGS:
[  759.776342] CS:  e033 DS:  ES:  CR0: 8005003b
[  759.776349] CR2: e8fe4ab0 CR3: 12e28000 CR4: 2660
[  759.776356] DR0:  DR1:  DR2: 
[  759.776364] DR3:  DR6: 0ff0 DR7: 0400
[  759.776372] Process swapper/0 (pid: 0, threadinfo 81c0, task 81c0d020)
[  759.776379] Stack:
[  759.776384]  81099405 0001 880012d7a900 88001ffaaeb0
[  759.776397]  0048 81c01fd8 0100 0001
[  759.776409]  0009 81c01fd8 81099898 81c01fd8
[  759.776422] Call Trace:
[  759.776427]  IRQ
[  759.776438]  [81099405] ? __rcu_process_callbacks+0x1c7/0x2f8
[  759.776447]  [81099898] ? rcu_process_callbacks+0x2c/0x56
[  759.776457]  [8104cb72] ? __do_softirq+0xc4/0x1a0
[  759.776465]  [81096875] ? handle_percpu_irq+0x3d/0x54
[  759.776475]  [8150efb6] ? __xen_evtchn_do_upcall+0x1c7/0x205
[  759.776484]  [8176e52c] ? call_softirq+0x1c/0x30
[  759.776493]  [8100fa47] ? do_softirq+0x3f/0x79
[  759.776501]  [8104c942] ? irq_exit+0x44/0xb5
[  759.776508]  [8150ffc6] ? xen_evtchn_do_upcall+0x27/0x32
[  759.776516]  [8176e57e] ? xen_do_hypervisor_callback+0x1e/0x30
[  759.776523]  EOI
[  759.776531]  [81006f3f] ? xen_restore_fl_direct_reloc+0x4/0x4
[  759.776539]  [810013aa] ? hypercall_page+0x3aa/0x1000
[  759.776547]  [810013aa] ? hypercall_page+0x3aa/0x1000
[  759.776556]  [8163969b] ? cpuidle_idle_call+0x16/0x1af
[  759.776564]  [810068dc] ? xen_safe_halt+0xc/0x15
[  759.776572]  [810150a6] ? default_idle+0x4b/0x84
[  759.776580]  [8100ddf6] ? cpu_idle+0xb9/0xef
[  759.776588]  [81cf7bff] ? start_kernel+0x395/0x3a0
[  759.776596]  [81cfa536] ? xen_start_kernel+0x593/0x598
[  759.776602] Code: e8 ff ff 80 4a fe ff ff e8 ff ff 0b 00 00 00 01
00 00 00 fa ff ff ff fa ff ff ff 06 00 00 00 02 00 00 00 05 00 00 00
cc cc cc cc 00 9b 1a 0d 00 88 ff ff 00 0f b7 1e 00 88 ff ff 01 00 00
00 00
[  759.776699] RIP  [e8fe4ab0] 0xe8fe4aaf
[  759.776712]  RSP 88001ffaae98
[  759.776717] CR2: e8fe4ab0
[  759.776725] ---[ end trace 36924001333caa12 ]---
[  759.776731] Kernel panic - not syncing: Fatal exception in interrupt
[  759.776739] Pid: 0, comm: swapper/0 Tainted: G      D W    3.2.11 #2
[  759.776745] Call Trace:
[  759.776749]  IRQ  [81764003] ? panic+0x92/0x1a0
[  759.776771]  [810478c0] ? kmsg_dump+0x41/0xdd
[  759.776779]  [81766cc1] ? oops_end+0xa9/0xb6
[  759.776788]  [8102ec7d] ? no_context+0x1ff/0x20c
[  759.776795]  [81768d9f] ? do_page_fault+0x1ad/0x34c
[  759.776805]  [8106dfb3] ? tick_nohz_handler+0xcb/0xcb
[  759.776813]  [8102c12a] ? pvclock_clocksource_read+0x46/0xb4
[  759.776821]  [81006eb3] ? xen_vcpuop_set_next_event+0x4d/0x61
[  759.776829]  [8106cdcc] ? clockevents_program_event+0x99/0xb8
[  759.776837]  [817663b5] ? page_fault+0x25/0x30
[  759.776845]
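
Both traces in this thread report "Oops: 0011". As a reading aid, here is a minimal Python sketch that decodes that value using the x86 page-fault error-code bit layout. It is not part of the original report, and the names `PF_BITS` and `decode_oops_code` are made up for illustration; they are not kernel interfaces.

```python
# Bits of the x86 page-fault error code, as printed after "Oops:".
# Each entry: (mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation (page was present)", "page not present"),
    (0x02, "write access", "not a write"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in a page-table entry", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code_hex):
    """Return a list of human-readable descriptions for the error code."""
    code = int(code_hex, 16)
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


if __name__ == "__main__":
    for item in decode_oops_code("0011"):
        print(item)
```

For 0011 this gives a protection violation on an instruction fetch in kernel mode, which is consistent with the "kernel tried to execute NX-protected page" message at the top of each trace.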