Bug#626593: linux-image-2.6.32-5-amd64: BUG during disk hot-plugging when setting the elevator via udev
On Fri, 2011-05-13 at 16:34 +0300, Apollon Oikonomopoulos wrote:
[...]
Having upgraded from lenny to squeeze last week, we encountered the following crash during a SCSI bus rescan that added new disks to a system:

[ 1258.343275] [ cut here ]
[ 1258.343280] sd 0:0:0:226: [sdgv] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 1258.343287] kernel BUG at /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/sysfs/file.c:539!
[ 1258.343289] invalid opcode: [#2] SMP
[ 1258.343292] last sysfs file: /sys/devices/pci:00/:00:05.0/:10:00.0/host0/rport-0:0-0/target0:0:0/0:0:0:216/block/sdgn/removable
[ 1258.343295] CPU 4
[ 1258.343296] Modules linked in: kvm_intel kvm nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff ipmi_devintf radeon ttm drm_kms_helper snd_pcm ipmi_si drm ipmi_msghandler i2c_algo_bit i5k_amb i2c_core snd_timer psmouse i5000_edac snd soundcore snd_page_alloc hpwdt hpilo serio_raw edac_core pcspkr rng_core evdev shpchp container pci_hotplug button processor ext3 jbd mbcache dm_mod sd_mod crc_t10dif usbhid hid uhci_hcd qla2xxx scsi_transport_fc tg3 ehci_hcd bnx2 scsi_tgt usbcore nls_base cciss libphy scsi_mod thermal thermal_sys [last unloaded: scsi_wait_scan]
[ 1258.343332] Pid: 12287, comm: async/20 Tainted: G D W 2.6.32-5-amd64 #1 ProLiant BL460c G1
[...]

We really need to see the first BUG message after boot. The 'D' here indicates that this is not the first.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
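To illustrate the point about the 'D' flag, here is a minimal Python sketch that decodes the taint letters from a "Tainted:" line such as the one quoted above. The function name `decode_taint` and the small `TAINT_FLAGS` table are assumptions made for illustration and are not part of any kernel tooling. The table covers only the three letters that appear in this log, with meanings taken from the kernel's documented taint flags.

```python
# Meanings of the taint letters seen in this report (a subset of the
# kernel's full list; the table itself is an illustrative assumption).
TAINT_FLAGS = {
    "G": "only GPL-compatible modules loaded",
    "D": "kernel has died recently (an earlier OOPS or BUG)",
    "W": "a kernel warning was issued earlier",
}


def decode_taint(line):
    """Return the list of taint letters following 'Tainted:' in a log line."""
    marker = "Tainted:"
    if marker not in line:
        return []  # e.g. a "Not tainted" line
    flags = []
    for token in line.split(marker, 1)[1].split():
        # Taint letters are upper-case; the kernel version string ends them.
        if not (token.isalpha() and token.isupper()):
            break
        flags.extend(token)
    return flags


line = ("Pid: 12287, comm: async/20 Tainted: G D W "
        "2.6.32-5-amd64 #1 ProLiant BL460c G1")
for flag in decode_taint(line):
    print(flag, "-", TAINT_FLAGS.get(flag, "not covered by this sketch"))
```

A 'D' in the decoded list means an earlier OOPS or BUG already happened in the same boot, so the first such message in the log is the one worth examining.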
Bug#626593: linux-image-2.6.32-5-amd64: BUG during disk hot-plugging when setting the elevator via udev
On 15:07 Sun 15 May, Ben Hutchings wrote:
On Fri, 2011-05-13 at 16:34 +0300, Apollon Oikonomopoulos wrote:
[...]
Having upgraded from lenny to squeeze last week, we encountered the following crash during a SCSI bus rescan that added new disks to a system:

[ 1258.343275] [ cut here ]
[ 1258.343280] sd 0:0:0:226: [sdgv] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 1258.343287] kernel BUG at /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/sysfs/file.c:539!
[ 1258.343289] invalid opcode: [#2] SMP
[ 1258.343292] last sysfs file: /sys/devices/pci:00/:00:05.0/:10:00.0/host0/rport-0:0-0/target0:0:0/0:0:0:216/block/sdgn/removable
[ 1258.343295] CPU 4
[ 1258.343296] Modules linked in: kvm_intel kvm nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff ipmi_devintf radeon ttm drm_kms_helper snd_pcm ipmi_si drm ipmi_msghandler i2c_algo_bit i5k_amb i2c_core snd_timer psmouse i5000_edac snd soundcore snd_page_alloc hpwdt hpilo serio_raw edac_core pcspkr rng_core evdev shpchp container pci_hotplug button processor ext3 jbd mbcache dm_mod sd_mod crc_t10dif usbhid hid uhci_hcd qla2xxx scsi_transport_fc tg3 ehci_hcd bnx2 scsi_tgt usbcore nls_base cciss libphy scsi_mod thermal thermal_sys [last unloaded: scsi_wait_scan]
[ 1258.343332] Pid: 12287, comm: async/20 Tainted: G D W 2.6.32-5-amd64 #1 ProLiant BL460c G1
[...]

We really need to see the first BUG message after boot. The 'D' here indicates that this is not the first.

Ben.

Hi Ben,

You're right, this was not the first occurrence.
I was trying to forcibly reproduce the problem by adding and removing ~1.5k SCSI disks to the system, which caused a lot of WARNINGS like the following to appear first:

May 12 18:13:27 hn-11 kernel: [ 513.803626] [ cut here ]
May 12 18:13:27 hn-11 kernel: [ 513.803636] WARNING: at /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/sysfs/sysfs.h:139 __sysfs_get+0x20/0x28()
May 12 18:13:27 hn-11 kernel: [ 513.803639] Hardware name: ProLiant BL460c G1
May 12 18:13:27 hn-11 kernel: [ 513.803641] Modules linked in: kvm_intel kvm nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff ipmi_devintf radeon ttm snd_pcm drm_kms_helper snd_timer drm i2c_algo_bit snd soundcore snd_page_alloc i2c_core hpwdt i5k_amb ipmi_si i5000_edac pcspkr rng_core ipmi_msghandler hpilo psmouse edac_core serio_raw evdev shpchp container pci_hotplug button processor ext3 jbd mbcache dm_mod sd_mod crc_t10dif usbhid hid uhci_hcd qla2xxx scsi_transport_fc cciss ehci_hcd tg3 libphy usbcore scsi_tgt bnx2 nls_base scsi_mod thermal thermal_sys [last unloaded: scsi_wait_scan]
May 12 18:13:27 hn-11 kernel: [ 513.803725] Pid: 24914, comm: async/17 Not tainted 2.6.32-5-amd64 #1
May 12 18:13:27 hn-11 kernel: [ 513.803727] Call Trace:
May 12 18:13:27 hn-11 kernel: [ 513.803733] [8113efad] ? __sysfs_get+0x20/0x28
May 12 18:13:27 hn-11 kernel: [ 513.803736] [8113efad] ? __sysfs_get+0x20/0x28
May 12 18:13:27 hn-11 kernel: [ 513.803740] [8104db34] ? warn_slowpath_common+0x77/0xa3
May 12 18:13:27 hn-11 kernel: [ 513.803744] [8113efad] ? __sysfs_get+0x20/0x28
May 12 18:13:27 hn-11 kernel: [ 513.803747] [8113f0f1] ? __sysfs_add_one+0x2b/0x84
May 12 18:13:27 hn-11 kernel: [ 513.803750] [8113f1a0] ? sysfs_add_one+0x19/0xe4
May 12 18:13:27 hn-11 kernel: [ 513.803754] [8113ec39] ? sysfs_add_file_mode+0x4e/0x7f
May 12 18:13:27 hn-11 kernel: [ 513.803759] [811761aa] ? elv_register_queue+0x4f/0x6f
May 12 18:13:27 hn-11 kernel: [ 513.803764] [8118024b] ? blk_register_queue+0x7f/0xcc
May 12 18:13:27 hn-11 kernel: [ 513.803768] [81184021] ? add_disk+0xb8/0x108
May 12 18:13:27 hn-11 kernel: [ 513.803776] [a01566be] ? sd_probe_async+0x119/0x1d8 [sd_mod]
May 12 18:13:27 hn-11 kernel: [ 513.803781] [810698a7] ? async_thread+0x0/0x20d
May 12 18:13:27 hn-11 kernel: [ 513.803784] [810699a6] ? async_thread+0xff/0x20d
May 12 18:13:27 hn-11 kernel: [ 513.803789] [81049fee] ? default_wake_function+0x0/0x9
May 12 18:13:27 hn-11 kernel: [ 513.803792] [810698a7] ? async_thread+0x0/0x20d
May 12 18:13:27 hn-11 kernel: [ 513.803795] [81064721] ? kthread+0x79/0x81
May 12 18:13:27 hn-11 kernel: [ 513.803800] [81011baa] ? child_rip+0xa/0x20
May 12 18:13:27 hn-11 kernel: [ 513.803803] [810646a8] ? kthread+0x0/0x81
May 12 18:13:27 hn-11 kernel: [
Bug#626593: linux-image-2.6.32-5-amd64: BUG during disk hot-plugging when setting the elevator via udev
Package: linux-2.6
Version: 2.6.32-31
Severity: normal

Hi,

We are experiencing the following problem on a number of machines running linux-image-2.6.32-5-amd64 version 2.6.32-31. The machines are used for virtual machine hosting and are attached, via a multipath topology, to a number of LUNs exported from an FC-connected SAN. Our regular workflow involves hot-removing and hot-adding disks according to the VMs hosted.

For the LUNs exported by the SAN storage, we have the following udev rule in place:

--8<--
# Set all netapp LUN schedulers to noop
# Skip partitions
KERNEL=="*[0-9]", GOTO="lunsched_end"
# Set scheduler
ACTION=="add", SUBSYSTEM=="block", ATTRS{vendor}=="NETAPP", ATTRS{model}=="LUN", ATTR{queue/scheduler}="noop"
LABEL="lunsched_end"
--8<--

Having upgraded from lenny to squeeze last week, we encountered the following crash during a SCSI bus rescan that added new disks to a system:

[ 1258.343275] [ cut here ]
[ 1258.343280] sd 0:0:0:226: [sdgv] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 1258.343287] kernel BUG at /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/sysfs/file.c:539!
[ 1258.343289] invalid opcode: [#2] SMP
[ 1258.343292] last sysfs file: /sys/devices/pci:00/:00:05.0/:10:00.0/host0/rport-0:0-0/target0:0:0/0:0:0:216/block/sdgn/removable
[ 1258.343295] CPU 4
[ 1258.343296] Modules linked in: kvm_intel kvm nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp xt_pkttype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff ipmi_devintf radeon ttm drm_kms_helper snd_pcm ipmi_si drm ipmi_msghandler i2c_algo_bit i5k_amb i2c_core snd_timer psmouse i5000_edac snd soundcore snd_page_alloc hpwdt hpilo serio_raw edac_core pcspkr rng_core evdev shpchp container pci_hotplug button processor ext3 jbd mbcache dm_mod sd_mod crc_t10dif usbhid hid uhci_hcd qla2xxx scsi_transport_fc tg3 ehci_hcd bnx2 scsi_tgt usbcore nls_base cciss libphy scsi_mod thermal thermal_sys [last unloaded: scsi_wait_scan]
[ 1258.343332] Pid: 12287, comm: async/20 Tainted: G D W 2.6.32-5-amd64 #1 ProLiant BL460c G1
[ 1258.343335] RIP: 0010:[8113ecff] [8113ecff] sysfs_create_file+0x13/0x21
[ 1258.343340] RSP: 0018:8803ffd3fdd8 EFLAGS: 00010246
[ 1258.343342] RAX: RBX: 81485598 RCX: 589a
[ 1258.343344] RDX: 81476c38 RSI: 81485598 RDI:
[ 1258.343347] RBP: 88041afaba90 R08: R09: 813ad975
[ 1258.343349] R10: fff4 R11: 000186a0 R12:
[ 1258.343351] R13: R14: 8804264d4458 R15:
[ 1258.343354] FS: () GS:88000fd0() knlGS:
[ 1258.343356] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
[ 1258.343359] CR2: 7f541f42 CR3: 0003fddee000 CR4: 000426e0
[ 1258.343361] DR0: DR1: DR2:
[ 1258.343363] DR3: DR6: 0ff0 DR7: 0400
[ 1258.343366] Process async/20 (pid: 12287, threadinfo 8803ffd3e000, task 8804273f1530)
[ 1258.343368] Stack:
[ 1258.343369] 811761aa 8804078c6418 8804078c60f0
[ 1258.343372] 0 8118024b 8804264d4400 8804264d4400 8804083ba800
[ 1258.343375] 0 8804083ba928 0001 81184021 8804264d4400
[ 1258.343378] Call Trace:
[ 1258.343381] [811761aa] ? elv_register_queue+0x4f/0x6f
[ 1258.343385] [8118024b] ? blk_register_queue+0x7f/0xcc
[ 1258.343388] [81184021] ? add_disk+0xb8/0x108
[ 1258.343393] [a01506be] ? sd_probe_async+0x119/0x1d8 [sd_mod]
[ 1258.343396] [810698a7] ? async_thread+0x0/0x20d
[ 1258.343399] [810699a6] ? async_thread+0xff/0x20d
[ 1258.343403] [81049fee] ? default_wake_function+0x0/0x9
[ 1258.343406] [810698a7] ? async_thread+0x0/0x20d
[ 1258.343408] [81064721] ? kthread+0x79/0x81
[ 1258.343411] [81011baa] ? child_rip+0xa/0x20
[ 1258.343414] [810646a8] ? kthread+0x0/0x81
[ 1258.343416] [81011ba0] ? child_rip+0x0/0x20
[ 1258.343418] Code: 74 0f 48 89 ef e8 24 07 00 00 eb 05 bb fe ff ff ff 89 d8 5b 5d 41 5c c3 48 85 ff 74 0e 48 8b 7f 30 48 85 ff 74 05 48 85 f6 75 04 0f 0b eb fe ba 02 00 00 00 e9 5d ff ff ff 55 53 48 89 fb 48 c7
[ 1258.343437] RIP [8113ecff] sysfs_create_file+0x13/0x21
[ 1258.343440] RSP 8803ffd3fdd8
[ 1258.343443] ---[ end trace eeb541477f3e4233 ]---

Apart from this BUG, there are also sporadic warnings, like:

[ 324.454044] kobject_add_internal failed for hÝ$iosched with -EEXIST, don't try to register things with the