Bug#626593: linux-image-2.6.32-5-amd64: BUG during disk hot-plugging when setting the elevator via udev

2011-05-15 Thread Ben Hutchings
On Fri, 2011-05-13 at 16:34 +0300, Apollon Oikonomopoulos wrote:
[...]
 Having upgraded from lenny to squeeze last week, we encountered the following
 crash during a SCSI bus rescan that added new disks to a system:
 
 [ 1258.343275] [ cut here ]
 [ 1258.343280] sd 0:0:0:226: [sdgv] Write cache: disabled, read cache: 
 enabled, doesn't support DPO or FUA
 [ 1258.343287] kernel BUG at 
 /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/sysfs/file.c:539!
 [ 1258.343289] invalid opcode:  [#2] SMP
 [ 1258.343292] last sysfs file: 
 /sys/devices/pci:00/:00:05.0/:10:00.0/host0/rport-0:0-0/target0:0:0/0:0:0:216/block/sdgn/removable
 [ 1258.343295] CPU 4
 [ 1258.343296] Modules linked in: kvm_intel kvm nf_conntrack_ipv6 
 ip6table_filter ip6_tables xt_tcpudp xt_pkttype nf_conntrack_ipv4 
 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q 
 garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff 
 ipmi_devintf radeon ttm drm_kms_helper snd_pcm ipmi_si drm ipmi_msghandler 
 i2c_algo_bit i5k_amb i2c_core snd_timer psmouse i5000_edac snd soundcore 
 snd_page_alloc hpwdt hpilo serio_raw edac_core pcspkr rng_core evdev shpchp 
 container pci_hotplug button processor ext3 jbd mbcache dm_mod sd_mod 
 crc_t10dif usbhid hid uhci_hcd qla2xxx scsi_transport_fc tg3 ehci_hcd bnx2 
 scsi_tgt usbcore nls_base cciss libphy scsi_mod thermal thermal_sys [last 
 unloaded: scsi_wait_scan]
 [ 1258.343332] Pid: 12287, comm: async/20 Tainted: G  D W  2.6.32-5-amd64 
 #1 ProLiant BL460c G1
[...]

We really need to see the first BUG message after boot.  The 'D' here
indicates that this is not the first.
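
For anyone reading along, the taint letters in the "Pid:" line can be decoded mechanically. The sketch below is only illustrative: it covers the few letters that appear in this report (the kernel defines more), and the name decode_taint is made up for this example.

```python
import re

# Subset of the kernel's taint letters; the kernel defines more than these.
TAINT_FLAGS = {
    "P": "proprietary module loaded",
    "G": "only GPL-compatible modules loaded",
    "D": "kernel has already died (earlier oops or BUG)",
    "W": "a warning was issued earlier",
}

def decode_taint(line):
    """Return (letter, meaning) pairs for the taint letters in an oops header line."""
    match = re.search(r"Tainted: ([A-Z ]+)", line)
    if match is None:
        # Lines that say "Not tainted" carry no flags.
        return []
    letters = [c for c in match.group(1) if c != " "]
    return [(c, TAINT_FLAGS.get(c, "unknown flag")) for c in letters]

if __name__ == "__main__":
    header = "Pid: 12287, comm: async/20 Tainted: G  D W  2.6.32-5-amd64"
    for letter, meaning in decode_taint(header):
        print(letter, "-", meaning)
```

A 'D' in the output means an earlier oops or BUG has already happened since boot, so the trace being looked at is not the first one.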

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.




Bug#626593: linux-image-2.6.32-5-amd64: BUG during disk hot-plugging when setting the elevator via udev

2011-05-15 Thread Apollon Oikonomopoulos
On 15:07 Sun 15 May , Ben Hutchings wrote:
 On Fri, 2011-05-13 at 16:34 +0300, Apollon Oikonomopoulos wrote:
 [...]
  Having upgraded from lenny to squeeze last week, we encountered the 
  following
  crash during a SCSI bus rescan that added new disks to a system:
  
  [ 1258.343275] [ cut here ]
  [ 1258.343280] sd 0:0:0:226: [sdgv] Write cache: disabled, read cache: 
  enabled, doesn't support DPO or FUA
  [ 1258.343287] kernel BUG at 
  /tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/sysfs/file.c:539!
  [ 1258.343289] invalid opcode:  [#2] SMP
  [ 1258.343292] last sysfs file: 
  /sys/devices/pci:00/:00:05.0/:10:00.0/host0/rport-0:0-0/target0:0:0/0:0:0:216/block/sdgn/removable
  [ 1258.343295] CPU 4
  [ 1258.343296] Modules linked in: kvm_intel kvm nf_conntrack_ipv6 
  ip6table_filter ip6_tables xt_tcpudp xt_pkttype nf_conntrack_ipv4 
  nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 
  8021q garp bridge stp bonding dm_round_robin dm_multipath scsi_dh 
  ipmi_poweroff ipmi_devintf radeon ttm drm_kms_helper snd_pcm ipmi_si drm 
  ipmi_msghandler i2c_algo_bit i5k_amb i2c_core snd_timer psmouse i5000_edac 
  snd soundcore snd_page_alloc hpwdt hpilo serio_raw edac_core pcspkr 
  rng_core evdev shpchp container pci_hotplug button processor ext3 jbd 
  mbcache dm_mod sd_mod crc_t10dif usbhid hid uhci_hcd qla2xxx 
  scsi_transport_fc tg3 ehci_hcd bnx2 scsi_tgt usbcore nls_base cciss libphy 
  scsi_mod thermal thermal_sys [last unloaded: scsi_wait_scan]
  [ 1258.343332] Pid: 12287, comm: async/20 Tainted: G  D W  
  2.6.32-5-amd64 #1 ProLiant BL460c G1
 [...]
 
 We really need to see the first BUG message after boot.  The 'D' here
 indicates that this is not the first.
 
 Ben.

Hi Ben,

You're right, this was not the first occurrence. I was trying to forcibly
reproduce the problem by adding and removing ~1.5k SCSI disks on the system,
which caused a lot of WARNINGs like the following to appear first:

May 12 18:13:27 hn-11 kernel: [  513.803626] [ cut here 
]
May 12 18:13:27 hn-11 kernel: [  513.803636] WARNING: at 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/sysfs/sysfs.h:139
 __sysfs_get+0x20/0x28()
May 12 18:13:27 hn-11 kernel: [  513.803639] Hardware name: ProLiant BL460c G1
May 12 18:13:27 hn-11 kernel: [  513.803641] Modules linked in: kvm_intel kvm 
nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp xt_pkttype 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables 
x_tables 8021q garp bridge stp bonding dm_round_robin dm_multipath scsi_dh 
ipmi_poweroff ipmi_devintf radeon ttm snd_pcm drm_kms_helper snd_timer drm 
i2c_algo_bit snd soundcore snd_page_alloc i2c_core hpwdt i5k_amb ipmi_si 
i5000_edac pcspkr rng_core ipmi_msghandler hpilo psmouse edac_core serio_raw 
evdev shpchp container pci_hotplug button processor ext3 jbd mbcache dm_mod 
sd_mod crc_t10dif usbhid hid uhci_hcd qla2xxx scsi_transport_fc cciss ehci_hcd 
tg3 libphy usbcore scsi_tgt bnx2 nls_base scsi_mod thermal thermal_sys [last 
unloaded: scsi_wait_scan]
May 12 18:13:27 hn-11 kernel: [  513.803725] Pid: 24914, comm: async/17 Not 
tainted 2.6.32-5-amd64 #1
May 12 18:13:27 hn-11 kernel: [  513.803727] Call Trace:
May 12 18:13:27 hn-11 kernel: [  513.803733]  [8113efad] ? 
__sysfs_get+0x20/0x28
May 12 18:13:27 hn-11 kernel: [  513.803736]  [8113efad] ? 
__sysfs_get+0x20/0x28
May 12 18:13:27 hn-11 kernel: [  513.803740]  [8104db34] ? 
warn_slowpath_common+0x77/0xa3
May 12 18:13:27 hn-11 kernel: [  513.803744]  [8113efad] ? 
__sysfs_get+0x20/0x28
May 12 18:13:27 hn-11 kernel: [  513.803747]  [8113f0f1] ? 
__sysfs_add_one+0x2b/0x84
May 12 18:13:27 hn-11 kernel: [  513.803750]  [8113f1a0] ? 
sysfs_add_one+0x19/0xe4
May 12 18:13:27 hn-11 kernel: [  513.803754]  [8113ec39] ? 
sysfs_add_file_mode+0x4e/0x7f
May 12 18:13:27 hn-11 kernel: [  513.803759]  [811761aa] ? 
elv_register_queue+0x4f/0x6f
May 12 18:13:27 hn-11 kernel: [  513.803764]  [8118024b] ? 
blk_register_queue+0x7f/0xcc
May 12 18:13:27 hn-11 kernel: [  513.803768]  [81184021] ? 
add_disk+0xb8/0x108
May 12 18:13:27 hn-11 kernel: [  513.803776]  [a01566be] ? 
sd_probe_async+0x119/0x1d8 [sd_mod]
May 12 18:13:27 hn-11 kernel: [  513.803781]  [810698a7] ? 
async_thread+0x0/0x20d
May 12 18:13:27 hn-11 kernel: [  513.803784]  [810699a6] ? 
async_thread+0xff/0x20d
May 12 18:13:27 hn-11 kernel: [  513.803789]  [81049fee] ? 
default_wake_function+0x0/0x9
May 12 18:13:27 hn-11 kernel: [  513.803792]  [810698a7] ? 
async_thread+0x0/0x20d
May 12 18:13:27 hn-11 kernel: [  513.803795]  [81064721] ? 
kthread+0x79/0x81
May 12 18:13:27 hn-11 kernel: [  513.803800]  [81011baa] ? 
child_rip+0xa/0x20
May 12 18:13:27 hn-11 kernel: [  513.803803]  [810646a8] ? 
kthread+0x0/0x81
May 12 18:13:27 hn-11 kernel: [  

Bug#626593: linux-image-2.6.32-5-amd64: BUG during disk hot-plugging when setting the elevator via udev

2011-05-13 Thread Apollon Oikonomopoulos
Package: linux-2.6
Version: 2.6.32-31
Severity: normal

Hi,

We are experiencing the following problem on a number of machines using
2.6.32-5-amd64_2.6.32-31. The machines are used for virtual machine hosting and
have a number of LUNs exported from an FC-connected SAN connected to them via a
multipath topology. Our regular workflow involves hot-removing and hot-adding
disks according to the VMs hosted.
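
For readers unfamiliar with the mechanism, SCSI hot-add and hot-remove on Linux are usually driven through sysfs. The sketch below is an assumption about the general approach, not the exact procedure used on these machines: the function names and the sysfs_root parameter are hypothetical, while the scan and delete attributes are the standard kernel interfaces.

```python
from pathlib import Path

def rescan_host(host, sysfs_root="/sys"):
    """Ask a SCSI host (e.g. 'host0') to scan all channels, targets and LUNs."""
    scan = Path(sysfs_root) / "class" / "scsi_host" / host / "scan"
    # The three dashes are wildcards for channel, target and LUN.
    scan.write_text("- - -\n")

def delete_disk(dev, sysfs_root="/sys"):
    """Hot-remove a SCSI disk (e.g. 'sdgv') from the running system."""
    delete = Path(sysfs_root) / "block" / dev / "device" / "delete"
    delete.write_text("1\n")
```

Both writes need root when pointed at the real /sys; the sysfs_root parameter exists only so the helpers can be tried against a scratch directory.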

For the LUNs exported by the SAN storage, we have the following udev rule in 
place:

--8<--
# Set all netapp LUN schedulers to noop

# Skip partitions
KERNEL=="*[0-9]", GOTO="lunsched_end"

# Set scheduler
ACTION=="add", SUBSYSTEM=="block", ATTRS{vendor}=="NETAPP", ATTRS{model}=="LUN", ATTR{queue/scheduler}="noop"

LABEL="lunsched_end"
--8<--
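
In effect, the rule writes noop to /sys/block/<dev>/queue/scheduler for every matching disk as it is added. A minimal sketch for reading the setting back, e.g. to check that the rule fired on a given LUN, could look like the following; the helper names and the sysfs_root parameter are illustrative and not part of udev or the kernel.

```python
from pathlib import Path

def active_scheduler(contents):
    """Return the scheduler marked with [brackets] in a queue/scheduler string."""
    for name in contents.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None

def read_scheduler(dev, sysfs_root="/sys"):
    """Read the active I/O scheduler for a block device such as 'sdgv'."""
    path = Path(sysfs_root) / "block" / dev / "queue" / "scheduler"
    return active_scheduler(path.read_text())
```

On a 2.6.32 kernel the file lists all available elevators on one line, with the active one in brackets, so a LUN handled by the rule should report noop.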

Having upgraded from lenny to squeeze last week, we encountered the following
crash during a SCSI bus rescan that added new disks to a system:

[ 1258.343275] [ cut here ]
[ 1258.343280] sd 0:0:0:226: [sdgv] Write cache: disabled, read cache: enabled, 
doesn't support DPO or FUA
[ 1258.343287] kernel BUG at 
/tmp/buildd/linux-2.6-2.6.32/debian/build/source_amd64_none/fs/sysfs/file.c:539!
[ 1258.343289] invalid opcode:  [#2] SMP
[ 1258.343292] last sysfs file: 
/sys/devices/pci:00/:00:05.0/:10:00.0/host0/rport-0:0-0/target0:0:0/0:0:0:216/block/sdgn/removable
[ 1258.343295] CPU 4
[ 1258.343296] Modules linked in: kvm_intel kvm nf_conntrack_ipv6 
ip6table_filter ip6_tables xt_tcpudp xt_pkttype nf_conntrack_ipv4 
nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables 8021q 
garp bridge stp bonding dm_round_robin dm_multipath scsi_dh ipmi_poweroff 
ipmi_devintf radeon ttm drm_kms_helper snd_pcm ipmi_si drm ipmi_msghandler 
i2c_algo_bit i5k_amb i2c_core snd_timer psmouse i5000_edac snd soundcore 
snd_page_alloc hpwdt hpilo serio_raw edac_core pcspkr rng_core evdev shpchp 
container pci_hotplug button processor ext3 jbd mbcache dm_mod sd_mod 
crc_t10dif usbhid hid uhci_hcd qla2xxx scsi_transport_fc tg3 ehci_hcd bnx2 
scsi_tgt usbcore nls_base cciss libphy scsi_mod thermal thermal_sys [last 
unloaded: scsi_wait_scan]
[ 1258.343332] Pid: 12287, comm: async/20 Tainted: G  D W  2.6.32-5-amd64 
#1 ProLiant BL460c G1
[ 1258.343335] RIP: 0010:[8113ecff]  [8113ecff] 
sysfs_create_file+0x13/0x21
[ 1258.343340] RSP: 0018:8803ffd3fdd8  EFLAGS: 00010246
[ 1258.343342] RAX:  RBX: 81485598 RCX: 589a
[ 1258.343344] RDX: 81476c38 RSI: 81485598 RDI: 
[ 1258.343347] RBP: 88041afaba90 R08:  R09: 813ad975
[ 1258.343349] R10: fff4 R11: 000186a0 R12: 
[ 1258.343351] R13:  R14: 8804264d4458 R15: 
[ 1258.343354] FS:  () GS:88000fd0() 
knlGS:
[ 1258.343356] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[ 1258.343359] CR2: 7f541f42 CR3: 0003fddee000 CR4: 000426e0
[ 1258.343361] DR0:  DR1:  DR2: 
[ 1258.343363] DR3:  DR6: 0ff0 DR7: 0400
[ 1258.343366] Process async/20 (pid: 12287, threadinfo 8803ffd3e000, task 
8804273f1530)
[ 1258.343368] Stack:
[ 1258.343369]  811761aa  8804078c6418 
8804078c60f0
[ 1258.343372] 0 8118024b 8804264d4400 8804264d4400 
8804083ba800
[ 1258.343375] 0 8804083ba928 0001 81184021 
8804264d4400
[ 1258.343378] Call Trace:
[ 1258.343381]  [811761aa] ? elv_register_queue+0x4f/0x6f
[ 1258.343385]  [8118024b] ? blk_register_queue+0x7f/0xcc
[ 1258.343388]  [81184021] ? add_disk+0xb8/0x108
[ 1258.343393]  [a01506be] ? sd_probe_async+0x119/0x1d8 [sd_mod]
[ 1258.343396]  [810698a7] ? async_thread+0x0/0x20d
[ 1258.343399]  [810699a6] ? async_thread+0xff/0x20d
[ 1258.343403]  [81049fee] ? default_wake_function+0x0/0x9
[ 1258.343406]  [810698a7] ? async_thread+0x0/0x20d
[ 1258.343408]  [81064721] ? kthread+0x79/0x81
[ 1258.343411]  [81011baa] ? child_rip+0xa/0x20
[ 1258.343414]  [810646a8] ? kthread+0x0/0x81
[ 1258.343416]  [81011ba0] ? child_rip+0x0/0x20   
  
[ 1258.343418] Code: 74 0f 48 89 ef e8 24 07 00 00 eb 05 bb fe ff ff ff 89 d8 
5b 5d 41 5c c3 48 85 ff 74 0e 48 8b 7f 30 48 85 ff 74 05 48 85 f6 75 04 0f 0b 
eb fe ba 02 00 00 00 e9 5d ff ff ff 55 53 48 89 fb 48 c7
  
[ 1258.343437] RIP  [8113ecff] sysfs_create_file+0x13/0x21
[ 1258.343440]  RSP 8803ffd3fdd8
[ 1258.343443] ---[ end trace eeb541477f3e4233 ]---

Apart from this BUG, there are also sporadic warnings, like:
[  324.454044] kobject_add_internal failed for hÝ$iosched with -EEXIST, 
don't try to register things with the