Re: INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
On Fri, Aug 10, 2012 at 2:58 AM, Michael Christie micha...@cs.wisc.edu wrote:
> On Aug 8, 2012, at 4:42 AM, Fubo Chen fubo.c...@gmail.com wrote:
>> Anyone seen this before? It also occurs with 3.4.1.
>>
>> ======================================================
>> [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
>> 3.6.0-rc1-debug+ #1 Not tainted
>> ------------------------------------------------------
>> swapper/1/0 [HC0[0]:SC1[1]:HE0:SE0] is trying to acquire:
>>  (&(&session->lock)->rlock){+.-...}, at: [a025dc08] iscsi_eh_cmd_timed_out+0x58/0x2e0 [libiscsi]
>>
>> and this task is already holding:
>>  (&(&q->__queue_lock)->rlock){-.-...}, at: [811f6965] blk_rq_timed_out_timer+0x25/0x140
>> which would create a new lock dependency:
>>  (&(&q->__queue_lock)->rlock){-.-...} -> (&(&session->lock)->rlock){+.-...}
>>
>> but this new dependency connects a HARDIRQ-irq-safe lock:
>>  (&(&q->__queue_lock)->rlock){-.-...}
>> ... which became HARDIRQ-irq-safe at:
>>   [8109b5ca] __lock_acquire+0x7ea/0x1ba0
>>   [8109cfc2] lock_acquire+0x92/0x140
>>   [814b41c5] _raw_spin_lock_irqsave+0x65/0xb0
>>   [812e2974] blk_done+0x34/0x110
>>   [81295889] vring_interrupt+0x49/0xc0
>>   [810c68f5] handle_irq_event_percpu+0x75/0x270
>>   [810c6b38] handle_irq_event+0x48/0x70
>>   [810c9477] handle_edge_irq+0x77/0x110
>>   [81004042] handle_irq+0x22/0x40
>>   [814bda2a] do_IRQ+0x5a/0xe0
>>   [814b436f] ret_from_intr+0x0/0x1a
>>   [8100a7da] default_idle+0x4a/0x170
>>   [8100b609] cpu_idle+0xe9/0x130
>>   [814a4c6e] start_secondary+0x26a/0x26c
>
> Does this error only occur when using some sort of virt setup? I do not
> think we will hit this with iscsi, because we never grab the queue lock
> for an iscsi device from hard-IRQ context. It is always done from
> softirq or thread context. The snippet above seems to be from the
> virtio_blk.c code.

Yes. This happened inside a KVM virtual machine.

Fubo.
INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
Anyone seen this before? It also occurs with 3.4.1.

======================================================
[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
3.6.0-rc1-debug+ #1 Not tainted
------------------------------------------------------
swapper/1/0 [HC0[0]:SC1[1]:HE0:SE0] is trying to acquire:
 (&(&session->lock)->rlock){+.-...}, at: [a025dc08] iscsi_eh_cmd_timed_out+0x58/0x2e0 [libiscsi]

and this task is already holding:
 (&(&q->__queue_lock)->rlock){-.-...}, at: [811f6965] blk_rq_timed_out_timer+0x25/0x140
which would create a new lock dependency:
 (&(&q->__queue_lock)->rlock){-.-...} -> (&(&session->lock)->rlock){+.-...}

but this new dependency connects a HARDIRQ-irq-safe lock:
 (&(&q->__queue_lock)->rlock){-.-...}
... which became HARDIRQ-irq-safe at:
  [8109b5ca] __lock_acquire+0x7ea/0x1ba0
  [8109cfc2] lock_acquire+0x92/0x140
  [814b41c5] _raw_spin_lock_irqsave+0x65/0xb0
  [812e2974] blk_done+0x34/0x110
  [81295889] vring_interrupt+0x49/0xc0
  [810c68f5] handle_irq_event_percpu+0x75/0x270
  [810c6b38] handle_irq_event+0x48/0x70
  [810c9477] handle_edge_irq+0x77/0x110
  [81004042] handle_irq+0x22/0x40
  [814bda2a] do_IRQ+0x5a/0xe0
  [814b436f] ret_from_intr+0x0/0x1a
  [8100a7da] default_idle+0x4a/0x170
  [8100b609] cpu_idle+0xe9/0x130
  [814a4c6e] start_secondary+0x26a/0x26c

to a HARDIRQ-irq-unsafe lock:
 (&(&session->lock)->rlock){+.-...}
... which became HARDIRQ-irq-unsafe at:
...
  [8109b3e5] __lock_acquire+0x605/0x1ba0
  [8109cfc2] lock_acquire+0x92/0x140
  [814b361b] _raw_spin_lock_bh+0x4b/0x80
  [a025b404] iscsi_conn_setup+0x154/0x210 [libiscsi]
  [a02322b4] iscsi_tcp_conn_setup+0x14/0x40 [libiscsi_tcp]
  [a026a0e9] iscsi_sw_tcp_conn_create+0x29/0x100 [iscsi_tcp]
  [a024de58] iscsi_if_rx+0xa48/0xf60 [scsi_transport_iscsi]
  [813e68ed] netlink_unicast+0x1ad/0x230
  [813e6c0b] netlink_sendmsg+0x29b/0x2f0
  [813ad04f] sock_sendmsg+0x9f/0xe0
  [813ae7ff] __sys_sendmsg+0x2df/0x2f0
  [813af979] sys_sendmsg+0x49/0x90
  [814bc6e9] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&session->lock)->rlock);
                               local_irq_disable();
                               lock(&(&q->__queue_lock)->rlock);
                               lock(&(&session->lock)->rlock);
  <Interrupt>
    lock(&(&q->__queue_lock)->rlock);

 *** DEADLOCK ***

2 locks held by swapper/1/0:
 #0:  (&q->timeout){+.-...}, at: [8104dc7f] run_timer_softirq+0x12f/0x4c0
 #1:  (&(&q->__queue_lock)->rlock){-.-...}, at: [811f6965] blk_rq_timed_out_timer+0x25/0x140

the dependencies between HARDIRQ-irq-safe lock and the holding lock:
-> (&(&q->__queue_lock)->rlock){-.-...} ops: 539229 {
   IN-HARDIRQ-W at:
     [8109b5ca] __lock_acquire+0x7ea/0x1ba0
     [8109cfc2] lock_acquire+0x92/0x140
     [814b41c5] _raw_spin_lock_irqsave+0x65/0xb0
     [812e2974] blk_done+0x34/0x110
     [81295889] vring_interrupt+0x49/0xc0
     [810c68f5] handle_irq_event_percpu+0x75/0x270
     [810c6b38] handle_irq_event+0x48/0x70
     [810c9477] handle_edge_irq+0x77/0x110
     [81004042] handle_irq+0x22/0x40
     [814bda2a] do_IRQ+0x5a/0xe0
     [814b436f] ret_from_intr+0x0/0x1a
     [8100a7da] default_idle+0x4a/0x170
     [8100b609] cpu_idle+0xe9/0x130
     [814a4c6e] start_secondary+0x26a/0x26c
   IN-SOFTIRQ-W at:
     [8109b3b8] __lock_acquire+0x5d8/0x1ba0
     [8109cfc2] lock_acquire+0x92/0x140
     [814b41c5] _raw_spin_lock_irqsave+0x65/0xb0
     [8120695e] cfq_idle_slice_timer+0x2e/0x110
     [8104dcf8] run_timer_softirq+0x1a8/0x4c0
     [81045f38] __do_softirq+0xd8/0x290
     [814bd9bc] call_softirq+0x1c/0x26
     [81004105] do_softirq+0xa5/0xe0
     [8104642e] irq_exit+0xae/0xe0
     [814bdb1e] smp_apic_timer_interrupt+0x6e/0x99
     [814bd22f] apic_timer_interrupt+0x6f/0x80
     [81296d34] vp_try_to_find_vqs+0x6e4/0x7b0
     [81296fb2] vp_find_vqs+0x42/0xc0
     [81325c73] init_vqs+0x83/0x110
     [81326112] virtnet_probe+0x362/0x510
     [812951a3] virtio_dev_probe+0xe3/0x160
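To make the report above concrete, here is a minimal sketch of the three
code paths lockdep is correlating. It is not taken from either driver:
queue_lock and session_lock merely stand in for q->__queue_lock and
session->lock, and the demo_* functions are hypothetical.

    #include <linux/spinlock.h>
    #include <linux/interrupt.h>

    static DEFINE_SPINLOCK(queue_lock);   /* stands in for q->__queue_lock */
    static DEFINE_SPINLOCK(session_lock); /* stands in for session->lock   */

    /* Hard-IRQ path (cf. vring_interrupt() -> blk_done() above):
     * taking queue_lock here is what marks it HARDIRQ-safe. */
    static irqreturn_t demo_irq(int irq, void *dev)
    {
            unsigned long flags;

            spin_lock_irqsave(&queue_lock, flags);
            /* ... complete requests ... */
            spin_unlock_irqrestore(&queue_lock, flags);
            return IRQ_HANDLED;
    }

    /* Process context (cf. iscsi_conn_setup() above): _bh disables only
     * softirqs, so session_lock is held with hard IRQs enabled and is
     * therefore HARDIRQ-unsafe. */
    static void demo_setup(void)
    {
            spin_lock_bh(&session_lock);
            /* ... a hard IRQ can still arrive here ... */
            spin_unlock_bh(&session_lock);
    }

    /* Timer softirq (cf. blk_rq_timed_out_timer() ->
     * iscsi_eh_cmd_timed_out() above): acquiring session_lock while
     * holding queue_lock records the queue_lock -> session_lock
     * dependency that closes the cycle. */
    static void demo_timeout(void)
    {
            spin_lock(&queue_lock);
            spin_lock(&session_lock);
            /* ... time out the command ... */
            spin_unlock(&session_lock);
            spin_unlock(&queue_lock);
    }

Given those three paths, the deadlock in the scenario table follows: one
CPU in demo_setup() holds session_lock with IRQs enabled while another
CPU in demo_timeout() holds queue_lock and spins on session_lock; if the
first CPU now services the interrupt and demo_irq() spins on queue_lock,
neither CPU can make progress.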
Re: [PATCH 1/4] BNX2I: Added the use of kthreads to handle SCSI cmd completion
On Tue, Jun 21, 2011 at 6:49 PM, Eddie Wai eddie@broadcom.com wrote:
> +/**
> + * bnx2i_percpu_io_thread - thread per cpu for ios
> + *
> + * @arg: ptr to bnx2i_percpu_info structure
> + */
> +int bnx2i_percpu_io_thread(void *arg)
> +{
> +	struct bnx2i_percpu_s *p = arg;
> +	struct bnx2i_work *work, *tmp;
> +	LIST_HEAD(work_list);
> +
> +	set_user_nice(current, -20);
> +
> +	set_current_state(TASK_INTERRUPTIBLE);
> +	while (!kthread_should_stop()) {
> +		schedule();
> +		spin_lock_bh(&p->p_work_lock);
> +		while (!list_empty(&p->work_list)) {
> +			list_splice_init(&p->work_list, &work_list);
> +			spin_unlock_bh(&p->p_work_lock);
> +
> +			list_for_each_entry_safe(work, tmp, &work_list, list) {
> +				list_del_init(&work->list);
> +				/* work allocated in the bh, freed here */
> +				bnx2i_process_scsi_cmd_resp(work->session,
> +							    work->bnx2i_conn,
> +							    work->cqe);
> +				atomic_dec(&work->bnx2i_conn->work_cnt);
> +				kfree(work);
> +			}
> +			spin_lock_bh(&p->p_work_lock);
> +		}
> +		set_current_state(TASK_INTERRUPTIBLE);
> +		spin_unlock_bh(&p->p_work_lock);
> +	}
> +	__set_current_state(TASK_RUNNING);
> +
> +	return 0;
> +}

This loop looks a little strange to me. If the schedule() call were moved
from the top of the outermost while loop to the bottom, the first
set_current_state(TASK_INTERRUPTIBLE) statement could be eliminated. That
would also fix the (theoretical?) race that occurs if wake_up_process()
is invoked after kthread_create() but before the first
set_current_state(TASK_INTERRUPTIBLE) statement has executed.

Fubo.
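To illustrate the suggestion, here is an untested sketch of the
restructured loop, reusing the field names from the patch above. With
schedule() at the bottom of the outer loop, the thread drains any work
queued before it first runs, and the initial set_current_state() call
(and with it the start-up race) goes away:

    int bnx2i_percpu_io_thread(void *arg)
    {
            struct bnx2i_percpu_s *p = arg;
            struct bnx2i_work *work, *tmp;
            LIST_HEAD(work_list);

            set_user_nice(current, -20);

            while (!kthread_should_stop()) {
                    spin_lock_bh(&p->p_work_lock);
                    while (!list_empty(&p->work_list)) {
                            list_splice_init(&p->work_list, &work_list);
                            spin_unlock_bh(&p->p_work_lock);

                            list_for_each_entry_safe(work, tmp,
                                                     &work_list, list) {
                                    list_del_init(&work->list);
                                    /* work allocated in the bh, freed here */
                                    bnx2i_process_scsi_cmd_resp(work->session,
                                                                work->bnx2i_conn,
                                                                work->cqe);
                                    atomic_dec(&work->bnx2i_conn->work_cnt);
                                    kfree(work);
                            }
                            spin_lock_bh(&p->p_work_lock);
                    }
                    /* Set the task state before dropping the lock so a
                     * wake_up_process() racing with the empty-list check
                     * above is not lost. */
                    set_current_state(TASK_INTERRUPTIBLE);
                    spin_unlock_bh(&p->p_work_lock);
                    schedule();
            }
            __set_current_state(TASK_RUNNING);

            return 0;
    }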
Re: MD-RAID1 and iSCSI with multipathd: some experience
On Oct 14 2010, 1:45 pm, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
> I was investigating the status of building a RAID1 over iSCSI-connected
> devices managed by multipathd (the SLES10 SP3 Release Notes said it won't
> work). Here are some of my findings:
>
> 1) The multipath devices cannot be opened exclusively by mdadm:
>
> # mdadm --verbose --create /dev/md0 --raid-devices=2 --level=raid1 --bitmap=internal /dev/disk/by-id/scsi-3600508b4001085dd00011226 /dev/disk/by-id/scsi-3600508b4001085dd00011229
> mdadm: Cannot open /dev/disk/by-id/scsi-3600508b4001085dd00011226: Device or resource busy
> mdadm: Cannot open /dev/disk/by-id/scsi-3600508b4001085dd00011229: Device or resource busy
> mdadm: create aborted
>
> open("/dev/disk/by-id/scsi-3600508b4001085dd00011226", O_RDONLY|O_EXCL) = -1 EBUSY (Device or resource busy)
>
> 2) The device-mapper files seem to be no SCSI devices:
>
> # mdadm --verbose --create /dev/md0 --raid-devices=2 --level=raid1 --bitmap=internal /dev/dm-18 /dev/dm-19
> mdadm: /dev/dm-18 is too small: 0K
> mdadm: create aborted
>
> rkdvmso1:~ # sdparm -a /dev/dm-18
> unable to access /dev/dm-18, ATA disk?
>
> 3) The iSCSI devices are SCSI devices, but are busy:
>
> # sdparm -a /dev/sdax
>     /dev/sdax: HP        HSV200    5000
> Read write error recovery mode page:
>   AWRE        1  [cha: n, def:  1]
>   ARRE        1  [cha: n, def:  1]
>   TB          1  [cha: n, def:  1]
>   RC          0  [cha: n, def:  0]
> [...]
>
> # mdadm --verbose --create /dev/md0 --raid-devices=2 --level=raid1 --bitmap=internal /dev/sdax /dev/sdbo
> mdadm: Cannot open /dev/sdax: Device or resource busy
> mdadm: Cannot open /dev/sdbo: Device or resource busy
> mdadm: create aborted
>
> I'm not a specialist on mdadm, so if I did something wrong, please tell me.

Hi,

I have been looking at a related but not identical question: replicating
a local disk to another server via iSCSI and md mirroring (RAID1, no
multipath). While setting that up I noticed that open-iscsi times out
SCSI commands if the network is unavailable for long enough. Why does the
open-iscsi initiator make SCSI commands fail instead of reporting a disk
removal?

$ sg_inq /dev/disk/by-path/ip-192.168.3.114\:3260-iscsi-...:tgt-lun-0 | grep RMB
  PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]

Fubo.
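For reference, the RMB flag that sg_inq prints is bit 7 of byte 1 of the
standard INQUIRY data (SPC-3): 0 means the target reports the medium as
non-removable. A minimal user-space sketch (hypothetical, not from this
thread) that reads the bit directly via the SG_IO ioctl:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <scsi/sg.h>

    int main(int argc, char **argv)
    {
            /* 6-byte INQUIRY CDB: opcode 0x12, allocation length 96 */
            unsigned char cdb[6] = { 0x12, 0, 0, 0, 96, 0 };
            unsigned char buf[96], sense[32];
            struct sg_io_hdr io;
            int fd;

            if (argc != 2) {
                    fprintf(stderr, "usage: %s <device>\n", argv[0]);
                    return 1;
            }
            fd = open(argv[1], O_RDONLY | O_NONBLOCK);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            memset(&io, 0, sizeof(io));
            io.interface_id = 'S';
            io.cmd_len = sizeof(cdb);
            io.cmdp = cdb;
            io.dxfer_direction = SG_DXFER_FROM_DEV;
            io.dxfer_len = sizeof(buf);
            io.dxferp = buf;
            io.mx_sb_len = sizeof(sense);
            io.sbp = sense;
            io.timeout = 5000; /* milliseconds */

            if (ioctl(fd, SG_IO, &io) < 0) {
                    perror("SG_IO");
                    return 1;
            }

            /* RMB is bit 7 of byte 1 of the standard INQUIRY data */
            printf("RMB=%d\n", (buf[1] & 0x80) ? 1 : 0);
            return 0;
    }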
Re: MD-RAID1 and iSCSI with multipathd: some experience
On October 14, Ulrich Windl wrote:
> I was investigating the status of building a RAID1 over iSCSI-connected
> devices managed by multipathd (the SLES10 SP3 Release Notes said it won't
> work). Here are some of my findings:
> [... same three findings as quoted in the previous message ...]

Hi,

I have been looking at a related but not identical problem. I'm trying to
use md to replicate a local disk to a remote server via iSCSI and
mirroring (RAID1). I noticed that SCSI commands fail if a network outage
lasts longer than the iSCSI command timeout. I also noticed that the
block device created by open-iscsi is marked as non-removable (RMB=0).
Why does open-iscsi behave this way, and why does it not report a disk
removal event when the network connection fails?

# mdadm --query --detail /dev/md4 | tail -n 3
    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       64        1      active sync   /dev/sde

# sg_inq /dev/disk/by-path/ip-192.168.3.114\:3260-iscsi-iqn\:tgt-lun-0 | grep RMB
  PQual=0  Device_type=0  RMB=0  version=0x05  [SPC-3]

Fubo.
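A plausible piece of the answer (my reading, not stated in this thread):
the open-iscsi initiator does not hot-remove the device on a transport
failure; it queues outstanding commands until the session replacement
timeout expires and only then fails them back to the upper layer (md in
this case). That window is configurable in iscsid.conf:

    # /etc/iscsi/iscsid.conf -- illustrative value
    # Seconds to wait for a lost session to be re-established before
    # failing queued SCSI commands back to upper layers (md, multipath):
    node.session.timeo.replacement_timeout = 120

Raising the value postpones the failures md sees during a network outage,
at the cost of I/O stalling for that long.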