Some questions about scsi_eh_wakeup

zhengbin (A) Sat, 23 Mar 2019 04:30:36 -0700

Hi,

I am testing the kernel with fio, using the following steps:
1. The SAS controller has a mix of SAS and SATA disks attached.
2. Run fio against all disks.
3. At the same time, enable/disable/link_reset/hard_reset the PHYs.


The system then hangs in ata_port_wait_eh.
Call trace:
 __switch_to+0xb4/0x1b8
 __schedule+0x1e8/0x718
 schedule+0x38/0x90
 ata_port_wait_eh+0x70/0xf8
 sas_ata_wait_eh+0x24/0x30 [libsas]
 transport_sas_phy_reset.isra.3+0x128/0x160 [libsas]
 phy_reset_work+0x20/0x30 [libsas]
 process_one_work+0x1e4/0x460
 worker_thread+0x40/0x450
 kthread+0x12c/0x130
 ret_from_fork+0x10/0x18

I think the reason is as follows (neither function wakes up the SCSI error
handler):
Thread A:
scsi_schedule_eh
        spin_lock_irqsave(shost->host_lock, flags);
        if (scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
            scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0) {
            shost->host_eh_scheduled++;
            scsi_eh_wakeup(shost);  ---> reads host_busy and host_failed
        }
        spin_unlock_irqrestore(shost->host_lock, flags);

Thread B:
scsi_dec_host_busy
        rcu_read_lock();
        atomic_dec(&shost->host_busy);  ---> host_busy and shost_state are
                                             accessed outside the spinlock;
                                             maybe they should be under it?
        if (unlikely(scsi_host_in_recovery(shost))) {
            spin_lock_irqsave(shost->host_lock, flags);
            if (shost->host_failed || shost->host_eh_scheduled)
                scsi_eh_wakeup(shost);
            spin_unlock_irqrestore(shost->host_lock, flags);
        }
        rcu_read_unlock();
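
The interleaving I suspect can be sketched with a small userspace model.
This is not kernel code: the model_* names are hypothetical, the struct only
borrows a few field names from struct Scsi_Host, and the wakeup condition
(host_busy == host_failed) is my simplified reading of scsi_eh_wakeup(). It
also assumes that thread B's check of the host state can effectively happen
before its host_busy decrement is visible to thread A, because nothing
orders the two.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the host state; only two states are modelled. */
enum model_state { MODEL_RUNNING, MODEL_RECOVERY };

struct model_host {
    enum model_state state;
    int host_busy;
    int host_failed;
    int host_eh_scheduled;
    int eh_wakeups;            /* how often the EH thread would be woken */
};

/* Assumed wakeup condition: wake only when every busy command has failed. */
static void model_eh_wakeup(struct model_host *h)
{
    if (h->host_busy == h->host_failed)
        h->eh_wakeups++;
}

/* Thread A: models scsi_schedule_eh(). */
static void model_schedule_eh(struct model_host *h)
{
    h->state = MODEL_RECOVERY;
    h->host_eh_scheduled++;
    model_eh_wakeup(h);
}

/* Thread B, step 1: sample the host state (the unordered read). */
static bool model_sample_in_recovery(const struct model_host *h)
{
    return h->state == MODEL_RECOVERY;
}

/* Thread B, step 2: the decrement becomes visible, then act on the sample. */
static void model_dec_host_busy(struct model_host *h, bool saw_recovery)
{
    assert(h->host_busy > 0);
    h->host_busy--;
    if (saw_recovery && (h->host_failed || h->host_eh_scheduled))
        model_eh_wakeup(h);
}

/* Suspected bad interleaving: returns the number of EH wakeups. */
int model_lost_wakeup(void)
{
    struct model_host h = { .state = MODEL_RUNNING, .host_busy = 1 };
    bool saw = model_sample_in_recovery(&h); /* B sees RUNNING */

    model_schedule_eh(&h);        /* A: host_busy is still 1, no wakeup */
    model_dec_host_busy(&h, saw); /* B: not in recovery, no wakeup */
    return h.eh_wakeups;
}

/* Fully ordered case: B observes the state only after A has finished. */
int model_ordered(void)
{
    struct model_host h = { .state = MODEL_RUNNING, .host_busy = 1 };

    model_schedule_eh(&h);        /* A: host_busy is still 1, no wakeup */
    model_dec_host_busy(&h, model_sample_in_recovery(&h)); /* B wakes EH */
    return h.eh_wakeups;
}
```

In this model, model_lost_wakeup() ends with no wakeup at all, while
model_ordered() produces exactly one. That is why I wonder whether host_busy
and shost_state need to be accessed under the same lock.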


PS: commit 3bd6f43f5 (use rcu_read_lock) fixes the hang that occurs if
scsi_eh_scmd_add() is called concurrently with scsi_host_queue_ready()
while shost->host_blocked > 0.

Thanks,
zhengbin
