[Bug 1971193] Re: Server Crash while running IO and switch port bounce test with 2K login session

2022-05-09 Thread Laurie Barry
The driver team has highlighted that this patch is required to address this
issue:

author     James Smart  2022-04-12 15:19:44 -0700
committer  Martin K. Petersen  2022-04-18 22:48:43 -0400
commit     e294647b1aed4247fe52851f3a3b2b19ae906228
tree       fd7e11a3c6f680d5aabd468d523d08ffcd66b59f /drivers/scsi/lpfc
parent     b83a8c21f3fe874e12eb2b6e6c5cfb220d35c446

scsi: lpfc: Move cfg_log_verbose check before calling lpfc_dmp_dbg()

In an attempt to log message 0126 with LOG_TRACE_EVENT, the following hard
lockup call trace hangs the system.

Call Trace:
 _raw_spin_lock_irqsave+0x32/0x40
 lpfc_dmp_dbg.part.32+0x28/0x220 [lpfc]
 lpfc_cmpl_els_fdisc+0x145/0x460 [lpfc]
 lpfc_sli_cancel_jobs+0x92/0xd0 [lpfc]
 lpfc_els_flush_cmd+0x43c/0x670 [lpfc]
 lpfc_els_flush_all_cmd+0x37/0x60 [lpfc]
 lpfc_sli4_async_event_proc+0x956/0x1720 [lpfc]
 lpfc_do_work+0x1485/0x1d70 [lpfc]
 kthread+0x112/0x130
 ret_from_fork+0x1f/0x40
Kernel panic - not syncing: Hard LOCKUP

The same CPU tries to claim the phba->port_list_lock twice.
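
To make the failure mode concrete, here is a minimal userspace sketch of the
same pattern. It is not driver code. It assumes POSIX threads, and
port_list_lock, port_list_lock_init(), dump_debug_log() and
log_with_lock_held() are hypothetical stand-ins for phba->port_list_lock, its
setup, the pre-patch lpfc_dmp_dbg(), and a caller that logs while it already
holds the lock. An error-checking mutex reports the second lock attempt by the
owner as EDEADLK; a kernel spinlock has no such check and simply spins, which
matches the hard lockup in the trace.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for phba->port_list_lock (not the driver's type). */
static pthread_mutex_t port_list_lock;

/* Create the lock as an error-checking mutex, so that a second lock attempt
 * by the owning thread returns EDEADLK instead of blocking forever.
 */
static void port_list_lock_init(void)
{
	pthread_mutexattr_t attr;
	int rc;

	rc = pthread_mutexattr_init(&attr);
	assert(rc == 0);
	rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(rc == 0);
	rc = pthread_mutex_init(&port_list_lock, &attr);
	assert(rc == 0);
	pthread_mutexattr_destroy(&attr);
}

/* Hypothetical model of the pre-patch lpfc_dmp_dbg(): it takes the list
 * lock itself. Returns 0 on success or the pthread error code.
 */
static int dump_debug_log(void)
{
	int rc = pthread_mutex_lock(&port_list_lock);

	if (rc != 0)
		return rc;	/* EDEADLK when the caller already owns the lock */
	/* ... the port list would be walked here ... */
	pthread_mutex_unlock(&port_list_lock);
	return 0;
}

/* Hypothetical caller that logs while it already holds the lock, which is
 * the pattern the commit message describes.
 */
static int log_with_lock_held(void)
{
	int rc;

	rc = pthread_mutex_lock(&port_list_lock);
	assert(rc == 0);
	rc = dump_debug_log();	/* second acquisition by the same thread */
	pthread_mutex_unlock(&port_list_lock);
	return rc;
}
```

After port_list_lock_init(), log_with_lock_held() returns EDEADLK, while
dump_debug_log() called on its own returns 0. Dropping the lock from the dump
routine, as the patch does, removes the second acquisition entirely.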

Move the cfg_log_verbose checks as part of the lpfc_printf_vlog() and
lpfc_printf_log() macros before calling lpfc_dmp_dbg().  There is no need
to take the phba->port_list_lock within lpfc_dmp_dbg().

Link: https://lore.kernel.org/r/2022041008.126521-3-jsmart2...@gmail.com
Co-developed-by: Justin Tee 
Signed-off-by: Justin Tee 
Signed-off-by: James Smart 
Signed-off-by: Martin K. Petersen 
Diffstat (limited to 'drivers/scsi/lpfc')
-rw-r--r--  drivers/scsi/lpfc/lpfc_init.c   29  
-rw-r--r--  drivers/scsi/lpfc/lpfc_logmsg.h 6   
2 files changed, 4 insertions, 31 deletions
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 461d333b1b3a8..f9cd4b72d949a 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -15700,34 +15700,7 @@ void lpfc_dmp_dbg(struct lpfc_hba *phba)
 	unsigned int temp_idx;
 	int i;
 	int j = 0;
-	unsigned long rem_nsec, iflags;
-	bool log_verbose = false;
-	struct lpfc_vport *port_iterator;
-
-	/* Don't dump messages if we explicitly set log_verbose for the
-	 * physical port or any vport.
-	 */
-	if (phba->cfg_log_verbose)
-		return;
-
-	spin_lock_irqsave(&phba->port_list_lock, iflags);
-	list_for_each_entry(port_iterator, &phba->port_list, listentry) {
-		if (port_iterator->load_flag & FC_UNLOADING)
-			continue;
-		if (scsi_host_get(lpfc_shost_from_vport(port_iterator))) {
-			if (port_iterator->cfg_log_verbose)
-				log_verbose = true;
-
-			scsi_host_put(lpfc_shost_from_vport(port_iterator));
-
-			if (log_verbose) {
-				spin_unlock_irqrestore(&phba->port_list_lock,
-						       iflags);
-				return;
-			}
-		}
-	}
-	spin_unlock_irqrestore(&phba->port_list_lock, iflags);
+	unsigned long rem_nsec;
 
 	if (atomic_cmpxchg(&phba->dbg_log_dmping, 0, 1) != 0)
 		return;
diff --git a/drivers/scsi/lpfc/lpfc_logmsg.h b/drivers/scsi/lpfc/lpfc_logmsg.h
index 7d480c7987942..a5aafe230c74f 100644
--- a/drivers/scsi/lpfc/lpfc_logmsg.h
+++ b/drivers/scsi/lpfc/lpfc_logmsg.h
@@ -73,7 +73,7 @@ do { \
 #define lpfc_printf_vlog(vport, level, mask, fmt, arg...) \
 do { \
 	{ if (((mask) & (vport)->cfg_log_verbose) || (level[1] <= '3')) { \
-		if ((mask) & LOG_TRACE_EVENT) \
+		if ((mask) & LOG_TRACE_EVENT && !(vport)->cfg_log_verbose) \
 			lpfc_dmp_dbg((vport)->phba); \
 		dev_printk(level, &((vport)->phba->pcidev)->dev, "%d:(%d):" \
 			   fmt, (vport)->phba->brd_no, vport->vpi, ##arg);  \
@@ -89,11 +89,11 @@ do { \
 		(phba)->pport->cfg_log_verbose : \
 		(phba)->cfg_log_verbose; \
 	if (((mask) & log_verbose) || (level[1] <= '3')) { \
-		if ((mask) & LOG_TRACE_EVENT) \
+		if ((mask) & LOG_TRACE_EVENT && !log_verbose) \
 			lpfc_dmp_dbg(phba); \
 		dev_printk(level, &((phba)->pcidev)->dev, "%d:" \
 			fmt, phba->brd_no, ##arg); \
-	} else  if (!(phba)->cfg_log_verbose)\
+	} else if (!log_verbose)\
 		lpfc_dbg_print(phba, "%d:" fmt, phba->brd_no, ##arg); \
 	} \
 } while (0)
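
The gating in the patched lpfc_printf_log() macro can be modelled as a small
pure function. This is only a sketch of the decision logic shown in the hunk
above: decide(), struct log_decision and the SKETCH_* constants are
hypothetical names, and the SKETCH_LOG_TRACE_EVENT value is a placeholder
(the real definition lives in lpfc_logmsg.h). The level strings assume the
kernel's "\001"-prefixed log-level encoding, which is what the
level[1] <= '3' test inspects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit for illustration; not taken from the driver headers. */
#define SKETCH_LOG_TRACE_EVENT 0x80000000u

/* Assumed level strings in the kernel's SOH + digit form. */
#define SKETCH_KERN_ERR  "\001" "3"
#define SKETCH_KERN_INFO "\001" "6"

/* What the macro would do for one message. */
struct log_decision {
	bool dump_trace;	/* would call lpfc_dmp_dbg() */
	bool print_now;		/* would call dev_printk() */
	bool buffer_only;	/* would call lpfc_dbg_print() */
};

/* Mirror the post-patch branch structure of lpfc_printf_log(): the
 * log_verbose test happens here, before any dump is requested, so the dump
 * routine no longer needs to inspect the ports under a lock.
 */
static struct log_decision decide(const char *level, uint32_t mask,
				  uint32_t log_verbose)
{
	struct log_decision d = { false, false, false };

	assert(level != NULL);
	if ((mask & log_verbose) || level[1] <= '3') {
		/* Dump the trace buffer only when verbose logging is off. */
		d.dump_trace = (mask & SKETCH_LOG_TRACE_EVENT) && !log_verbose;
		d.print_now = true;
	} else if (!log_verbose) {
		/* Not printed: keep the message in the debug buffer. */
		d.buffer_only = true;
	}
	return d;
}
```

For an error-level message that carries the trace-event bit, the sketch
requests a dump only when log_verbose is 0. Any non-zero log_verbose
suppresses the dump but still prints the message.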

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1971193

Title:
  Server Crash while running IO and switch port bounce test with 2K
  login session

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1971193/

[Bug 1971193] Re: Server Crash while running IO and switch port bounce test with 2K login session

2022-05-04 Thread Jeff Lane
Thanks @mfo! That is correct: the crash was seen there, but they
determined it was generic and are pushing this to all the Linux OSVs.

Also of note: that patch set is a general driver update, and not all of
its patches are relevant to this bug. I've asked them to pinpoint the
patches that specifically resolve this issue, with the intent of pulling
only those.


** Description changed:

+ [Impact]
  Server crash and Call trace reported on one of the servers running IO and
  switch port bounce test from the 2K login session configuration.
- 
  
  Call Trace:
  [56048.470488] Call Trace:
  [56048.470489]  _raw_spin_lock_irqsave+0x32/0x40
  [56048.470489]  lpfc_dmp_dbg.part.32+0x28/0x220 [lpfc]
  [56048.470490]  lpfc_cmpl_els_fdisc+0x145/0x460 [lpfc]
  [56048.470490]  lpfc_sli_cancel_jobs+0x92/0xd0 [lpfc]
  [56048.470490]  lpfc_els_flush_cmd+0x43c/0x670 [lpfc]
  [56048.470491]  lpfc_els_flush_all_cmd+0x37/0x60 [lpfc]
  [56048.470491]  lpfc_sli4_async_event_proc+0x956/0x1720 [lpfc]
  [56048.470492]  lpfc_do_work+0x1485/0x1d70 [lpfc]
  [56048.470492]  ? __schedule+0x280/0x700
  [56048.470492]  ? finish_wait+0x80/0x80
  [56048.470493]  ? lpfc_unregister_unused_fcf+0x80/0x80 [lpfc]
  [56048.470493]  kthread+0x112/0x130
  [56048.470493]  ? kthread_flush_work_fn+0x10/0x10
  [56048.470494]  ret_from_fork+0x1f/0x40
  [56048.470494] Kernel panic - not syncing: Hard LOCKUP
- [56048.470495] CPU: 0 PID: 682 Comm: lpfc_worker_0 Kdump: loaded Tainted: G   
 
-  IOE- -  - 4.18.0-240.el8.x86_64 #1
+ [56048.470495] CPU: 0 PID: 682 Comm: lpfc_worker_0 Kdump: loaded Tainted: G
+  IOE- -  - 4.18.0-240.el8.x86_64 #1
  [56048.470496] Hardware name: Dell Inc. PowerEdge R740/0DY2X0, BIOS 2.11.2
  004/21/2021
  [56048.470496] Call Trace:
  [56048.470496]  
  [56048.470496]  dump_stack+0x5c/0x80
  [56048.470497]  panic+0xe7/0x2a9
  [56048.470497]  ? __switch_to_asm+0x51/0x70
  [56048.470497]  nmi_panic.cold.9+0xc/0xc
  [56048.470498]  watchdog_overflow_callback.cold.7+0x5c/0x70
  [56048.470498]  __perf_event_overflow+0x52/0xf0
  [56048.470499]  handle_pmi_common+0x1db/0x270
  [56048.470499]  ? __set_pte_vaddr+0x32/0x50
  [56048.470499]  ? __native_set_fixmap+0x24/0x30
  [56048.470500]  ? ghes_copy_tofrom_phys+0xd3/0x1c0
  [56048.470500]  ? __ghes_peek_estatus.isra.12+0x49/0xa0
  [56048.470500]  intel_pmu_handle_irq+0xbf/0x160
  [56048.470501]  perf_event_nmi_handler+0x2d/0x50
  [56048.470501]  nmi_handle+0x63/0x110
  [56048.470501]  default_do_nmi+0x4e/0x100
  [56048.470502]  do_nmi+0x128/0x190
  [56048.470502]  end_repeat_nmi+0x16/0x6a
  [56048.470503] RIP: 0010:native_queued_spin_lock_slowpath+0x5d/0x1d0
  [56048.470504] Code: 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4
  09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 8b 07 <84> c0 75
  f8 b8 01 00 00 00 66 89 07 c3 8b 37 81 fe 00 01 00 00 75
  [56048.470504] RSP: 0018:acebc7877ca8 EFLAGS: 0002
  [56048.470505] RAX: 0101 RBX: 0246 RCX:
  001f
  [56048.470505] RDX:  RSI:  RDI:
  94dcf5341dc0
  [56048.470506] RBP: 94dcf534 R08: 0002 R09:
  00029600
  [56048.470506] R10: 60d29656a45c R11: 94dcf534fd12 R12:
  94dcf5341db0
  [56048.470507] R13: 94dcf5341dc0 R14: 94dcc4ae8a00 R15:
  0003
  [56048.470507]  ? native_queued_spin_lock_slowpath+0x5d/0x1d0
  [56048.470507]  ? native_queued_spin_lock_slowpath+0x5d/0x1d0
  [56048.470508]  
  [56048.470508]  _raw_spin_lock_irqsave+0x32/0x40
  [56048.470509]  lpfc_dmp_dbg.part.32+0x28/0x220 [lpfc]
  [56048.470509]  lpfc_cmpl_els_fdisc+0x145/0x460 [lpfc]
  [56048.470509]  lpfc_sli_cancel_jobs+0x92/0xd0 [lpfc]
  [56048.470510]  lpfc_els_flush_cmd+0x43c/0x670 [lpfc]
  [56048.470510]  lpfc_els_flush_all_cmd+0x37/0x60 [lpfc]
  [56048.470510]  lpfc_sli4_async_event_proc+0x956/0x1720 [lpfc]
  [56048.470511]  lpfc_do_work+0x1485/0x1d70 [lpfc]
  [56048.470511]  ? __schedule+0x280/0x700
  [56048.470511]  ? finish_wait+0x80/0x80
  [56048.470512]  ? lpfc_unregister_unused_fcf+0x80/0x80 [lpfc]
  [56048.470512]  kthread+0x112/0x130
  [56048.470513]  ? kthread_flush_work_fn+0x10/0x10
  [56048.470513]  ret_from_fork+0x1f/0x40
  [root@ms-svr3-10-231-131-160 127.0.0.1-2021-11-20-05:14:30]#
  
- 
  [root@ms-svr3-10-231-131-160 127.0.0.1-2021-11-20-05:14:30]# cat
  /etc/redhat-release
  Red Hat Enterprise Linux release 8.3 (Ootpa)
  
  [root@ms-svr3-10-231-131-160 127.0.0.1-2021-11-20-05:14:30]# cat
  /sys/module/lpfc/version
  0:14.0.390.2
  
  [root@ms-svr3-10-231-131-160 127.0.0.1-2021-11-20-05:14:30]# cat
  /sys/class/scsi_host/host*/modeldesc
  Emulex LightPulse LPe32002-M2 2-Port 32Gb Fibre Channel Adapter
  Emulex LightPulse LPe32002-M2 2-Port 32Gb Fibre Channel Adapter
  
  [root@ms-svr3-10-231-131-160 127.0.0.1-2021-11-20-05:14:30]# cat
  /sys/class/scsi_host/host*/fwrev
  14.0.390.1, sli-4:2:c
  14.0.390.1, sli-4:2:c
  
  [root@ms-svr3-10-231-131-160 127.0.0.1-2021

[Bug 1971193] Re: Server Crash while running IO and switch port bounce test with 2K login session

2022-05-03 Thread Mauricio Faria de Oliveira
Note for other readers / observation:

The kernel/OS in the bug description is RHEL 8, not Ubuntu
[4.18.0-240.el8.x86_64].

But per the comment/link at the end [1], it seems this bug
will be used for an lpfc driver update (fixing that error).

"""
Patches pushed upstream 4/12/22:

https://lore.kernel.org/linux-scsi/2022041008.126521-1-jsmart2...@gmail.com/T/#t
"""

[PATCH 00/26] lpfc: Update lpfc to revision 14.2.0.2

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1971193] Re: Server Crash while running IO and switch port bounce test with 2K login session

2022-05-03 Thread Jeff Lane
** Tags added: servcert-345
