[Kernel-packages] [Bug 1887774] Comment bridged from LTC Bugzilla

2021-01-22 Thread bugproxy
--- Comment From heinz-werner_se...@de.ibm.com 2021-01-22 05:55 EDT---
IBM Bugzilla status->closed, Fix Released for all requested distros

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1887774

Title:
  [UBUNTU 20.04] zfcp: Fix panic on ERP timeout for previously dismissed
  ERP

Status in Ubuntu on IBM z Systems:
  Fix Released
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Released
Status in linux source package in Focal:
  Fix Released
Status in linux source package in Groovy:
  Fix Released

Bug description:
  SRU Justification:
  ==

  [Impact]

  * The Linux kernel panics with a kernel page fault in IRQ context when
  zfcp_erp_timeout_handler() calls zfcp_erp_notify().

  [Fix]

  * 936e6b85da0476dd2edac7c51c68072da9fb4ba2 936e6b85da04 "scsi: zfcp:
  Fix panic on ERP timeout for previously dismissed ERP action"

  [Test Case]

  * Requires an IBM z13/z13s or LinuxONE Rockhopper/Emperor system (or
  newer) connected to a zfcp-capable storage subsystem.

  * Trigger an ERP timeout, for example by fault injection or by
  otherwise causing a slow recovery.

  * Monitor the system log for any kernel panics.

  [Regression Potential]

  * The regression potential can be considered medium, since the
  modification is platform-specific (limited to s390x) and further
  limited to the zfcp layer.

  * Within zfcp it is further limited to the error recovery procedure
  (ERP) of FCP and only touches zfcp_erp.c, which means the code path is
  mainly active under error conditions.

  [Other]

  * The above fix was accepted upstream with v5.8-rc3 and will therefore
  make its way to groovy with kernel 5.8.

  * Therefore this SRU request was submitted for bionic and focal only
  and not for groovy.

  __

  Description:   zfcp: Fix panic on ERP timeout for previously dismissed ERP
  Symptom:   Linux kernel panic due to kernel page fault in IRQ context
     when running zfcp_erp_timeout_handler() calling
     zfcp_erp_notify().
  Problem:   Suppose that, for unrelated reasons, FSF requests on behalf
     of recovery are very slow and can run into the ERP timeout.
     In the case at hand, we did adapter recovery to a large
     degree. However, due to the slowness, a LUN open is pending, so
     the corresponding fc_rport remains blocked. After
     fast_io_fail_tmo we trigger close physical port recovery for
     the port under which the LUN should have been opened. The
     new higher order port recovery dismisses the pending LUN
     open ERP action and dismisses the pending LUN open FSF
     request. Such dismissal decouples the ERP action from the
     pending corresponding FSF request by setting
     zfcp_fsf_req->erp_action to NULL (among other things)
     [zfcp_erp_strategy_check_fsfreq()].
     If the ERP timeout for the pending open LUN request now
     expires, we must not use zfcp_fsf_req->erp_action in the ERP
     timeout handler. This has been a problem since v4.15 commit
     75492a51568b ("s390/scsi: Convert timers to use
     timer_setup()"). Before that we intentionally only passed
     zfcp_erp_action as context argument to
     zfcp_erp_timeout_handler().
     Note: The lifetime of the corresponding zfcp_fsf_req object
     continues until a (late) response or an (unrelated) adapter
     recovery.
  Solution:  Just like the regular response path ignores dismissed
     requests [zfcp_fsf_req_complete() =>
     zfcp_fsf_protstatus_eval() => return early] the ERP timeout
     handler now needs to ignore dismissed requests. So simply
     return early in the ERP timeout handler if the FSF request
     is marked as dismissed in its status flags. To protect
     against the race where zfcp_erp_strategy_check_fsfreq()
     dismisses and sets zfcp_fsf_req->erp_action to NULL after
     our previous status flag check, return early if
     zfcp_fsf_req->erp_action is NULL. After all, the former ERP
     action does not need to be woken up as that was already done
     as part of the dismissal above [zfcp_erp_action_dismiss()].
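
  The two early returns described in the Solution can be modeled in a
  small user-space sketch. This is not the kernel code: the struct
  layouts, the flag values (FSFREQ_DISMISSED, ERP_TIMEDOUT) and the
  helper names (erp_notify, fsf_req_dismiss, erp_timeout_handler) are
  simplified, hypothetical stand-ins chosen for illustration only, and
  the sketch is single-threaded, so it does not reproduce the locking
  or the once-only read a concurrent kernel reader would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch; the kernel defines its own. */
#define FSFREQ_DISMISSED 0x1u
#define ERP_TIMEDOUT     0x2u

/* Simplified stand-in for struct zfcp_erp_action. */
struct erp_action {
	unsigned int status; /* collects notification bits */
	int notified;        /* how often erp_notify() ran */
};

/* Simplified stand-in for struct zfcp_fsf_req. */
struct fsf_req {
	unsigned int status;
	struct erp_action *erp_action; /* NULL once the request is dismissed */
};

/* Stand-in for zfcp_erp_notify(): record the event on the ERP action. */
static void erp_notify(struct erp_action *act, unsigned int set_mask)
{
	assert(act != NULL); /* the unfixed handler could reach here with NULL */
	act->status |= set_mask;
	act->notified++;
}

/* Stand-in for the dismissal: flag the request and decouple the action. */
static void fsf_req_dismiss(struct fsf_req *req)
{
	req->status |= FSFREQ_DISMISSED;
	req->erp_action = NULL;
}

/*
 * Stand-in for the fixed timeout handler.
 * Returns 1 if the ERP action was notified, 0 if the timeout was ignored.
 */
static int erp_timeout_handler(struct fsf_req *req)
{
	struct erp_action *act;

	/* First check: ignore requests already marked as dismissed. */
	if (req->status & FSFREQ_DISMISSED)
		return 0;

	/* Second check: read the pointer once into a local and test it,
	 * covering a dismissal that lands after the flag check above. */
	act = req->erp_action;
	if (!act)
		return 0;

	erp_notify(act, ERP_TIMEDOUT);
	return 1;
}
```

  In this model a timeout on a live request notifies the ERP action,
  while a timeout on a dismissed request, or on one whose erp_action is
  already NULL, is ignored instead of dereferencing the NULL pointer.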

  Upstream-ID:   936e6b85da0476dd2edac7c51c68072da9fb4ba2 -> kernel 5.8

  Will be integrated into groovy with kernel 5.8.

  Please check that this is also integrated into 20.04.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1887774/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net

[Kernel-packages] [Bug 1887774] Comment bridged from LTC Bugzilla

2020-07-24 Thread bugproxy
--- Comment From ma...@de.ibm.com 2020-07-24 12:54 EDT---
zfcp was regression tested with the private build from 
https://people.canonical.com/~fheimes/lp1887774/.

(The private build seems to have the same kernel release as the latest official 
update kernel. I removed the latter after going back to the previous back-level 
official update kernel (5.4.0-40-generic) and before installing the private 
build. I hope I ran the correct private build:
Linux hostname 5.4.0-42-generic #46 SMP Thu Jul 16 12:06:43 UTC 2020 s390x 
s390x s390x GNU/Linux)

Status in Ubuntu on IBM z Systems:
  In Progress
Status in linux package in Ubuntu:
  New
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Focal:
  In Progress
Status in linux source package in Groovy:
  New

[Kernel-packages] [Bug 1887774] Comment bridged from LTC Bugzilla

2020-07-16 Thread bugproxy
--- Comment From heinz-werner_se...@de.ibm.com 2020-07-16 09:33 EDT---
Because this problem was introduced with kernel 4.15, integration into 18.04 
is also required.

Status in Ubuntu on IBM z Systems:
  Triaged
Status in linux package in Ubuntu:
  New
Status in linux source package in Focal:
  New
Status in linux source package in Groovy:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1887774/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp