You have been subscribed to a public bug:

Description:   zfcp: Fix panic on ERP timeout for previously dismissed ERP
Symptom:       Linux kernel panic due to kernel page fault in IRQ context
               when running zfcp_erp_timeout_handler() calling
               zfcp_erp_notify().
Problem:       Suppose that, for unrelated reasons, FSF requests on behalf
               of recovery are very slow and can run into the ERP timeout.
               In the case at hand, we did adapter recovery to a large
               degree. However, due to the slowness, a LUN open is pending, so
               the corresponding fc_rport remains blocked. After
               fast_io_fail_tmo we trigger close physical port recovery for
               the port under which the LUN should have been opened. The
               new higher-order port recovery dismisses the pending LUN
               open ERP action and dismisses the pending LUN open FSF
               request. Such dismissal decouples the ERP action from the
               pending corresponding FSF request by setting
               zfcp_fsf_req->erp_action to NULL (among other things)
               [zfcp_erp_strategy_check_fsfreq()].
               If now the ERP timeout for the pending open LUN request runs
               out, we must not use zfcp_fsf_req->erp_action in the ERP
               timeout handler. This is a problem since v4.15 commit
               75492a51568b ("s390/scsi: Convert timers to use
               timer_setup()"). Before that we intentionally only passed
               zfcp_erp_action as context argument to
               zfcp_erp_timeout_handler().
               Note: The lifetime of the corresponding zfcp_fsf_req object
               continues until a (late) response or an (unrelated) adapter
               recovery.
Solution:      Just like the regular response path ignores dismissed
               requests [zfcp_fsf_req_complete() =>
               zfcp_fsf_protstatus_eval() => return early] the ERP timeout
               handler now needs to ignore dismissed requests. So simply
               return early in the ERP timeout handler if the FSF request
               is marked as dismissed in its status flags. To protect
               against the race where zfcp_erp_strategy_check_fsfreq()
               dismisses and sets zfcp_fsf_req->erp_action to NULL after
               our previous status flag check, return early if
               zfcp_fsf_req->erp_action is NULL. After all, the former ERP
               action does not need to be woken up as that was already done
               as part of the dismissal above [zfcp_erp_action_dismiss()].
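
The two early returns described in the Solution can be modelled in a small
userspace sketch. This is not the upstream patch: the struct layouts, the flag
values, and the names fsf_req, erp_action, erp_notify(), dismiss() and
erp_timeout_handler() are simplified stand-ins assumed here for illustration.
The sketch is also single-threaded, so it does not exercise the race itself.
In a real lock-free handler the pointer read would presumably need a
READ_ONCE()-style access.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values, not the kernel's actual definitions. */
#define FSFREQ_DISMISSED 0x1u
#define ERP_TIMEDOUT     0x2u

/* Simplified stand-in for struct zfcp_erp_action. */
struct erp_action {
	unsigned int status;
};

/* Simplified stand-in for struct zfcp_fsf_req. */
struct fsf_req {
	unsigned int status;
	struct erp_action *erp_action;	/* NULL once the request is dismissed */
};

/* Stand-in for zfcp_erp_notify(): a NULL action here models the panic. */
void erp_notify(struct erp_action *act, unsigned int set_mask)
{
	assert(act != NULL);
	act->status |= set_mask;
}

/* Stand-in for the dismissal: flag the request, then decouple it. */
void dismiss(struct fsf_req *req)
{
	req->status |= FSFREQ_DISMISSED;
	req->erp_action = NULL;
}

/* Returns 1 if the ERP action was notified, 0 if the timeout was ignored. */
int erp_timeout_handler(struct fsf_req *req)
{
	struct erp_action *act;

	/* First guard: ignore requests already marked as dismissed. */
	if (req->status & FSFREQ_DISMISSED)
		return 0;

	/*
	 * Second guard: take one snapshot of the pointer and test the
	 * snapshot, so a dismissal after the flag check cannot hand a
	 * NULL pointer to erp_notify().
	 */
	act = req->erp_action;
	if (!act)
		return 0;

	erp_notify(act, ERP_TIMEDOUT);
	return 1;
}
```

With these guards, a timeout on a dismissed request is a no-op, while a timeout
on a request that is still coupled to its ERP action marks that action as timed
out, as before.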

Upstream-ID:   936e6b85da0476dd2edac7c51c68072da9fb4ba2 -> kernel 5.8

Will be integrated into groovy via kernel 5.8.

Please check that this is also integrated into 20.04.

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Skipper Bug Screeners (skipper-screen-team)
         Status: New


** Tags: architecture-s39064 bugnameltc-186883 severity-high 
targetmilestone-inin20041
-- 
[UBUNTU 20.04] zfcp: Fix panic on ERP timeout for previously dismissed ERP
https://bugs.launchpad.net/bugs/1887774
You received this bug notification because you are a member of Kernel Packages, 
which is subscribed to linux in Ubuntu.