The upstream patch has been applied to both focal and jammy by Kamal:
https://bugs.launchpad.net/bugs/2002347
https://bugs.launchpad.net/bugs/2003122
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2002256

Title:
  [UBUNTU 22.04] zfcp: fix double free of FSF request when qdio send
  fails

Status in Ubuntu on IBM z Systems:
  Triaged
Status in linux package in Ubuntu:
  New
Status in linux source package in Focal:
  New
Status in linux source package in Jammy:
  New
Status in linux source package in Kinetic:
  New

Bug description:
  Description:
  zfcp: fix double free of FSF request when qdio send fails

  Symptom:
  When doing maintenance actions on FCP devices that turn off an FCP
  device while I/O is still running on it in Linux (for example,
  turning off the channel path of the FCP device), the Linux kernel
  crashes.

  Problem:
  We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to
  cache the FSF request ID when sending a new FSF request. This cached
  ID is used in case the sending fails and we need to remove the
  request from our internal hash table again (so we don't keep an
  invalid reference and use it when we free the request again).

  In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and
  32 bits wide), but the rest of the zfcp code (and the firmware
  specification) handles the ID as 'unsigned long'/'u64' (unsigned and
  64 bits wide [s390x ELF ABI]). For one, this has the obvious problem
  that when the ID grows past 32 bits (this can happen reasonably fast)
  it is truncated to 32 bits when it is stored in the cache variable,
  and so doesn't match the original ID anymore. The second, less
  obvious problem is that even when the original ID has not yet grown
  past 32 bits, as soon as the 32nd bit is set in the original ID
  (0x80000000 = 2'147'483'648) we will have a mismatch when we cast it
  back to 'unsigned long', because casting the signed type 'int' into
  the wider type 'unsigned long' will use a sign-extending instruction,
  and so flip all leading zeros to ones instead.
  If we can't successfully remove the request from the hash table again
  after 'zfcp_qdio_send()' fails (this happens regularly when zfcp
  cannot notify the adapter about new work because the adapter is
  already gone, e.g. during a ChpID toggle), we will end up with a
  double free. We unconditionally free the request in the calling
  function when 'zfcp_fsf_req_send()' fails, but because the request is
  still in the hash table we end up with a stale memory reference, and
  once the zfcp adapter is either reset during recovery or shut down we
  end up freeing the same memory twice.

  Solution:
  To fix this, simply change the type of the cache variable to
  'unsigned long', like the rest of zfcp and also the argument for
  'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign
  extension, and so the request can be removed from the hash table
  successfully.

  Reproduction:
  Run I/O on an FCP device until 2'147'483'648 requests have been sent.
  The current request number cannot be read directly from user space,
  but it can be read indirectly by using 'zfcp_ping' and 'zfcpdbf' (use
  the correct device-bus-ID):

      sudo sh -c 'zfcp_ping -a "${0}" 0xFFFFFFFFFFFFFFFF \
          2>/dev/null 1>&2; zfcpdbf "${0}" -x all -i SAN 2>/dev/null \
          | grep -E -e "^(Timestamp|Request ID)[[:blank:]]+:" | tail \
          -n2' 0.0.1700

  After having reached 0x80000000 requests, stop all I/O on the FCP
  device and start only a single process doing single-threaded,
  synchronous, direct I/O on the FCP device (always only one
  outstanding I/O operation). While this I/O process is running, turn
  off the channel path (ChpID) that is used for the FCP
  device/subchannel. This will not always trigger the bug, but
  occasionally it will.

  Proof that it hit the correct code path in zfcp can be found by using
  'zfcpdbf' again (use the correct device-bus-ID):

      zfcpdbf 0.0.1700 -x all -i REC 2>/dev/null | grep 'fsrs__1'

  If you hit the correct code path, this will print some lines starting
  with 'Tag'.
  Upstream-ID:   0954256e970ecf371b03a6c9af2cf91b9c4085ff

  Preventive:    yes
  Date:          2022-12-19
  Author:        Benjamin Block <bbl...@linux.ibm.com>
  Component:     kernel
  Link:          https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0954256e970e

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/2002256/+subscriptions