SRU Justification:

Impact:
This issue can cause kernel panics on systems that use the virtio_scsi driver with the affected Ubuntu kernels. The issue manifests irregularly, as it is timing dependent.

Fix:

The issue is resolved by adding synchronization between the two code paths that race with one another. The most straightforward fix is to have the code wait for any outstanding requests to drain prior to freeing the target structure, e.g.:

--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -762,6 +762,10 @@ static int virtscsi_target_alloc(struct scsi_target *starget)
 static void virtscsi_target_destroy(struct scsi_target *starget)
 {
        struct virtio_scsi_target_state *tgt = starget->hostdata;
+
+       /* we can race with concurrent virtscsi_complete_cmd */
+       while (atomic_read(&tgt->reqs))
+               cpu_relax();
        kfree(tgt);
 }

An alternative fix that was considered is to use a synchronize_rcu_expedited call, as that is the functionality that blocks the race in unaffected kernels. However, some call paths into virtscsi_target_destroy may hold mutexes that are not held when the upstream RCU sync calls run (those enter via the block layer). For this reason the more confined fix described above was chosen.

Testcase:

This reproduces on Google Cloud using the current, unmodified ubuntu-1404-lts image (with the Ubuntu 4.4 kernel). Using the two attached scripts, run e.g. ./create_shutdown_instance.sh 100 to create 100 instances. If an instance runs its startup script successfully, it shuts itself down right away, so instances that are still running after a few minutes likely demonstrate this problem. The issue reproduces easily with n1-standard-4 machines.

create_shutdown_instance.sh:

#!/bin/bash -e
ZONE=us-central1-a
for i in $(seq -w $1); do
  gcloud compute instances create shutdown-experiment-$i \
    --zone="${ZONE}" \
    --image-family=ubuntu-1404-lts \
    --image-project=ubuntu-os-cloud \
    --machine-type=n1-standard-4 \
    --scopes compute-rw \
    --metadata-from-file startup-script=immediate_shutdown.sh &
done
wait

immediate_shutdown.sh:

#!/bin/bash -x
function get_metadata_value() {
  curl -H 'Metadata-Flavor: Google' \
    "http://metadata.google.internal/computeMetadata/v1/instance/$1"
}
readonly ZONE="$(get_metadata_value zone | awk -F'/' '{print $NF}')"
gcloud compute instances delete "$(hostname)" --zone="${ZONE}" --quiet

--
https://bugs.launchpad.net/bugs/1765241

Title:
  virtio_scsi race can corrupt memory, panic kernel

Status in linux package in Ubuntu:
  Confirmed

Bug description:

There is a race in the virtio_scsi driver (present in all kernels). That race is inadvertently avoided on most kernels due to a synchronize_rcu call coincidentally placed in one of the racing code paths. By happenstance, the set of patches backported to the Ubuntu 4.4 kernel ended up without a synchronize_rcu in the relevant place.

The issue first manifests with:

commit be2a20802abbde04ae09846406d7b670731d97d2
Author: Jan Kara <j...@suse.cz>
Date:   Wed Feb 8 08:05:56 2017 +0100

    block: Move bdi_unregister() to del_gendisk()

    BugLink: http://bugs.launchpad.net/bugs/1659111

The race can cause a kernel panic due to corruption of a freelist pointer in a slab cache. The panics occur in arbitrary places, as the failure occurs at an allocation some time after the corruption of the pointer.
However, the most common failure observed has been within virtio_scsi itself during probe processing, e.g.:

[ 3.111628] [<ffffffff811b0f32>] kfree_const+0x22/0x30
[ 3.112340] [<ffffffff813db534>] kobject_release+0x94/0x190
[ 3.113126] [<ffffffff813db3c7>] kobject_put+0x27/0x50
[ 3.113838] [<ffffffff8153dee7>] put_device+0x17/0x20
[ 3.114568] [<ffffffff815ac6b2>] __scsi_remove_device+0x92/0xe0
[ 3.115401] [<ffffffff815a928b>] scsi_probe_and_add_lun+0x95b/0xe80
[ 3.116287] [<ffffffff811f1083>] ? kmem_cache_alloc_trace+0x183/0x1f0
[ 3.117227] [<ffffffff8154eb0b>] ? __pm_runtime_resume+0x5b/0x80
[ 3.118048] [<ffffffff815a9eaa>] __scsi_scan_target+0x10a/0x690
[ 3.118879] [<ffffffff815aa59e>] scsi_scan_channel+0x7e/0xa0
[ 3.119653] [<ffffffff815aa743>] scsi_scan_host_selected+0xf3/0x160
[ 3.120506] [<ffffffff815aa83d>] do_scsi_scan_host+0x8d/0x90
[ 3.121295] [<ffffffff815aaa0c>] do_scan_async+0x1c/0x190
[ 3.122048] [<ffffffff810a5748>] async_run_entry_fn+0x48/0x150
[ 3.122846] [<ffffffff8109c6b5>] process_one_work+0x165/0x480
[ 3.123732] [<ffffffff8109ca1b>] worker_thread+0x4b/0x4d0
[ 3.124508] [<ffffffff8109c9d0>] ? process_one_work+0x480/0x480

Details on the race:

CPU A:

virtscsi_probe
  [...]
    __scsi_scan_target
      scsi_probe_and_add_lun      [on return calls __scsi_remove_device, below]
        scsi_probe_lun
          [...]
            blk_execute_rq

blk_execute_rq waits for the completion event. In order for the race to occur, the wakeup must occur on a CPU other than CPU B. After being woken up by the completion of the request, the call returns up the stack to scsi_probe_and_add_lun, which calls __scsi_remove_device:

__scsi_remove_device
  blk_cleanup_queue               [ no longer calls bdi_unregister ]
  scsi_target_reap(scsi_target(sdev))
    scsi_target_reap_ref_put
      kref_put
        kref_sub
          scsi_target_reap_ref_release
            scsi_target_destroy
              ->target_destroy() = virtscsi_target_destroy
                kfree(tgt)        <=== FREE TGT

Note that the removal of the call to bdi_unregister in xenial commit be2a20802abbde ("block: Move bdi_unregister() to del_gendisk()"), annotated above, is the change that gates whether the issue manifests or not. The other code change from be2a20802abbde has no effect on the race.

CPU B:

vring_interrupt
  virtscsi_complete_cmd
    scsi_mq_done                  (via ->scsi_done())
      blk_mq_complete_request
        __blk_mq_complete_request
          [...]
            blk_end_sync_rq
              complete            [ wake up the task from CPU A ]

After waking the CPU A task, execution returns up the stack and then calls atomic_dec(&tgt->reqs) in virtscsi_complete_cmd, immediately after returning from the call to ->scsi_done (see the paraphrased sketch below). If the activity on CPU A after it is woken up (starting at __scsi_remove_device) finishes before CPU B can call atomic_dec() in virtscsi_complete_cmd, then the atomic_dec() will modify a freelist pointer in the freed slab object that contained tgt. This causes the system to panic on a subsequent allocation from the per-cpu slab cache.

The call path on CPU B is significantly shorter than that on CPU A after wakeup, so it is likely that an external event delays CPU B. This could be either an interrupt within the VM or a scheduling delay for the virtual CPU process on the hypervisor. Whatever the delay is, it is not the root cause but merely the triggering event.

The virtscsi race window described above exists in all kernels that have been checked (4.4 upstream LTS, Ubuntu 4.13 and 4.15, and current mainline as of this writing).
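For reference, the tail of the CPU B completion path looks roughly like the following in the 4.4-era driver. This is a paraphrased sketch of virtscsi_complete_cmd from drivers/scsi/virtio_scsi.c with the response handling elided, not a verbatim copy of the source; it is included only to show where the decrement sits relative to ->scsi_done():

static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
{
        struct virtio_scsi_cmd *cmd = buf;
        struct scsi_cmnd *sc = cmd->sc;
        struct virtio_scsi_target_state *tgt =
                scsi_target(sc->device)->hostdata;

        /* ... translate the virtio response into sc's result (elided) ... */

        sc->scsi_done(sc);      /* wakes the blk_execute_rq waiter (CPU A) */

        /*
         * Race window: once ->scsi_done() has run, nothing prevents CPU A
         * from returning through __scsi_remove_device and reaching
         * virtscsi_target_destroy, which kfree()s tgt.  If that happens
         * before the decrement below executes, the write lands in a freed
         * slab object.
         */
        atomic_dec(&tgt->reqs);
}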
However, none of those kernels exhibits the panic in testing; only the Ubuntu 4.4 kernel after commit be2a20802abbde does. The reason none of those kernels panics is that they all have one thing in common: an incidental call to synchronize_rcu somewhere in the call path of CPU A after it is woken up. This causes CPU A to wait for CPU B's activity, as CPU A will not go on to free the "tgt" memory until after the RCU grace period passes, which requires waiting for CPU B's activity to finish.

Note that the specific RCU sync call differs between some of those kernel versions, but all of them have one somewhere deep inside blk_cleanup_queue. The bdi_unregister function has one (in the call to bdi_remove_from_list), which is why removing that call opens the race window on the Ubuntu 4.4 kernel.

Resolving the issue can be accomplished by adding an RCU sync to virtscsi_target_destroy prior to freeing the target. It is also possible to use a loop of the form:

        while (atomic_read(&tgt->reqs))
                cpu_relax();

but this is higher risk, as the loop is non-terminating in the case of some other failure.
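For illustration only, a minimal sketch of the RCU-based variant mentioned above might look like the following. This is an untested sketch, not the fix proposed in the SRU justification earlier in this report, which chose the polling wait instead because some callers of virtscsi_target_destroy may hold mutexes that are not held on the upstream RCU-synchronized paths:

static void virtscsi_target_destroy(struct scsi_target *starget)
{
        struct virtio_scsi_target_state *tgt = starget->hostdata;

        /*
         * Per the analysis above, virtscsi_complete_cmd runs from
         * vring_interrupt (hard interrupt context), so an RCU grace
         * period cannot complete until that handler has returned,
         * i.e. until atomic_dec(&tgt->reqs) has already executed.
         * Waiting for a grace period here therefore closes the race
         * before the memory is returned to the slab cache.
         */
        synchronize_rcu_expedited();

        kfree(tgt);
}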