I haven’t made much progress on my segfault, though I now know that the 
number of detaches likely does not matter, and that the crash seems to occur 
during the attach, not the detach, part of the code.

I adapted my change to be a bit more sane. I think it might make sense in 
general: if this code can be reached with a NULL vendor or version, something 
is clearly wrong, and in that case we probably just want to stop instead of 
pretending everything is okay.

So the following change also works for us; with it we see no segfaults:

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index efee6739f9..7273cd6c3d 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -775,6 +775,15 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
         return -1;
     }
 
+    /* Avoid null-pointers leading to segfaults below */
+    if (!s->version) {
+        return -1;
+    }
+
+    if (!s->vendor) {
+        return -1;
+    }
+
     /* PAGE CODE == 0 */
     buflen = req->cmd.xfer;
     if (buflen > SCSI_MAX_INQUIRY_LEN) {

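For illustration only, here is a self-contained toy model of the guard above. It assumes a simplified disk struct; `toy_disk`, `pad_copy` and `toy_emulate_inquiry` are hypothetical names, not QEMU code. It folds the two checks into one and prints a message before bailing out, so the bad state is at least visible instead of being silently swallowed. The byte offsets follow the standard INQUIRY data layout (vendor identification at bytes 8-15, product revision at bytes 32-35); everything else is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the disk state; the real state struct in
 * hw/scsi/scsi-disk.c has many more fields. */
struct toy_disk {
    const char *vendor;   /* e.g. "QEMU" */
    const char *version;  /* e.g. "1.0" */
};

/* Copy src into a fixed-width field, padding with spaces (no NUL). */
static void pad_copy(uint8_t *dst, size_t width, const char *src)
{
    size_t n = strlen(src);   /* a NULL src would crash right here */
    if (n > width) {
        n = width;
    }
    memcpy(dst, src, n);
    memset(dst + n, ' ', width - n);
}

/* Returns the number of bytes filled, or -1 if the request cannot be
 * served (disk state looks invalid, or buffer too small). */
static int toy_emulate_inquiry(const struct toy_disk *s,
                               uint8_t *outbuf, size_t buflen)
{
    assert(s != NULL);  /* a NULL disk pointer is a caller bug */

    /* Guard: stop and report, instead of dereferencing NULL below. */
    if (!s->vendor || !s->version) {
        fprintf(stderr, "inquiry: disk %p has no vendor/version\n",
                (const void *)s);
        return -1;
    }
    if (buflen < 36) {
        return -1;
    }
    memset(outbuf, 0, 36);                 /* product ID etc. omitted */
    pad_copy(outbuf + 8, 8, s->vendor);    /* T10 vendor identification */
    pad_copy(outbuf + 32, 4, s->version);  /* product revision level */
    return 36;
}
```

Inside QEMU itself, error_report() or a trace event would probably be a better fit than a raw fprintf().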
I still hope to get some feedback from anyone familiar with hw/scsi; 
hopefully this reaches someone who can shed some light on the problem.
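Peter’s guess (quoted below) describes a classic use-after-free: the disk is freed on detach while a request still points at it, so reading its vendor or version reads whatever now occupies that memory. A toy sketch of that scenario, and of reference counting as one common way to rule it out, might look like this. All names are hypothetical; QEMU has its own object reference counting, which this does not attempt to model, and the plain counter here is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>

static int toy_devices_freed;  /* counts frees, for observation only */

/* Hypothetical device with a manual reference count. */
struct toy_dev {
    unsigned refcount;
    const char *vendor;
};

static struct toy_dev *toy_dev_new(const char *vendor)
{
    struct toy_dev *d = malloc(sizeof(*d));
    if (!d) {
        return NULL;
    }
    d->refcount = 1;  /* the reference held while the device is attached */
    d->vendor = vendor;
    return d;
}

static void toy_dev_ref(struct toy_dev *d)
{
    d->refcount++;
}

static void toy_dev_unref(struct toy_dev *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        toy_devices_freed++;
        free(d);  /* only freed once nobody points at it any more */
    }
}

/* An in-flight request pins the device for its whole lifetime, so a
 * detach in the middle cannot free the memory the request still reads. */
struct toy_req {
    struct toy_dev *dev;
};

static void toy_req_start(struct toy_req *r, struct toy_dev *d)
{
    toy_dev_ref(d);
    r->dev = d;
}

static void toy_req_finish(struct toy_req *r)
{
    toy_dev_unref(r->dev);
    r->dev = NULL;
}
```

Without the toy_dev_ref() call in toy_req_start(), the detach would free the device while the request still holds r->dev, which is exactly the pattern Peter suspects.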

Cheers and enjoy your weekend,

Denis

> On 9 Aug 2022, at 18:51, Peter Maydell <peter.mayd...@linaro.org> wrote:
> 
> On Tue, 9 Aug 2022 at 17:26, Denis Krienbühl <de...@href.ch> wrote:
>> On 9 Aug 2022, at 18:15, Peter Maydell <peter.mayd...@linaro.org> wrote:
>>> My wild guess is that there's a race condition somewhere such
>>> that when you're doing this huge amount of detaches, very rarely
>>> a disk is detached and deleted but this INQUIRY request is
>>> incorrectly still sent to the disk (which being a freed object,
>>> might be overwritten with other stuff). But that is purely a guess.
>> 
>> So.. should this be something I create a bug report for?
>> 
>> 
>>> If you can repro this on current head-of-git, or at least on
>>> the most recent release, then yes, file a bug report.
> 
>> The best I can currently do is start to log what’s going on. Since
>> I’m not at all familiar with SCSI and this code base, do you have
>> any tips on what I should log to find out where this
>> race condition occurs?
>> 
>> Or if there’s any kind of documentation I could read to better
>> understand what is going on in the hw/scsi subsystem and how I should
>> navigate the code. After reading your explanation we’ll probably
>> look for other workarounds, but I would love to understand what’s
>> going on.
> 
> Paolo and Fam are the SCSI subsystem maintainers. They might know
> whether this sounds like a bug that's already been fixed at some
> point, or have other suggestions.
> 
> Context (ie link to the start of this thread on the list archive):
> https://lists.gnu.org/archive/html/qemu-discuss/2022-08/msg00011.html 
> 
> thanks
> -- PMM
