Ian Molton wrote:
> Avi Kivity wrote:
>
> > Init is pretty easy to handle. I'm worried about runtime where you
> > can't report an error to the guest. Real hardware doesn't oom.
>
> In the case of the socket reconnect code I posted recently, if the
> allocation failed, it would give up trying to reconnect and inform the
> user of that chardev that it had closed. Ok, this doesnt help the guest,
> but it allows other code to clean up nicely, and we can report the
> failure to the host. IMHO thats better than leaving a sysadmin
> scratching their head wondering why it suddenly just stopped feeding the
> guest entropy and isnt trying to reconnect anymore...

If the system as a whole runs out of memory so that no-overcommit
malloc() fails on a small alloc, there's a good chance that you won't
be able to send a message to the host (how do you format the QMP
message without malloc?). If you do manage that, there's a good chance
the host won't be able to receive it (it can't malloc either). And if
it does manage to receive the message, you can be almost certain that
it won't be able to run any GUI operations, send mail, etc. to inform
the admin.

The chances of the path

  qemu small alloc -> chardev error -> send QMP message
  -> receive QMP message -> parse QMP message
  -> do something useful (log/email/UI)

having fully preallocated buffers for every step, including a
preallocated emergency pool for the buffers used by QMP formatting and
parsing, so that it gets all the way past the last step, are very slim
indeed. There's no point writing the code for the first steps if it's
intractable to make the later steps do something useful.

Btw, as an admin I would really rather the socket reconnection code
keeps trying in that circumstance, provided qemu does not simply fall
over soon after because an alloc fails for something else. The most
likely scenario on a server like that, IMHO, is that I notice it is
running out of memory, kill the real cause (e.g. another runaway
process), then restart all daemons which have died. I'm not going to
notice a non-fatal message (in the unlikely event it is propagated all
the way up), because there are plenty of other non-fatal messages in
normal use, multiplied by hundreds of guests (across a cluster).

Or, if you mean the chardev closing causes qemu to terminate, what's
the difference from the current qemu_malloc() behaviour?

I'd rather it behaves like a broken HWRNG if it can't get host entropy:
don't provide data, and let the guest decide what to do, just like it
does for a broken HWRNG. Except virtio-rng can report unavailability
rather than simply being broken :-)

-- Jamie
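
To make the "keep trying, and look like a broken HWRNG in the meantime" behaviour concrete, here is a minimal sketch in C. It is not existing qemu code: EntropySource, the entropy_source_* functions and the retry constants are all hypothetical names chosen for illustration. The assumption is that all state is sized once at init, so the retry path itself never needs malloc(), and that a failed reconnect only schedules a later attempt instead of closing the chardev.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: none of these names exist in qemu. */

#define RETRY_MIN_MS 1000u   /* first retry delay (assumed value) */
#define RETRY_MAX_MS 60000u  /* cap on the retry delay (assumed value) */

typedef struct EntropySource {
    int connected;           /* 1 while the host socket is up */
    unsigned retry_ms;       /* delay before the next reconnect attempt */
    unsigned char buf[64];   /* fixed buffer, allocated once with the struct */
    size_t avail;            /* bytes of host entropy currently buffered */
} EntropySource;

void entropy_source_init(EntropySource *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof(*s));
    s->retry_ms = RETRY_MIN_MS;
}

/* Called when a reconnect attempt fails for any reason, including an
 * allocation failure. It never gives up: it returns the delay to wait
 * before the next attempt and doubles the following delay up to the cap. */
unsigned entropy_source_reconnect_failed(EntropySource *s)
{
    unsigned delay = s->retry_ms;

    s->connected = 0;
    if (s->retry_ms < RETRY_MAX_MS) {
        s->retry_ms *= 2;
        if (s->retry_ms > RETRY_MAX_MS) {
            s->retry_ms = RETRY_MAX_MS;
        }
    }
    return delay;
}

/* Called when a reconnect attempt succeeds; resets the backoff. */
void entropy_source_reconnected(EntropySource *s)
{
    s->connected = 1;
    s->retry_ms = RETRY_MIN_MS;
}

/* Store bytes arriving from the host; returns how many fitted. */
size_t entropy_source_feed(EntropySource *s, const unsigned char *data,
                           size_t len)
{
    size_t room = sizeof(s->buf) - s->avail;

    if (len > room) {
        len = room;
    }
    memcpy(s->buf + s->avail, data, len);
    s->avail += len;
    return len;
}

/* Guest-facing read: hands over whatever is buffered and returns 0 when
 * nothing is available, leaving the guest to decide what to do. */
size_t entropy_source_read(EntropySource *s, unsigned char *out, size_t len)
{
    if (len > s->avail) {
        len = s->avail;
    }
    memcpy(out, s->buf, len);
    memmove(s->buf, s->buf + len, s->avail - len);
    s->avail -= len;
    return len;
}
```

In this sketch the caller would arm a timer with the value returned by entropy_source_reconnect_failed() and try again when it fires. A guest read during an outage simply gets zero bytes, which is the "don't provide data" case described above.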