On Sat, May 08, 2021 at 04:57:54PM +1200, Thomas Munro wrote:
> On Sat, May 8, 2021 at 2:30 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > May 07 03:31:39 gcc202 kernel: sunvdc: vdc_tx_trigger() failure, err=-11
>
> That's -EAGAIN (assuming errnos match x86) and I guess it indicates
> that VDC_MAX_RETRIES is exceeded here:
>
> https://github.com/torvalds/linux/blob/master/drivers/block/sunvdc.c#L451
> https://github.com/torvalds/linux/blob/master/drivers/block/sunvdc.c#L526
>
> One theory is that the hypervisor/host is occasionally too swamped to
> service the request queue fast enough over a ~10ms period, given that
> vio_ldc_send() itself retries 1000 times with a 1us sleep, the outer
> loop tries ten times, and ldc.c's write_nonraw() reports -EAGAIN when
> there is no space for the message. (Alternatively, it's trying to
> send a message that's too big for the channel, the channel is
> corrupted by bugs, or my fly-by of this code I'd never heard of before
> now is just way off...)

Nice discovery. From https://github.com/torvalds/linux/commit/a11f6ca9aef989b56cd31ff4ee2af4fb31a172ec I see those details are 2.5 years old, somewhat young relative to the driver as a whole. I don't know which part should change, though.
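For reference, the ~10ms window in the quoted theory follows from the retry counts it cites: 1000 inner attempts x 1us sleep = ~1ms per vio_ldc_send() call, times ten outer attempts = ~10ms. Below is a rough user-space model of that arithmetic, not the kernel code. Every name in it (sim_ldc_write(), sim_vio_ldc_send(), sim_tx_trigger(), the MODEL_* constants) is hypothetical. It assumes a channel that stays full for a fixed time and then drains. It ignores the time ldc_write() itself takes and any extra delay the real outer loop may add between attempts, so the real window is probably somewhat longer.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model parameters, taken from the retry counts quoted above:
 * the inner send retries 1000 times with a 1us sleep, and the outer
 * trigger loop makes ten attempts. */
#define MODEL_INNER_LIMIT    1000
#define MODEL_INNER_SLEEP_US 1
#define MODEL_OUTER_TRIES    10

/* Simulated clock and channel state (a model, not kernel code). */
static long sim_now_us;
static long sim_busy_until_us;

/* Start a new scenario: the channel has no space for busy_for_us microseconds. */
static void sim_reset(long busy_for_us)
{
    assert(busy_for_us >= 0);
    sim_now_us = 0;
    sim_busy_until_us = busy_for_us;
}

/* Stand-in for ldc_write(): -EAGAIN while the simulated congestion lasts. */
static int sim_ldc_write(void)
{
    return sim_now_us < sim_busy_until_us ? -EAGAIN : 0;
}

/* Stand-in for vio_ldc_send(): retry on -EAGAIN, "sleeping" 1us each time. */
static int sim_vio_ldc_send(void)
{
    int err = -EINVAL;
    for (int limit = MODEL_INNER_LIMIT; limit > 0; limit--) {
        err = sim_ldc_write();
        if (err != -EAGAIN)
            break;
        sim_now_us += MODEL_INNER_SLEEP_US; /* models udelay(1) */
    }
    return err;
}

/* Stand-in for the outer trigger loop: give up after MODEL_OUTER_TRIES. */
static int sim_tx_trigger(void)
{
    int err = -EINVAL;
    for (int tries = 0; tries < MODEL_OUTER_TRIES; tries++) {
        err = sim_vio_ldc_send();
        if (err != -EAGAIN)
            break;
    }
    return err;
}
```

Under this model, congestion lasting 9999us still succeeds on the last inner attempt of the tenth outer try, while congestion lasting 10000us or more surfaces as -EAGAIN, i.e. the err=-11 in the log.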