Re: vmm/vmd disk issue
So to conclude.

I have done four parallel runs of dd, cp and cmp on the host without
any error showing up.

Ian Darwin wrote:
> Depending on where the error is, you might get away with
> dd'ing with conv=noerror,sync, changing vm.conf to point
> to the new copy, and run fsck in the vm.

After this the vm would no longer freeze, but an important config
file was missing, so I would not trust the state of the machine for
anything other than maybe keeping it alive a few days until there is
a better time to reinstall.

Dave Voutila wrote:
> Have you run fsck(8) on your host?

A complete fsck of the host in single user mode showed no problem at
all.

> I'd say maybe make sure you have backups of anything important
> first if you're purposely going to break things. :-)

Always! :)

So for now I will just let it be and see what time gives.

Thank you all for your input!
Re: vmm/vmd disk issue
On Tue, Mar 09, 2021 at 11:20:30PM +0100, Jan Johansson wrote:
> Mike Larkin wrote:
> > On Tue, Mar 09, 2021 at 09:38:57AM -0500, Ian Darwin wrote:
> > > On Tue, Mar 09, 2021 at 09:52:03AM +0100, Jan Johansson wrote:
> > > > If I try to cp or dd the disk image on the host it fails
> > > >
> > > > dd if=disk.raw.old of=disk.raw.bak bs=1m
> > > > dd: disk.raw.old: Input/output error
> > > > 8858+0 records in
> > > > 8858+0 records out
> > > > 9288286208 bytes transferred in 102.048 secs (91018010 bytes/sec)
> > > >
> > > > The host shows no other signs of failing hardware.
> > > >
> > > > Is this a software or a hardware error?
> > >
> > > Given that it gives an error outside the VM, it's likely hardware.
> > >
> >
> > Agreed. Sorta hard to fault vmd(8) if it's not even running.
>
> Since these are sparse files, could the vioblk(4) somehow write
> incorrect data that later will make it unreadable such as a
> pointer pointing into nothingness?
>

no

> The messages
>
> vmd[39543]: vioblk write error: Input/output error
> vmd[39543]: wr vioblk: disk write error
>
> were produced at 01:30 when all the 4 guests and the host all run
> the daily script (which makes backup and other maintenance tasks)
> if that could have any impact.
>
> Should there not be anything on the host logging errors to
> dmesg/syslog such as sd(4) or ahci(4)?
>
> (If it is not obvious my understanding of how the virtio/vioblk
> stuff hooks in to the disk stack is very limited)
>
> This drive was installed in August 2020 and if I recall correctly
> it was because of this issue. So I am thinking cable or
> motherboard.
>
> If I decide to replace would it make sense to make this a
> softraid mirror (RAID1) to avoid or get better indication of this
> kind of problem in the future or would it only add more parts that
> can break?
>
> I am currently trying to provoke the drive from the host with
>
> dd if=/dev/random of=test.raw bs=1m count=17000
>
> then cp/dd and cmp to see if I can make it break for real.
>
Re: vmm/vmd disk issue
Jan Johansson writes:

> Mike Larkin wrote:
>> On Tue, Mar 09, 2021 at 09:38:57AM -0500, Ian Darwin wrote:
>> > On Tue, Mar 09, 2021 at 09:52:03AM +0100, Jan Johansson wrote:
>> > > If I try to cp or dd the disk image on the host it fails
>> > >
>> > > dd if=disk.raw.old of=disk.raw.bak bs=1m
>> > > dd: disk.raw.old: Input/output error
>> > > 8858+0 records in
>> > > 8858+0 records out
>> > > 9288286208 bytes transferred in 102.048 secs (91018010 bytes/sec)
>> > >
>> > > The host shows no other signs of failing hardware.
>> > >
>> > > Is this a software or a hardware error?
>> >
>> > Given that it gives an error outside the VM, it's likely hardware.
>> >
>>
>> Agreed. Sorta hard to fault vmd(8) if it's not even running.
>
> Since these are sparse files, could the vioblk(4) somehow write
> incorrect data that later will make it unreadable such as a
> pointer pointing into nothingness?
>
> The messages
>
> vmd[39543]: vioblk write error: Input/output error
> vmd[39543]: wr vioblk: disk write error
>
> were produced at 01:30 when all the 4 guests and the host all run
> the daily script (which makes backup and other maintenance tasks)
> if that could have any impact.
>
> Should there not be anything on the host logging errors to
> dmesg/syslog such as sd(4) or ahci(4)?
>
> (If it is not obvious my understanding of how the virtio/vioblk
> stuff hooks in to the disk stack is very limited)
>

vmd(8) reads/writes to the disk image files (both raw and qcow2) using
pread(2)/pwrite(2) calls. The qcow2 handling is a bit more complex, but
it is still just calling pread/pwrite as far as I'm aware.

Have you run fsck(8) on your host?

> This drive was installed in August 2020 and if I recall correctly
> it was because of this issue. So I am thinking cable or
> motherboard.
>
> If I decide to replace would it make sense to make this a
> softraid mirror (RAID1) to avoid or get better indication of this
> kind of problem in the future or would it only add more parts that
> can break?
>
> I am currently trying to provoke the drive from the host with
>
> dd if=/dev/random of=test.raw bs=1m count=17000
>
> then cp/dd and cmp to see if I can make it break for real.

I'd say maybe make sure you have backups of anything important first if
you're purposely going to break things. :-)

--
-Dave Voutila
Re: vmm/vmd disk issue
It may be possible that disk IO is saturated (i.e. more writes than
the physical disk could handle).

-----Original Message-----
From: owner-m...@openbsd.org On Behalf Of Jan Johansson
Sent: Wednesday, 10 March 2021 6:21 AM
To: misc@openbsd.org
Cc: Mike Larkin; Ian Darwin
Subject: Re: vmm/vmd disk issue

Mike Larkin wrote:
> On Tue, Mar 09, 2021 at 09:38:57AM -0500, Ian Darwin wrote:
> > On Tue, Mar 09, 2021 at 09:52:03AM +0100, Jan Johansson wrote:
> > > If I try to cp or dd the disk image on the host it fails
> > >
> > > dd if=disk.raw.old of=disk.raw.bak bs=1m
> > > dd: disk.raw.old: Input/output error
> > > 8858+0 records in
> > > 8858+0 records out
> > > 9288286208 bytes transferred in 102.048 secs (91018010 bytes/sec)
> > >
> > > The host shows no other signs of failing hardware.
> > >
> > > Is this a software or a hardware error?
> >
> > Given that it gives an error outside the VM, it's likely hardware.
> >
>
> Agreed. Sorta hard to fault vmd(8) if it's not even running.

Since these are sparse files, could the vioblk(4) somehow write
incorrect data that later will make it unreadable such as a
pointer pointing into nothingness?

The messages

  vmd[39543]: vioblk write error: Input/output error
  vmd[39543]: wr vioblk: disk write error

were produced at 01:30 when all the 4 guests and the host all run
the daily script (which makes backup and other maintenance tasks)
if that could have any impact.

Should there not be anything on the host logging errors to
dmesg/syslog such as sd(4) or ahci(4)?

(If it is not obvious my understanding of how the virtio/vioblk
stuff hooks in to the disk stack is very limited)

This drive was installed in August 2020 and if I recall correctly
it was because of this issue. So I am thinking cable or
motherboard.

If I decide to replace would it make sense to make this a
softraid mirror (RAID1) to avoid or get better indication of this
kind of problem in the future or would it only add more parts that
can break?

I am currently trying to provoke the drive from the host with

  dd if=/dev/random of=test.raw bs=1m count=17000

then cp/dd and cmp to see if I can make it break for real.
Re: vmm/vmd disk issue
Mike Larkin wrote:
> On Tue, Mar 09, 2021 at 09:38:57AM -0500, Ian Darwin wrote:
> > On Tue, Mar 09, 2021 at 09:52:03AM +0100, Jan Johansson wrote:
> > > If I try to cp or dd the disk image on the host it fails
> > >
> > > dd if=disk.raw.old of=disk.raw.bak bs=1m
> > > dd: disk.raw.old: Input/output error
> > > 8858+0 records in
> > > 8858+0 records out
> > > 9288286208 bytes transferred in 102.048 secs (91018010 bytes/sec)
> > >
> > > The host shows no other signs of failing hardware.
> > >
> > > Is this a software or a hardware error?
> >
> > Given that it gives an error outside the VM, it's likely hardware.
> >
>
> Agreed. Sorta hard to fault vmd(8) if it's not even running.

Since these are sparse files, could the vioblk(4) somehow write
incorrect data that later will make it unreadable such as a
pointer pointing into nothingness?

The messages

  vmd[39543]: vioblk write error: Input/output error
  vmd[39543]: wr vioblk: disk write error

were produced at 01:30 when all the 4 guests and the host all run
the daily script (which makes backup and other maintenance tasks)
if that could have any impact.

Should there not be anything on the host logging errors to
dmesg/syslog such as sd(4) or ahci(4)?

(If it is not obvious my understanding of how the virtio/vioblk
stuff hooks in to the disk stack is very limited)

This drive was installed in August 2020 and if I recall correctly
it was because of this issue. So I am thinking cable or
motherboard.

If I decide to replace would it make sense to make this a
softraid mirror (RAID1) to avoid or get better indication of this
kind of problem in the future or would it only add more parts that
can break?

I am currently trying to provoke the drive from the host with

  dd if=/dev/random of=test.raw bs=1m count=17000

then cp/dd and cmp to see if I can make it break for real.
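
The write, copy and compare test described in the message above could
be wrapped in a small script so it can be repeated. This is only a
sketch: the function name stress_once and its two arguments are
hypothetical, /dev/urandom stands in for the /dev/random used in the
thread, and the block size is given in bytes rather than as bs=1m so
the same line works with other dd implementations.

```shell
#!/bin/sh
# Hypothetical helper: write random data to the suspect disk, copy it,
# then compare the copy against the original.
stress_once() {
    dir=$1    # directory on the disk under test
    mib=$2    # size of the test file in MiB

    # Write $mib blocks of 1 MiB of random data.
    dd if=/dev/urandom of="$dir/test.raw" bs=1048576 count="$mib" \
        2>/dev/null || return 1

    # Read everything back once while copying it.
    cp "$dir/test.raw" "$dir/test.copy" || return 1

    # cmp exits non-zero if the files differ or cannot be read.
    cmp "$dir/test.raw" "$dir/test.copy"
}

# Example, matching the 17000 MiB used in the thread:
# stress_once /path/on/suspect/disk 17000 && echo "no mismatch"
```

A clean run only shows that this particular pass hit no error; it does
not prove the disk, cable or controller is healthy.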
Re: vmm/vmd disk issue
On Tue, Mar 09, 2021 at 09:38:57AM -0500, Ian Darwin wrote:
> On Tue, Mar 09, 2021 at 09:52:03AM +0100, Jan Johansson wrote:
> > If I try to cp or dd the disk image on the host it fails
> >
> > dd if=disk.raw.old of=disk.raw.bak bs=1m
> > dd: disk.raw.old: Input/output error
> > 8858+0 records in
> > 8858+0 records out
> > 9288286208 bytes transferred in 102.048 secs (91018010 bytes/sec)
> >
> > The host shows no other signs of failing hardware.
> >
> > Is this a software or a hardware error?
>
> Given that it gives an error outside the VM, it's likely hardware.
>

Agreed. Sorta hard to fault vmd(8) if it's not even running.

> > Is there some way to recover the guest disk image without a
> > complete reinstall?
>
> Depending on where the error is, you might get away with
> dd'ing with conv=noerror,sync, changing vm.conf to point
> to the new copy, and run fsck in the vm.
>
> And buy a new hard disk or SSD. Probably cheaper than your time
> to further diagnose it?
>
Re: vmm/vmd disk issue
On Tue, Mar 09, 2021 at 09:52:03AM +0100, Jan Johansson wrote:
> If I try to cp or dd the disk image on the host it fails
>
> dd if=disk.raw.old of=disk.raw.bak bs=1m
> dd: disk.raw.old: Input/output error
> 8858+0 records in
> 8858+0 records out
> 9288286208 bytes transferred in 102.048 secs (91018010 bytes/sec)
>
> The host shows no other signs of failing hardware.
>
> Is this a software or a hardware error?

Given that it gives an error outside the VM, it's likely hardware.

> Is there some way to recover the guest disk image without a
> complete reinstall?

Depending on where the error is, you might get away with
dd'ing with conv=noerror,sync, changing vm.conf to point
to the new copy, and running fsck in the vm.

And buy a new hard disk or SSD. Probably cheaper than your time
to further diagnose it?
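
The dd recovery suggested above might look roughly like the following.
It is a sketch, not a tested procedure for this machine: the function
name rescue_image and the 4096-byte block size are assumptions, and the
file names in the example are the ones from the original report.

```shell
#!/bin/sh
# Hypothetical helper: copy a disk image past unreadable blocks.
#   conv=noerror  keep going after a read error instead of stopping
#   conv=sync     pad each short or failed input block with zeros,
#                 so the data that follows keeps its original offset
rescue_image() {
    src=$1
    dst=$2
    # A small block size limits how much data is zero-filled per bad
    # read, at the cost of a slower copy. Note that sync also pads the
    # final block if the image size is not a multiple of bs.
    dd if="$src" of="$dst" bs=4096 conv=noerror,sync
}

# Example with the file names from the thread:
# rescue_image disk.raw.old disk.raw.bak
```

Any block that could not be read ends up as zeros in the copy, so the
guest file system may still be damaged. That is why the advice above
continues with pointing vm.conf at the new copy and running fsck inside
the vm.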