[ovirt-users] Re: Non responsive host (4.3.10)
Maria - You can likely repair the filesystems backing the Gluster bricks on that host with xfs_repair. To do so, you will most likely need to stop the Gluster service on that node and unmount the filesystems first. After the filesystems are repaired, you'll be able to mount them and start the Gluster service again. From there, Gluster will "heal" that node and your replica-3 setup will be redundant again.
___ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/WGQFWEUW6YYBAUU4LXHYB2N57MF4ZMZK/
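As a rough sketch of the order of operations described above, the Python snippet below only prints a dry-run command plan and does not execute anything. The brick mount points (/gluster_bricks/engine, /gluster_bricks/data), the /dev/mapper device names, and the assumption that each Gluster volume is named after its brick directory are hypothetical placeholders modeled on a typical oVirt hyperconverged layout; check `lsblk` and /etc/fstab on your own node before using any of it.

```python
"""Dry-run plan for repairing Gluster brick filesystems with xfs_repair.

Hypothetical assumptions: bricks are mounted at /gluster_bricks/<name>,
the device paths below are placeholders, and each volume is named <name>.
Nothing is executed; the commands are only collected and printed.
"""
from typing import Dict, List

# Hypothetical mapping of brick mount point -> backing block device.
BRICKS: Dict[str, str] = {
    "/gluster_bricks/engine": "/dev/mapper/gluster_vg_sdb-gluster_lv_engine",
    "/gluster_bricks/data": "/dev/mapper/gluster_vg_sdb-gluster_lv_data",
}


def repair_plan(bricks: Dict[str, str]) -> List[str]:
    """Return the commands in the order they would be run on the faulty node."""
    plan = ["systemctl stop glusterd"]
    # Stopping glusterd does not necessarily stop the brick processes
    # (glusterfsd); check with `pgrep glusterfsd` before unmounting.
    plan += [f"umount {mnt}" for mnt in bricks]
    for dev in bricks.values():
        plan.append(f"xfs_repair -n {dev}")  # -n: no-modify mode, report only
        plan.append(f"xfs_repair {dev}")     # the actual repair (fs must be unmounted)
    plan += [f"mount {mnt}" for mnt in bricks]  # assumes entries in /etc/fstab
    plan.append("systemctl start glusterd")
    # Once the bricks are back, watch self-heal progress for each volume.
    plan += [f"gluster volume heal {mnt.rsplit('/', 1)[-1]} info" for mnt in bricks]
    return plan


if __name__ == "__main__":
    for step, cmd in enumerate(repair_plan(BRICKS), start=1):
        print(f"{step:2d}. {cmd}")
```

Running the read-only `xfs_repair -n` pass first shows what would be changed before anything is written to the brick.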
[ovirt-users] Re: Non responsive host (4.3.10)
Just an update for documentation purposes. I tried physically rebooting the faulty node after placing the cluster in global maintenance mode, as I couldn't place the node in local maintenance. It booted up OK, but after a few minutes the following logs started appearing on the screen:

    blk_update_request: I/O error, dev dm-1, sector 0
    blk_update_request: I/O error, dev dm-1, sector 2048
    blk_update_request: I/O error, dev dm-1, sector 2099200
    EXT4-fs error (device dm-7): ext4_find_entry:1318:inode #6294136: comm python: reading directory lblock 0
    EXT4-fs (dm-7): previous I/O error to superblock detected
    Buffer I/O error on dev dm-7, logical block 0, lost sync page write
    device-mapper: thin: process_cell: dm_thin_find_block() failed: error= -5
    blk_update_request: I/O error, dev dm-1, sector 1051168
    Aborting journal on device dm-2-0
    blk_update_request: I/O error, dev dm-1, sector 1050624
    JBD2: Error -5 detected when updating journal superblock for dm-2-0

From what I can tell, the filesystem is corrupted, so now I'm in the process of either fixing it with fsck or replacing the node with a new one. (FYI, the node never changed status; it stayed NonResponsive.) For the VM that was stuck on the node, the solution I found is described at https://serverfault.com/questions/996649/how-to-confirm-reboot-unresponsive-host-in-ovirt-if-another-power-management#996650 and consists of setting the cluster to global maintenance mode, shutting down the engine VM, and then starting it again. It worked perfectly, and I was able to start the VM on another node.
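A minimal sketch of that workaround as an ordered command list, assuming it is run from one of the healthy hosted-engine hosts rather than the failed node. The `hosted-engine --vm-status` checks between steps and the final `--mode=none` step are assumptions added for illustration; they are not part of the procedure described in this thread. The function name `engine_restart_plan` is hypothetical.

```python
from typing import List


def engine_restart_plan(leave_global_maintenance: bool = True) -> List[str]:
    """Hypothetical helper: the engine-restart workaround as ordered commands.

    Only builds the list; nothing is executed.
    """
    plan = [
        "hosted-engine --set-maintenance --mode=global",
        "hosted-engine --vm-shutdown",
        "hosted-engine --vm-status",  # repeat until the engine VM is reported down
        "hosted-engine --vm-start",
        "hosted-engine --vm-status",  # repeat until the engine is up again
    ]
    # At this point the stuck VM can be started on another host from the web GUI.
    if leave_global_maintenance:
        # Assumed final step (not mentioned in the thread): resume HA monitoring.
        plan.append("hosted-engine --set-maintenance --mode=none")
    return plan


if __name__ == "__main__":
    for cmd in engine_restart_plan():
        print(cmd)
```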
[ovirt-users] Re: Non responsive host (4.3.10)
Thanks for your reply. I haven't tried physically rebooting it yet, as I am trying to find out whether it will further mess up the oVirt installation on the other hosts and how Gluster will behave. Should I stop the glusterd service on the other hosts? Should I put the whole cluster in global maintenance? The host's screen is full of the following, repeating:

    device-mapper: thin: process_cell: dm_thin_find_block() failed: error= -5
    blk_update_request: I/O error, dev dm-1, sector 390975192
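To see which device-mapper devices the errors point at, a small parser can tally the kernel messages quoted in this thread. This is a sketch: the regular expressions only cover the message formats shown in these logs, and the sample text is copied from them. Because it scans with `re.finditer`, it works on both line-broken and flattened log text.

```python
import re
from collections import Counter
from typing import Dict

# Patterns for the kernel message formats quoted in the thread.
BLK_IO = re.compile(r"blk_update_request: I/O error, dev ([\w-]+), sector (\d+)")
EXT4 = re.compile(r"EXT4-fs error \(device ([\w-]+)\)")
BUF_IO = re.compile(r"Buffer I/O error on dev ([\w-]+),")
THIN = re.compile(r"dm_thin_find_block\(\) failed: error= (-?\d+)")


def summarize(log: str) -> Dict[str, Counter]:
    """Count error messages per device (or per error code for thin-pool lookups)."""
    summary: Dict[str, Counter] = {
        "block_io": Counter(),
        "filesystem": Counter(),
        "thin_pool": Counter(),
    }
    for m in BLK_IO.finditer(log):
        summary["block_io"][m.group(1)] += 1
    for pattern in (EXT4, BUF_IO):
        for m in pattern.finditer(log):
            summary["filesystem"][m.group(1)] += 1
    for m in THIN.finditer(log):
        # error -5 is -EIO: the thin-pool block lookup hit an I/O error
        summary["thin_pool"][m.group(1)] += 1
    return summary


SAMPLE = (
    "blk_update_request: I/O error, dev dm-1, sector 0 "
    "blk_update_request: I/O error, dev dm-1, sector 2048 "
    "EXT4-fs error (device dm-7): ext4_find_entry:1318:inode #6294136: "
    "comm python: reading directory lblock 0 "
    "Buffer I/O error on dev dm-7, logical block 0, lost sync page write "
    "device-mapper: thin: process_cell: dm_thin_find_block() failed: error= -5 "
    "blk_update_request: I/O error, dev dm-1, sector 390975192"
)

if __name__ == "__main__":
    for kind, counts in summarize(SAMPLE).items():
        print(kind, dict(counts))
```

The dm-N names can be mapped back to LVM volumes with `ls -l /dev/mapper`, whose entries are symlinks to the corresponding ../dm-N nodes.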
[ovirt-users] Re: Non responsive host (4.3.10)
Have you tried physically rebooting the host? Plug in a monitor and see what it says? -derek

On May 28, 2023 06:46:12 "Maria Souvalioti" wrote: Hello everyone! Due to a recent major power outage in my area, I now have an unresponsive self-hosted host in an environment of 3 self-hosted hosts. There's one VM stuck on it, as well as some metadata, I guess from when the hosted engine was running there (before the power went down). I'm running oVirt Node 4.3.10 on 3 nodes with GlusterFS (no arbiter), and I'm using it to provide services to our clients, e.g. DNS, web sites, wikis, ticketing etc., which I cannot shut down. The oVirt engine is up and running, and I can manage all the other VMs that run on the other hosts through the web GUI. The unresponsive host replies only to ICMP requests; in every other sense it's dead: no SSH, no Gluster bricks, no console, nothing. I tried to place the faulty host in maintenance, using the option to stop glusterd, but wasn't able to, as the engine won't let the host go into maintenance mode because it thinks the host has running VMs on it. The host won't go into maintenance even if I choose the "Ignore gluster quorum and self-heal validations" option. I spent last week creating a backup environment where I copied the VMs, to have somewhere to run them in case something goes terribly wrong with the systems or with Gluster in the production system. I'm thinking of using global maintenance mode, then shutting down the engine itself with *hosted-engine --vm-shutdown* and rebooting the affected host. Should I remove the host from the cluster and then re-add it, or should I do something else? Thanks for any of your help!