[ovirt-users] Re: Non responsive host (4.3.10)

2023-06-08 Thread Clint Boggio
Maria -

You can likely repair the gluster volumes on that host with xfs_repair. To do 
so, you will most likely need to stop the gluster service on that node and 
unmount the filesystems first.

Once the filesystems are repaired you'll be able to mount them again and start 
the gluster service. From there gluster will "heal" that node and your 
replica-3 will be redundant again.
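
A minimal dry-run sketch of the sequence described above, assuming the brick 
filesystem is XFS and can be unmounted. The device path, mount point and volume 
name are placeholders, not values from this thread, and `run` and 
`repair_brick` are hypothetical helper names. Nothing is executed unless RUN=1 
is set, so the plan can be reviewed first.

```shell
# Print each command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Usage: repair_brick <device> <mountpoint> <volume-name>
repair_brick() {
    dev="$1"; mnt="$2"; vol="$3"
    run systemctl stop glusterd           # stop the gluster management daemon
    run pkill glusterfsd                  # brick processes can outlive glusterd
    run umount "$mnt"                     # xfs_repair needs the filesystem unmounted
    run xfs_repair -n "$dev"              # -n: report problems only, write nothing
    run xfs_repair "$dev"                 # the actual repair
    run mount "$dev" "$mnt"
    run systemctl start glusterd
    run gluster volume heal "$vol" info   # list entries still pending heal
}
```

Calling `repair_brick /dev/mapper/example-brick /gluster_bricks/example 
examplevol` prints the eight commands in order. If the underlying disk or thin 
pool is still returning I/O errors, a filesystem repair is unlikely to hold 
until that is dealt with.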
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WGQFWEUW6YYBAUU4LXHYB2N57MF4ZMZK/


[ovirt-users] Re: Non responsive host (4.3.10)

2023-06-06 Thread Maria Souvalioti
Just an update for documentation purposes.

I tried physically rebooting the faulty node after placing the cluster in 
global maintenance mode, as I couldn't place the node in local maintenance. It 
booted up ok, but then after a few minutes the following logs started appearing 
on the screen:

"blk_update_request: I/O error, dev dm-1, sector 0
blk_update_request: I/O error, dev dm-1, sector 2048
blk_update_request: I/O error, dev dm-1, sector 2099200
EXT4-fs error (device dm-7): ext4_find_entry:1318:inode #6294136: comm python: 
reading directory lblock 0
EXT4-fs (dm-7): previous I/O error to superblock detected
Buffer I/O error on dev dm-7, logical block 0, lost sync page write
device-mapper: thin: process_cell: dm_thin_find_block() failed: error= -5
blk_update_request: I/O error, dev dm-1, sector 1051168
Aborting journal on device dm-2-0
blk_update_request: I/O error, dev dm-1, sector 1050624
JBD2: Error -5 detected when updating journal superblock for dm-2-0
"

From what I can tell the filesystem is corrupted, so I'm now deciding between 
repairing it with fsck and replacing the node with a new one. (FYI, the node 
never changed status; it stayed NonResponsive.)
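
To see which devices are affected before choosing between repair and 
replacement, the kernel messages quoted above can be tallied per device. This 
is a rough sketch: `count_io_errors` is a hypothetical helper name, and the 
pattern only covers the two message shapes seen in this log ("I/O error, dev 
X" and "I/O error on dev X").

```shell
# Read kernel log text on stdin and print "<device> <count>" per device,
# sorted by device name.
count_io_errors() {
    grep -oE 'I/O error(,| on) dev [^, ]+' \
        | awk '{ n[$NF]++ } END { for (d in n) print d, n[d] }' \
        | sort
}
```

On a host that still boots, `dmesg | count_io_errors` would give the tally, and 
`lsblk -o NAME,KNAME,TYPE,MOUNTPOINT` can help map the `dm-N` kernel names back 
to logical volumes and mount points.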

For the VM that was stuck on the node, the solution I found is described here: 
https://serverfault.com/questions/996649/how-to-confirm-reboot-unresponsive-host-in-ovirt-if-another-power-management#996650
It was to put the cluster in global maintenance mode, then shut down the 
engine VM and start it again. It worked perfectly and I was able to start 
the VM on another node.
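
For anyone repeating this, the steps might look roughly like the dry-run sketch 
below, run on a healthy host in the hosted-engine cluster. `run` and 
`restart_engine_vm` are hypothetical helper names, and leaving global 
maintenance at the end is my assumption; it is not stated in the linked answer 
as summarized here. Commands are only printed unless RUN=1 is set.

```shell
# Print each command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

restart_engine_vm() {
    run hosted-engine --set-maintenance --mode=global   # HA agents stop managing the engine VM
    run hosted-engine --vm-shutdown
    run hosted-engine --vm-status                       # in practice, repeat until the VM is reported down
    run hosted-engine --vm-start
    run hosted-engine --set-maintenance --mode=none     # assumption: hand control back to the HA agents
}
```

When run for real, the start step should wait until the status output shows 
the engine VM as down, rather than follow the shutdown immediately.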


[ovirt-users] Re: Non responsive host (4.3.10)

2023-05-28 Thread Maria Souvalioti
Thanks for your reply. 

I haven't tried physically rebooting it yet, as I'm trying to find out whether 
that would further disrupt the oVirt installation on the other hosts and how 
gluster would behave. Should I stop the glusterd service on the other hosts? 
Should I put the whole cluster in global maintenance?

The host's screen is filled with the following messages, repeating:

"device-mapper: thin: process_cell: dm_thin_find_block() failed: error= -5

blk_update_request: I/O error, dev dm-1, sector 390975192"


[ovirt-users] Re: Non responsive host (4.3.10)

2023-05-28 Thread Derek Atkins
Have you tried physically rebooting the host?  Plug in a monitor and see 
what it says?

-derek
On May 28, 2023 06:46:12 "Maria Souvalioti"  wrote:


Hello everyone!

Due to a recent major power outage in my area, I now have an unresponsive 
host in a self-hosted engine environment of 3 hosts. One VM is stuck on it, 
along with some metadata, I guess from when the hosted engine was running 
there (before the power went down).


I'm running oVirt Node 4.3.10 on 3 nodes with GlusterFS (no arbiter), and 
I use it to provide services to our clients, e.g. DNS, web sites, wikis and 
ticketing, which I cannot shut down.


The ovirt engine is up and running and I can manage all the other VMs that 
run on the other hosts through the web gui.


The unresponsive host replies only to ICMP requests; in every other sense 
it's dead, no ssh, no gluster bricks, no console, nothing.


I tried to place the faulty host in maintenance, using the option to stop 
glusterd, but the engine won't let the host go into maintenance mode because 
it thinks the host has running VMs on it. The host won't go into maintenance 
even if I choose the "Ignore gluster quorum and self-heal validations" 
option.


I spent last week creating a backup environment where I copied the VMs, to 
have somewhere to run them in case something goes terribly wrong with the 
systems or the gluster in the production system.


I'm thinking of using the global maintenance mode and then shutting down 
the engine itself with *hosted-engine --vm-shutdown* and rebooting the 
affected host.


Should I remove the host from the cluster and then re-add it or should I do 
something else?


Thanks for any of your help!