[ovirt-users] Re: Broke my GlusterFS somehow
I'm glad you made it work! My main lesson from oVirt over the last two years is: it's not a turnkey solution. Unless you are willing to dive deep and understand how it works (not so easy, because there is little up-to-date material explaining the concepts) *AND* spend a significant amount of time running it through its paces, you may not survive the first thing that goes wrong. And if you do, that may be even worse: trying to fix a busy production farm that has a core problem is the stuff of nightmares.

So make sure you have a test environment or two to try things. It doesn't have to be physical, especially for the more complex scenarios that require a lot of restarting from scratch before you get to the step that breaks things. You can run oVirt on oVirt, or on any other hypervisor that supports nested virtualization, and play with that before you deal with a "real patient".

I've gone through catastrophic hardware failures which would have resulted in a total loss without the resilience oVirt HCI with Gluster replicas provided, but I've also lost a lot of hair and nails just running it.
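If you go the nested route, the physical host needs nested virtualization switched on. A minimal sketch for a KVM-based host, assuming an Intel CPU (use kvm_amd instead of kvm_intel on AMD); these are the standard kernel module parameters, nothing oVirt-specific:

    # check whether nested virtualization is already enabled ('Y' or '1' means yes)
    cat /sys/module/kvm_intel/parameters/nested

    # enable it persistently, then reload the module (or simply reboot the host)
    echo 'options kvm_intel nested=1' > /etc/modprobe.d/kvm-nested.conf
    modprobe -r kvm_intel && modprobe kvm_intel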
[ovirt-users] Re: Broke my GlusterFS somehow
Rule number 1 with Gluster is: always check on the CLI. Glustereventsd.service sends event notifications to oVirt, but if it isn't working the engine won't identify any issues. By the way, check whether glustereventsd is working fine.

Best Regards,
Strahil Nikolov

On Mon, Feb 21, 2022 at 21:32, Abe E wrote:
That's what I did, rebuilt the oVirt. I'm learning a lot so I don't mind, but I want to make sure it's built and designed well by me before I ever let any of my colleagues get hands on it. I inherited an old system that was never maintained, and it has been a lot of work to figure out how they designed it with no documentation, so it's better to start clean.
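A quick way to check that the events daemon is actually alive and knows about its webhooks (a rough sketch; gluster-eventsapi ships with the glusterfs-events package, and the exact webhook URL that oVirt registers will vary per setup):

    # on each Gluster node
    systemctl status glustereventsd

    # list the registered webhooks and whether eventsd is up on every peer
    gluster-eventsapi status

    # if one node is out of sync, push the webhook config back out to all peers
    gluster-eventsapi sync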
[ovirt-users] Re: Broke my GlusterFS somehow
That's what I did, rebuilt the oVirt. I'm learning a lot so I don't mind, but I want to make sure it's built and designed well by me before I ever let any of my colleagues get hands on it. I inherited an old system that was never maintained, and it has been a lot of work to figure out how they designed it with no documentation, so it's better to start clean.
[ovirt-users] Re: Broke my GlusterFS somehow
I think Patrick already gave quite sound advice. I'd only add that you should strictly separate dealing with Gluster and dealing with oVirt: the integration isn't strong, oVirt just uses Gluster and won't try to fix it intelligently.

Changing hostnames on an existing Gluster is "not supported", I guess, even if I understand how that can be a need in real life. I've had occasion to move HCI clusters from one IP range to another and it's a bit like open-heart surgery. I had no control over the DNS in that environment, so I made do with /etc/hosts on all Gluster members. It's stone age, but it works and you have full control. You are basically free to keep the old hostnames in /etc/hosts so that the Gluster pool members can still talk to each other, while all outside access uses the new names. If you can get the management engine to run somehow, you can use the same trick of putting the old aliases in its hosts file to regain operations. In some cases I've even worked with pure IP addresses for the Gluster setup, and even that can all be changed in the /var/lib configuration files, if and only if all Gluster daemons are shut down (they tend to keep their state in memory and write it out as the daemon exits).

Once Gluster itself is happy and "gluster volume status all" shows "Y" on all ports and bricks, oVirt generally has no issues at all using the storage. It may just take a long time to show things as OK.

The only other piece of advice I can give for a situation like this is to decide where your value is and how quickly you need to be back in business: is it the VMs running on the infrastructure, or the oVirt setup itself? If it's the VMs, and you're still running on one Gluster leg, I'd concentrate on saving the VMs. Backup and export domains on NFS are safe in the sense that they can typically be attached to an oVirt you rebuilt from scratch, so that's one option. OVA exports sometimes work, and I've also used Clonezilla to copy VMs across to other hypervisors, booting the Clonezilla ISO at both ends and doing a network transfer: the VMs lose some attributes and the network may need to be redone afterwards, but the data stays safe.

If it's the oVirt setup, I'd rather recommend starting from scratch with the latest release and hopefully some backup of the VMs. Fiddling with the database is nothing I'd recommend.
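As a rough sketch of the /etc/hosts workaround (hostnames and addresses are made up; put the same entries on every Gluster member, and on the engine if you can reach it):

    # /etc/hosts on every node: keep the old peer names resolving
    192.168.1.11   node1-old.example.com   node1-old
    192.168.1.12   node2-old.example.com   node2-old
    192.168.1.13   node3-old.example.com   node3-old

    # then, with the daemons back up, verify before touching oVirt
    gluster peer status
    gluster volume status all     # every brick and its port should show 'Y' under Online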
[ovirt-users] Re: Broke my GlusterFS somehow
* gluster volume info all

On Sun, Feb 20, 2022 at 14:46, Strahil Nikolov wrote:
In order to have an idea how to help you, provide the following from all nodes (separate the info per node):
ip a s
gluster pool list
gluster peer status
gluster volume list
gluster volume status all
gluster volume all
Best Regards,
Strahil Nikolov
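One way to collect all of that in one go, assuming root SSH works between the nodes (the hostnames here are placeholders):

    for h in node1 node2 node3; do
        echo "===== $h ====="
        ssh root@"$h" 'ip a s; gluster pool list; gluster peer status; \
                       gluster volume list; gluster volume status all; gluster volume info all'
    done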
If you use "Network Bob" for storage and migration, then all hosts with a "Network Bob" interface must be able to communicate with each other over that interface. If you use "Network Alice" for VM consoles, then all end- user workstations must be able to commuicate with the "Network Alice" interface. The exact IPs, vlan IDs, routing tables, and firewall restrictions for a logical network don't matter as long as each role can still reach the role on other hosts over the assigned interface.) -Patrick Hibbs On Sun, 2022-02-20 at 01:17 +, Abe E wrote: > So upon changing my ovirt nodes (3Hyperconverged Gluster) as well as > my engines hostname without a hitch I had an issue with 1 node and > somehow I did something that broke its gluster and it wouldnt > activate, > So the gluster service wont start and after trying to open the node > from webgui to see what its showing in its virtualization tab I was > able to see that it allows me to run the hyperconverged wizard using > the existing config. Due to this i lost the engine because well the > 3rd node is just arbiter and node 2 complained about not having > shared storage. > > This node is the one which I built ovirt gluster from so i assumed it > would rebuild its gluster.. i accidentally clicked cleanup which got > rid of my gluster brick mounts :)) then I tried to halt it and > rebuild using existing configuration. Here is my issue though, am I > ab
[ovirt-users] Re: Broke my GlusterFS somehow
In order to have an idea how to help you, provide the following from all nodes (separate the info per node):
ip a s
gluster pool list
gluster peer status
gluster volume list
gluster volume status all
gluster volume all
Best Regards,
Strahil Nikolov
If you use "Network Bob" for storage and migration, then all hosts with a "Network Bob" interface must be able to communicate with each other over that interface. If you use "Network Alice" for VM consoles, then all end- user workstations must be able to commuicate with the "Network Alice" interface. The exact IPs, vlan IDs, routing tables, and firewall restrictions for a logical network don't matter as long as each role can still reach the role on other hosts over the assigned interface.) -Patrick Hibbs On Sun, 2022-02-20 at 01:17 +, Abe E wrote: > So upon changing my ovirt nodes (3Hyperconverged Gluster) as well as > my engines hostname without a hitch I had an issue with 1 node and > somehow I did something that broke its gluster and it wouldnt > activate, > So the gluster service wont start and after trying to open the node > from webgui to see what its showing in its virtualization tab I was > able to see that it allows me to run the hyperconverged wizard using > the existing config. Due to this i lost the engine because well the > 3rd node is just arbiter and node 2 complained about not having > shared storage. > > This node is the one which I built ovirt gluster from so i assumed it > would rebuild its gluster.. i accidentally clicked cleanup which got > rid of my gluster brick mounts :)) then I tried to halt it and > rebuild using existing configuration. Here is my issue though, am I > able to rebuild my node? > > This is a new lab system so I believe i have all my vms still
[ovirt-users] Re: Broke my GlusterFS somehow
OK, where to begin.

As for your Gluster issue: Gluster maintains its own copy of the configuration for each brick outside of oVirt / VDSM. As you changed the network config manually, you also needed to change the Gluster config to match. The fact that you haven't is the reason Gluster failed to restart the volume.

However, in a hyperconverged configuration oVirt also maintains the Gluster configuration in its database. Manually fixing Gluster's configuration on the bricks themselves won't fix the engine's copy. (Believe me, I had to fix this myself because I didn't use hostnames initially for the bricks. It's a pain to manually fix the database.) That copy is used to connect the VMs to their storage. If the engine's copy doesn't match Gluster's config, you'll have a working Gluster volume but the hosts won't be able to start VMs. Essentially, in a hyperconverged configuration oVirt doesn't allow removal of a host with a Gluster brick unless removing that host won't break Gluster and prevent the volume from running. (I.e. you can't remove a host if doing so would cause the volume to lose quorum.)

Your options for fixing Gluster are either:

1. Add enough new bricks to the Gluster volumes so that removal of an old host (brick) doesn't cause quorum loss.

- OR -

2. Manually update the engine's database, with the engine and all hosts offline, to point to the correct hosts, after manually updating the bricks and bringing the volume back up.

The first option is your safest bet. But that assumes the volume is up and can accept new bricks in the first place. If not, you could potentially still do the first option, but it would require reverting your network configuration changes on each host first.

The second option is one of last resort. This is the reason why I said updating the interfaces manually instead of using the web interface was a bad idea. If possible, use the first option. If not, you'd be better off just hosing the oVirt installation and reinstalling from scratch. If you *really* need to use the second option, you'll need to follow these instructions on each brick: https://serverfault.com/questions/631365/rename-a-glusterfs-peer and then update the engine database manually to point to the correct hostnames for each brick. (Keep in mind I am *NOT* recommending that you do this. This information is provided for educational / experimental purposes only.)

As for Matthew's solution, the only reason it worked at all was because you removed and re-added the host from the cluster. Had you not done that, VDSM would have overwritten your changes on the next host upgrade / reinstall, and as you have seen, that solution won't completely fix a host in a hyperconverged configuration.

As to the question about oVirt's Logical Networks: what I meant was that oVirt doesn't care what the IP configuration is for them, and that if you want to change which network the roles use, you need to do so elsewhere in the web interface. The only thing that does matter for each role is that all of the clients using, or hosts providing, that role can communicate with each other on that interface. (I.e. if you use "Network Bob" for storage and migration, then all hosts with a "Network Bob" interface must be able to communicate with each other over that interface. If you use "Network Alice" for VM consoles, then all end-user workstations must be able to communicate with the "Network Alice" interface. The exact IPs, VLAN IDs, routing tables, and firewall restrictions for a logical network don't matter as long as each role can still reach the role on other hosts over the assigned interface.)

-Patrick Hibbs

On Sun, 2022-02-20 at 01:17 +, Abe E wrote:
> So upon changing the hostnames of my oVirt nodes (3-node hyperconverged
> Gluster) as well as my engine without a hitch, I had an issue with 1 node:
> somehow I did something that broke its Gluster and it wouldn't activate.
> So the gluster service won't start, and after trying to open the node
> from the web GUI to see what it shows in its virtualization tab, I saw
> that it allows me to run the hyperconverged wizard using the existing
> config. Due to this I lost the engine, because the 3rd node is just an
> arbiter and node 2 complained about not having shared storage.
>
> This node is the one I built the oVirt Gluster from, so I assumed it
> would rebuild its Gluster.. I accidentally clicked cleanup, which got
> rid of my gluster brick mounts :)) Then I tried to halt it and rebuild
> using the existing configuration. Here is my issue though: am I able to
> rebuild my node?
>
> This is a new lab system, so I believe I still have all my VMs on my
> external HDDs. If I can restore this 1 node and have it rejoin the
> gluster, then great; otherwise, what's the best route using the web GUI
> (I am remote at the moment) to just wipe all 3 nodes, start all over
> again, and work it slowly? Is it simply deleting the partitions
> for
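For reference, what the serverfault answer above boils down to is roughly the following, an untested outline only (old-name/new-name are placeholders, take backups of /var/lib/glusterd first, and note it does nothing for the engine database, which still has to be fixed separately as Patrick describes):

    # on EVERY node, with glusterd stopped everywhere
    systemctl stop glusterd

    # find where the old peer name is recorded (peer files and volume definitions)
    grep -rl 'old-name.example.com' /var/lib/glusterd

    # replace it with the new name in those files
    grep -rl 'old-name.example.com' /var/lib/glusterd \
        | xargs sed -i 's/old-name.example.com/new-name.example.com/g'

    # start glusterd again on every node and verify
    systemctl start glusterd
    gluster peer status
    gluster volume status all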