[ovirt-users] Re: Broke my GlusterFS somehow

2022-02-22 Thread Thomas Hoberg
I'm glad you made it work!

My main lesson from oVirt over the last two years is: it's not a turnkey
solution.

Unless you are willing to dive deep and understand how it works (not so easy,
because there is little up-to-date material explaining the concepts) *AND* spend
a significant amount of time running it through its paces, you may not survive
the first thing that goes wrong.

And if you do, that may be even worse: trying to fix a busy production farm
that has a core problem is the stuff of nightmares.

So make sure you have a test environment or two to try things.

And it doesn't have to be physical, especially for the more complex scenarios
that require a lot of restarting from scratch before you get to the step that
actually breaks things.

You can run oVirt on oVirt or any other hypervisor that supports nested 
virtualization and play with that before you deal with a "real patient".
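Checking whether a box can host such a nested test setup is quick (a sketch
assuming Intel hardware; on AMD the module is kvm_amd):

    # is nested virtualization enabled on the physical host?
    cat /sys/module/kvm_intel/parameters/nested     # "Y" or "1" means enabled
    # enable it persistently if needed (reload the module with no VMs running)
    echo "options kvm_intel nested=1" > /etc/modprobe.d/kvm-nested.conf
    modprobe -r kvm_intel && modprobe kvm_intel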

I've gone through catastrophic hardware failures which would have resulted in
a total loss without the resilience that oVirt HCI with Gluster replicas
provided, but I've also lost a lot of hair and nails just running it.


[ovirt-users] Re: Broke my GlusterFS somehow

2022-02-21 Thread Strahil Nikolov via Users
Rule number 1 with Gluster is 'always check on the CLI'.
glustereventsd.service sends event notifications to oVirt, but if it doesn't
work the engine won't identify any issues.
By the way, check whether glustereventsd is working fine.
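For example, roughly (a sketch; the volume name is a placeholder):

    # is the events daemon alive on each Gluster node?
    systemctl status glustereventsd
    gluster-eventsapi status          # registered webhooks and node status
    # and always verify the volume itself on the CLI
    gluster volume status all
    gluster volume heal <volname> info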
Best Regards,
Strahil Nikolov
 
 
  On Mon, Feb 21, 2022 at 21:32, Abe E wrote:
That's what I did, rebuilt oVirt. I'm learning a lot so I don't mind, but I
want to make sure it's built and designed well by me before I ever let any of
my colleagues get their hands on it. I inherited an old system that was never
maintained, and it's been a lot of work to figure out how they designed it
with no documentation, so it's better to start clean.


[ovirt-users] Re: Broke my GlusterFS somehow

2022-02-21 Thread Abe E
That's what I did, rebuilt oVirt. I'm learning a lot so I don't mind, but I
want to make sure it's built and designed well by me before I ever let any of
my colleagues get their hands on it. I inherited an old system that was never
maintained, and it's been a lot of work to figure out how they designed it
with no documentation, so it's better to start clean.


[ovirt-users] Re: Broke my GlusterFS somehow

2022-02-21 Thread Thomas Hoberg
I think Patrick already gave quite sound advice.

I'd only want to add that you should strictly separate dealing with Gluster
from dealing with oVirt: the integration isn't strong; oVirt just uses Gluster
and won't try to fix it intelligently.

Changing hostnames on an existing Gluster is "not supported", I guess, even if
I understand how that can become necessary in real life.

I've had occasion to move HCI clusters from one IP range to another and it's a 
bit like open heart surgery.

I had no control over the DNS in that environment so I made do with /etc/hosts 
on all Gluster members. It's stone age, but it works and you have full control.

So you are basically free to keep the old hostnames in /etc/hosts to ensure
that the Gluster pool members can still talk to each other, while all outside
access uses the new names. If you can get the management engine to run
somehow, you can use the same trick of putting the old aliases in its hosts
file to regain operations.
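A minimal sketch of what that can look like (these names and addresses are
made up; put identical entries on every Gluster member):

    # /etc/hosts on every node: keep the OLD peer names resolvable for Gluster,
    # while the outside world uses the new names
    192.168.10.11   node1.old.lab   node1.new.example.com
    192.168.10.12   node2.old.lab   node2.new.example.com
    192.168.10.13   node3.old.lab   node3.new.example.com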

In some cases I've even worked with pure IP addresses for the Gluster setup,
and even those can all be changed in the configuration files under
/var/lib/glusterd, if and only if all Gluster daemons are shut down (they tend
to keep their state in memory and write it back out as the daemon exits).
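Roughly what that looks like (a sketch only; OLD_IP/NEW_IP are placeholders,
run it on every node, and keep a backup of the directory before touching
anything):

    # stop the management daemon; brick and self-heal processes survive it,
    # so kill those too
    systemctl stop glusterd glustereventsd
    pkill glusterfsd; pkill glusterfs
    cp -a /var/lib/glusterd /var/lib/glusterd.bak   # a way back if it goes wrong
    grep -rl 'OLD_IP' /var/lib/glusterd | xargs sed -i 's/OLD_IP/NEW_IP/g'
    systemctl start glusterd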

Once Gluster itself is happy and "gluster volume status all" is showing "y" on 
all ports and bricks, oVirt generally had no issues at all using the storage. 
It may just take a long time to show things as ok.

The only other piece of advice I can give for a situation like that is to 
decide where your value is and how quickly you need to be back in business.

Is it the VMs running on the infra, or the oVirt setup itself?

If it's the VMs and if you're still running on one Gluster leg, I'd concentrate 
on saving the VMs. 

Backup and Export domains on NFS are safe in the sense that they can typically
be attached to an oVirt that you rebuilt from scratch, so that's one option.
OVA exports sometimes work, and I've also used Clonezilla to copy VMs across to
other hypervisors, booting the Clonezilla ISO at both ends and doing a network
transfer: they lose some attributes and the network may need to be redone
afterwards, but the data stays safe.

If it's the oVirt setup, I'd rather recommend starting from scratch with the
latest release and hopefully some backup of the VMs. Fiddling with the database
is not something I'd recommend.


[ovirt-users] Re: Broke my GlusterFS somehow

2022-02-20 Thread Strahil Nikolov via Users
* gluster volume info all (correction to the last command in my earlier mail)
 
 

[ovirt-users] Re: Broke my GlusterFS somehow

2022-02-20 Thread Strahil Nikolov via Users
In order to have an idea how to help you, provide the following from all nodes
(separate the info per node):

    ip a s
    gluster pool list
    gluster peer status
    gluster volume list
    gluster volume status all
    gluster volume all
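A simple way to collect that per node in one go (a sketch; node1/node2/node3
are placeholders for your hosts):

    for h in node1 node2 node3; do
        echo "=== $h ==="
        ssh root@"$h" 'ip a s; gluster pool list; gluster peer status'
        ssh root@"$h" 'gluster volume list; gluster volume status all; gluster volume info all'
    done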
Best Regards,
Strahil Nikolov
 

[ovirt-users] Re: Broke my GlusterFS somehow

2022-02-19 Thread Patrick Hibbs
OK, where to begin.

As for your Gluster issue, Gluster maintains its own copy of the
configuration for each brick outside of oVirt / VDSM. As you have
changed the network config manually, you also needed to change the
Gluster config to match. The fact that you haven't is the
reason why Gluster failed to restart the volume.
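For reference, that copy lives under /var/lib/glusterd on each node; a quick
way to see which peer names and brick hosts Gluster itself still expects (the
volume name is a placeholder):

    # peer identities as Gluster sees them
    cat /var/lib/glusterd/peers/*
    # brick definitions for a volume (the host name is embedded in the file names)
    ls /var/lib/glusterd/vols/<volname>/bricks/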

However, in a hyperconverged configuration, oVirt maintains the Gluster
configuration in its database. Manually fixing Gluster's configuration
on the bricks themselves won't fix the engine's copy. (Believe me, I
had to fix this myself before, because I didn't use hostnames initially
for the bricks. It's a pain to fix the database manually.) That copy is
used to connect the VMs to their storage. If the engine's copy doesn't
match Gluster's config, you'll have a working Gluster volume but the
hosts won't be able to start VMs.

Essentially, in a hyperconverged configuration oVirt doesn't allow
removal of a host with a Gluster brick unless removing that host won't
break Gluster and prevent the volume from running. (I.e. you can't
remove a host if doing so would cause the volume to lose quorum.)

Your options for fixing Gluster are either:
1. Add enough new bricks to the Gluster volumes so that
removal of an old host (brick) doesn't cause quorum loss.

- OR -

2. Manually update the engine's database with the engine and
all hosts offline to point to the correct hosts, after manually
updating the bricks and bringing back up the volume.

The first option is your safest bet. But that assumes that the volume
is up and can accept new bricks in the first place. If not, you could
potentially still do the first option but it would require reverting
your network configuration changes on each host first.
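If you go the first route, the Gluster side looks roughly like this (a sketch
only: the volume name, host names and brick paths are placeholders, the
replica count has to match your layout, and in a managed HCI cluster the same
can usually be done from the volume's Bricks tab in the web UI):

    # grow the replica set with a brick on a new host...
    gluster volume add-brick <volname> replica 4 newhost:/gluster_bricks/<volname>/brick
    # ...or swap the old host's brick for one on a new host in a single step
    gluster volume replace-brick <volname> oldhost:/gluster_bricks/<volname>/brick \
        newhost:/gluster_bricks/<volname>/brick commit force
    # either way, let self-heal finish before removing anything else
    gluster volume heal <volname> info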

The second option is one of last resort. This is the reason why I said
updating the interfaces manually instead of using the web interface was
a bad idea. If possible, use the first option. If not, you'd be better
off just hosing the oVirt installation and reinstalling from scratch.

If you *really* need to use the second option, you'll need to follow
these instructions on each brick:
https://serverfault.com/questions/631365/rename-a-glusterfs-peer

and then update the engine database manually to point to the correct
hostnames for each brick. (Keep in mind I am *NOT* recommending that
you do this. This information is provided for educational /
experimental purposes only.)

As for Matthew's solution, the only reason it worked at all was because
you removed and re-added the host from the cluster. Had you not done
that, VDSM would have overwritten your changes on the next host upgrade
/ reinstall, and as you have seen that solution won't completely fix a
host in a hyperconverged configuration.

As to the question about oVirt's Logical Networks, what I meant was
that oVirt doesn't care what the IP configuration is for them, and that
if you wanted to change which network the roles used you needed to do
so elsewhere in the web interface. The only thing that does matter for
each role is that all of the clients using or hosts providing that role
can communicate with each other on that interface. (I.e. If you use
"Network Bob" for storage and migration, then all hosts with a "Network
Bob" interface must be able to communicate with each other over that
interface. If you use "Network Alice" for VM consoles, then all end-
user workstations must be able to communicate with the "Network Alice"
interface. The exact IPs, vlan IDs, routing tables, and firewall
restrictions for a logical network don't matter as long as each role
can still reach the role on other hosts over the assigned interface.)

-Patrick Hibbs

On Sun, 2022-02-20 at 01:17 +, Abe E wrote:
> So upon changing my oVirt nodes' (3-node hyperconverged Gluster) as well as
> my engine's hostname without a hitch, I had an issue with 1 node:
> somehow I did something that broke its gluster and it wouldn't
> activate.
> So the gluster service won't start, and after trying to open the node
> from the web GUI to see what it's showing in its virtualization tab, I was
> able to see that it allows me to run the hyperconverged wizard using
> the existing config. Due to this I lost the engine, because, well, the
> 3rd node is just an arbiter and node 2 complained about not having
> shared storage.
> 
> This node is the one which I built the oVirt gluster from, so I assumed it
> would rebuild its gluster.. I accidentally clicked cleanup, which got
> rid of my gluster brick mounts :)) then I tried to halt it and
> rebuild using the existing configuration. Here is my issue though: am I
> able to rebuild my node?
> 
> This is a new lab system so I believe I still have all my VMs on my
> external HDDs. If I can restore this 1 node and have it rejoin the
> gluster then great; otherwise what's the best route using the web GUI
> (I am remote at the moment) to just wipe all 3 nodes and start all
> over again and work it slowly? Is it simply deleting the partitions
> for