That very much describes my own situation two years ago, just with a slight 
offset in time and geography, as my home is near Frankfurt and my work is in 
Lyon. I had been doing 70:1 consolidation via virtualization based on OpenVZ 
(containers, but with an IaaS abstraction) since 2006, because it required zero 
hardware and software budget, whereas VMware would have required both VT-x 
capable hardware and pricey licenses.

OpenVZ turned out to be extremely easy to use and super reliable, and the 
sysadmin only had to learn two or three new commands, not a new abstraction 
layer. We were running payment front office systems, where five minutes of 
downtime means a very angry boss, one hour of downtime costs a yearly bonus and 
beyond that you have to gather your things and go.

That also meant planning for 100% availability with eventually consistent data, 
so nothing you could do at the file level: at the time that meant Oracle with 
Streams replication and programmed healing, or nothing. You can't beat the CAP 
theorem.

(No, this isn't a recommendation to go with OpenVZ: that project is about as 
lively as Docker these days.)

For a new context with far lighter availability demands but the need to support 
GPUs for machine learning (CUDA breaks OpenVZ containers) I thought that oVirt 
would add managed heterogeneous VMs and hyperconverged storage to the mix, 
again at zero entrance fee with a supported option if that became necessary.

Two years later, I'd say that "I married the wrong woman". It works, but the 
claim "oVirt is an open-source distributed virtualization solution, designed to 
manage your entire enterprise infrastructure" is terribly misleading.

I had planned to get it reasonably stable and ready within weeks of my typical 
stress and failure testing on a 1/3 time budget, but it took almost 2 years at 
more like a 50% time allocation to learn all the many things that can go wrong, 
how to diagnose them and how to fix them.

Just yesterday one cluster (my first functional-test 3-node HCI running on 
Atoms, which mostly just sits idle and gets updates, which require reboots) had 
evidently decided to lose one Gluster network connection and accumulated 5000+ 
entries in the heal queue during a week of vacation.
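
A backlog like that builds up silently, so a small periodic check could at 
least flag it early. What follows is only a minimal sketch, assuming the 
plain-text output of "gluster volume heal <volume> info" (a "Brick ..." header 
per brick followed by a "Number of entries: N" line, with "-" in place of N 
when a brick cannot be reached). The function names and the threshold are 
hypothetical, and the output format may differ between Gluster versions.

```python
import subprocess


def run_heal_info(volume):
    """Return the text output of 'gluster volume heal <volume> info'.

    Assumes the gluster CLI is installed and the caller may run it.
    """
    result = subprocess.run(
        ["gluster", "volume", "heal", volume, "info"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def count_heal_entries(heal_info_text):
    """Map each brick to its pending heal count (None if unknown)."""
    counts = {}
    brick = None
    for raw in heal_info_text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        elif line.startswith("Number of entries:") and brick is not None:
            value = line.split(":", 1)[1].strip()
            # An unreachable brick reports "-" instead of a number.
            counts[brick] = int(value) if value.isdigit() else None
    return counts


def needs_attention(counts, threshold=100):
    """True if any brick is unreachable or the total backlog is too large."""
    if any(n is None for n in counts.values()):
        return True
    return sum(counts.values()) > threshold
```

Run from cron with something like needs_attention(count_heal_entries( 
run_heal_info("data"))), where "data" is a placeholder volume name, and send a 
mail when it returns True.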

It was four hours of careful digging, hundreds of restarted daemons, various 
reboots and a transient situation where the three storage nodes could not see 
or access the storage they were providing, while the management engine ran on a 
compute node and continued to write to a disk that evidently did not exist...  
As a newbie I would have either jumped off a cliff (active users) or tossed the 
project.

However, the magnificent basic design of oVirt let me recover everything 
without a loss... except hair, nerves, general health etc.

And I had to learn the hard way that exporting a VM as an OVA and expecting it 
to be importable on any other platform advertising OVA support, or even back 
into oVirt, is not functionality ever included in any regular QA testing...

...which might explain why it doesn't work. Or perhaps just no longer, after an 
update.

In short: do not expect anything to work that you have not fully tested, 
several times, after every minor update. Everything is extremely raw and ready 
to break at any moment, and I can't remember the last time I did a plain 
vanilla install on freshly scrubbed hardware where I didn't have to help it 
along manually and dig through dozens of big log files.

What really troubles me: since the basic ingredients for the commercial product 
are the same, I don't see how that might save your bacon. Perhaps 100% 
validated hardware might do it (for a while), but where oVirt is *designed* for 
a maximum of flexibility, it won't reward your taking advantage of that.

I am sticking with it until CentOS7 is end of life, too (just like CentOS8 is 
already), because otherwise I'd have nothing to show for two years of work. But 
if you want to join in, you need to have serious resources to commit. It's most 
likely still smaller than OpenStack, though.

And if you have NetApp filers or SAN, you should not risk HCI. That is super 
elegant as a concept, just like Gluster is a beautiful concept, but very soon 
you'll realize that they were never designed for each other and remain full of 
contradictions.

oVirt may be designed to fit that enterprise role, but in the HCI variant, it 
still has nowhere near the cohesion and maturity you'd need for that role. 
CentOS, LVM, VDO, KVM, the management engine, Gluster, Ansible are all distinct 
products from what used to be different companies.

And it shows.

Of course, that's just my personal experience and opinion.
