[ovirt-users] VXLAN
Hi, we are currently evaluating the use of VXLAN in oVirt - and strangely enough we can't find anything about it in the manual. Has this been implemented yet? Or are we just blind because it is hidden somewhere in the many features of /OVN/?

Kind regards,
Daniel

--
Daniel Menzel
Managing Director, Menzel IT GmbH, Berlin
[ovirt-users] illegal disk status
Hi, we have a problem with some VMs which cannot be started anymore due to an illegal disk status of a snapshot.

What happened (most likely)? We tried to snapshot those VMs some days ago, but the storage domain didn't have enough free space left. Yesterday we shut those VMs down - and since then they don't start anymore.

What have I tried so far?
1. Via the web interface I tried to remove the snapshot - didn't work.
2. Searched the internet. Found (among other things) this: https://bugzilla.redhat.com/show_bug.cgi?id=1649129
3. Via /vdsm-tool dump-volume-chains/ I managed to list those 5 snapshots (see below).

The output for one machine was:

  image: 2d707743-4a9e-40bb-b223-83e3be672dfe
    - 9ae6ea73-94b4-4588-9a6b-ea7a58ef93c9
      status: OK, voltype: INTERNAL, format: RAW, legality: LEGAL, type: PREALLOCATED, capacity: 32212254720, truesize: 32212254720
    - f7d2c014-e8f5-4413-bfc5-4aa1426cb1e2
      status: ILLEGAL, voltype: LEAF, format: COW, legality: ILLEGAL, type: SPARSE, capacity: 32212254720, truesize: 29073408

So my idea was to follow the said Bugzilla thread and update the volume - but I didn't manage to find input for the /job_id/ and /generation/. So my question is: does anyone have an idea how to (force) remove a given snapshot via vdsm-{tool|client}?

Thanks in advance!
Daniel

--
Daniel Menzel
Managing Director, Menzel IT GmbH, Berlin
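For reference, the workaround from that Bugzilla entry seems to boil down to the commands below. This is only a sketch of my understanding, not a verified recipe: the storage pool and storage domain UUIDs are placeholders, the job_id is simply a freshly generated UUID, and the exact verb and parameter names should be checked against the vdsm-client help on the host.

  # Look up the current generation of the illegal leaf volume:
  vdsm-client Volume getInfo storagepoolID=<sp_uuid> storagedomainID=<sd_uuid> imageID=2d707743-4a9e-40bb-b223-83e3be672dfe volumeID=f7d2c014-e8f5-4413-bfc5-4aa1426cb1e2

  # Generate a fresh UUID to use as the job_id:
  uuidgen

  # Mark the volume LEGAL again so the snapshot can then be removed through the engine:
  vdsm-client SDM update_volume job_id=<new_uuid> vol_info='{"sd_id": "<sd_uuid>", "img_id": "2d707743-4a9e-40bb-b223-83e3be672dfe", "vol_id": "f7d2c014-e8f5-4413-bfc5-4aa1426cb1e2", "generation": <generation_from_getInfo>, "legality": "LEGAL"}'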
[ovirt-users] Re: NVMe-oF
If I could help with my limited programming skills, I would do so immediately. So why the snide remark?

On 27.06.20 at 23:20, tho...@hoberg.net wrote:
> Somehow I have the feeling that what you have in mind is a turnkey solution that you can sell on directly...

--
Daniel Menzel
Managing Director, Menzel IT GmbH, Berlin
[ovirt-users] NVMe-oF
Hi all, does anybody know whether NVMe-over-Fabrics support is somehow scheduled in oVirt? As, from a logical point of view, it is like iSCSI on steroids, I guess it shouldn't be too hard to implement. Yep, I know - bold statement from someone who isn't a programmer. ;-)

Nonetheless: are there plans to implement NVMe-oF as a storage backend for oVirt in the near future? If so, is there a way to help (e.g. with hardware resources)?

Kind regards,
Daniel

--
Daniel Menzel
Managing Director, Menzel IT GmbH, Berlin
[ovirt-users] host failed to attach one of the storage domains attached to it
Dear all, after an admin rebooted two of our three GlusterFS servers (and thus caused a loss of quorum) we've got strange problems:

1. The hosted engine and one export work perfectly fine (same three servers!).
2. Another export (the main one) seems to be fine within GlusterFS itself. But: when we activate this domain, most of the hosts go into "non operational" with the error message from the subject. As soon as we deactivate the domain, all those hosts come back online.

Strange thing: the GlusterFS export seems to be mounted on the SPM. Does anyone know what could have happened and how to fix it?

Kind regards
Daniel
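One way to narrow this down is to check whether the volume behind that storage domain is fully healed and whether the affected hosts can actually mount it by hand. A rough checklist (the volume name, server name and mount point are placeholders):

  # On one of the Gluster servers - check brick/quorum status and pending heals for the affected volume:
  gluster volume status <volname>
  gluster volume heal <volname> info

  # On a host that goes non-operational - try the mount that oVirt would perform:
  mkdir -p /mnt/gltest
  mount -t glusterfs <gluster-server>:/<volname> /mnt/gltest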
[ovirt-users] NVMe-oF
Hi guys, just out of curiosity: have there been any attempts yet to use NVMe over Fabrics as a storage backend in oVirt? We have used NVMe-oF several times in the past few weeks/months, and it's an absolutely gorgeous technology - and more than perfect for virtualization.

Best
Daniel

--
Daniel Menzel
Managing Director, Menzel IT GmbH, Berlin
[ovirt-users] Re: Console to HostedEngine
Hi Simone,

it worked - I can access the server via SSH again to solve the original problem (which is an httpd problem).

Concerning the add-console-password problem: it sounds weird to me, too.

For others: in my case the console device was there, it just did not have any ID or address:

  devices={device:console,type:console}

Regards
Daniel

> Adding something like this should be enough:
>
>   devices={device:console,type:console,deviceId:816e131e-5718-45e7-b1e3-81a6dfd51e19,address:None}
>
> and please remove the xmlBase64= line if it is there, since it contains the XML for libvirt as generated by the engine, and that one, if present, wins over the dictionary for vdsm.
>
>>> As an alternative, VNC console should work as well; you can set the VNC password with
>>>   hosted-engine --add-console-password
>>
>> I have tried this, too. The result was the following:
>>   Command VM.updateDevice with args {'params': {'existingConnAction': 'keep', 'graphicsType': 'vnc', 'params': {}, 'ttl': '120', 'deviceType': 'graphics', 'password': ''}, 'vmID': ''} failed: (code=56, message=Failed to update device)
>
> This sounds really weird.
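For anyone hitting the same problem, the whole procedure boils down to roughly the following. This is a sketch of my understanding, not an official recipe: the vm.conf path and option names should be double-checked against hosted-engine --help on your version, and the deviceId shown is just the example UUID from this thread.

  # Put the HA agents into global maintenance so they don't restart the VM with the old config:
  hosted-engine --set-maintenance --mode=global

  # On the host running the engine VM, take a copy of the generated VM config:
  cp /var/run/ovirt-hosted-engine-ha/vm.conf /root/my_vm.conf

  # Edit /root/my_vm.conf:
  #  - give the console device an ID and address, e.g.
  #      devices={device:console,type:console,deviceId:816e131e-5718-45e7-b1e3-81a6dfd51e19,address:None}
  #  - remove the xmlBase64=... line if present (it overrides the device dictionary for vdsm)

  # Restart the engine VM with the modified config and attach to the console:
  hosted-engine --vm-shutdown
  hosted-engine --vm-start --vm-conf=/root/my_vm.conf
  hosted-engine --console

  # Afterwards, leave global maintenance again:
  hosted-engine --set-maintenance --mode=none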
[ovirt-users] Re: Console to HostedEngine
Hello Simone,

thanks for your reply.

> hosted-engine --vm-shutdown --vm-conf=/root/my_vm.conf

I came across that before, but the syntax of this file is nebulous to me, as it looks like some kind of JSON?! How do I add the serial console there? What's the syntax?

> As an alternative, VNC console should work as well; you can set the VNC password with
>   hosted-engine --add-console-password

I have tried this, too. The result was the following:

  Command VM.updateDevice with args {'params': {'existingConnAction': 'keep', 'graphicsType': 'vnc', 'params': {}, 'ttl': '120', 'deviceType': 'graphics', 'password': ''}, 'vmID': ''} failed: (code=56, message=Failed to update device)
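(For context: when --add-console-password does work, the usual way to reach the engine over VNC is roughly the following; the port is an assumption here, the engine VM is normally the first VNC display on the host running it.)

  # On the host running the engine VM - set a temporary VNC password:
  hosted-engine --add-console-password

  # From a workstation, connect a VNC client to that host:
  remote-viewer vnc://<host-running-engine-vm>:5900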
[ovirt-users] Console to HostedEngine
Hi all, we cannot access our hosted engine anymore. It started with an overfull /var due to a growing database. We accessed the engine via SSH and tried to fix that - but somehow we seem to have produced another problem on the SSH server itself, so unfortunately we cannot log in anymore.

We then tried to access it via its host and "hosted-engine --console", but ran into an "internal error: cannot find character device", which I know from KVM. With other VMs I could follow Red Hat's advice to add a console (https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/virtualization_host_configuration_and_guest_installation_guide/app_domain_console), but although I edited the hosted engine's profile those changes weren't applied - and after an engine restart they were even deleted again. That kind of makes sense to me, but limits my options.

So my question is: is there any idea how I can access the console with my current limitations, fix the SSH server's problems and then hopefully fix everything?

Regards
Daniel
Re: [ovirt-users] Decrease downtime for HA
Hi Michal,

in your last mail you wrote that the values can be turned down - how can this be done?

Best
Daniel

On 12.04.2018 20:29, Michal Skrivanek wrote:
>> On 12 Apr 2018, at 13:13, Daniel Menzel <daniel.men...@hhi.fraunhofer.de> wrote:
>>
>> Hi there, does anyone have an idea how to decrease a virtual machine's downtime?
>>
>> Best Daniel
>>
>> On 06.04.2018 13:34, Daniel Menzel wrote:
>>> Hi Michal,
>
> Hi Daniel,
> adding Martin to review fencing behavior
>
>>> (sorry for misspelling your name in my first mail).
>
> that's not the reason I'm replying late! :-))
>
>>> The settings for the VMs are the following (oVirt 4.2):
>>> 1. HA checkbox enabled, of course
>>> 2. "Target Storage Domain for VM Lease" -> left empty
>
> if you need faster reactions then try to use VM Leases as well; it won't make a difference in this case but will help in case of network issues. E.g. if you use iSCSI and the storage connection breaks while the host connection still works, it would restart the VM in about 80s; otherwise it would take >5 mins.
>
>>> 3. "Resume Behavior" -> AUTO_RESUME
>>> 4. Priority for Migration -> High
>>> 5. "Watchdog Model" -> No-Watchdog
>>>
>>> For testing we did not kill any VM but the host. So basically we simulated an instantaneous crash by manually turning the machine off via the IPMI interface (not via the operating system!) and pinging the guest(s). What happens then?
>>> 1. 2-3 seconds after we press the host's shutdown button we lose ping contact to the VM(s).
>>> 2. After another 20s oVirt changes the host's status to "connecting", the VM's status is set to a question mark.
>>> 3. After ~1:30 the host is flagged "non responsive".
>
> that sounds about right. Now fencing action should have been initiated; if you can share the engine logs we can confirm that. IIRC we first try soft fencing - try to ssh to that host, that might take some time to time out I guess. Martin?
>
>>> 4. After ~2:10 the host's reboot is initiated by oVirt, 5-10s later the guest is back online.
>>>
>>> So, there seems to be one mistake I made in the first mail: the downtime is "only" 2.5 min. But still I think this time can be decreased, as for some services it is still quite a long time.
>
> these values can be tuned down, but then you may be more susceptible to fencing power-cycling a host in case of shorter network outages. It may be ok… depending on your requirements.
>
>>> Best
>>> Daniel
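For the record, the timeouts in question appear to be engine-config options on the engine VM; the sketch below shows how one would inspect and lower them. The option names mentioned (ServerRebootTimeout, TimeoutToResetVdsInSeconds, VDSAttemptsToResetCount) are the ones usually discussed for this and are an assumption here - confirm them against the listing before changing anything.

  # On the engine VM - list fencing/timeout related options and their current values:
  engine-config -a | grep -iE 'fence|timeout|reboot'

  # Inspect one option, then lower it and restart the engine for the change to take effect
  # (ServerRebootTimeout and the value 180 are used purely as an example):
  engine-config -g ServerRebootTimeout
  engine-config -s ServerRebootTimeout=180
  systemctl restart ovirt-engine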
Re: [ovirt-users] Decrease downtime for HA
Hi there,

does anyone have an idea how to decrease a virtual machine's downtime?

Best
Daniel

On 06.04.2018 13:34, Daniel Menzel wrote:
> Hi Michal,
>
> (sorry for misspelling your name in my first mail).
>
> The settings for the VMs are the following (oVirt 4.2):
> 1. HA checkbox enabled, of course
> 2. "Target Storage Domain for VM Lease" -> left empty
> 3. "Resume Behavior" -> AUTO_RESUME
> 4. Priority for Migration -> High
> 5. "Watchdog Model" -> No-Watchdog
>
> For testing we did not kill any VM but the host. So basically we simulated an instantaneous crash by manually turning the machine off via the IPMI interface (not via the operating system!) and pinging the guest(s). What happens then?
> 1. 2-3 seconds after we press the host's shutdown button we lose ping contact to the VM(s).
> 2. After another 20s oVirt changes the host's status to "connecting", the VM's status is set to a question mark.
> 3. After ~1:30 the host is flagged "non responsive".
> 4. After ~2:10 the host's reboot is initiated by oVirt, 5-10s later the guest is back online.
>
> So, there seems to be one mistake I made in the first mail: the downtime is "only" 2.5 min. But still I think this time can be decreased, as for some services it is still quite a long time.
>
> Best
> Daniel
Re: [ovirt-users] Decrease downtime for HA
Hi Michal,

(sorry for misspelling your name in my first mail).

The settings for the VMs are the following (oVirt 4.2):
1. HA checkbox enabled, of course
2. "Target Storage Domain for VM Lease" -> left empty
3. "Resume Behavior" -> AUTO_RESUME
4. Priority for Migration -> High
5. "Watchdog Model" -> No-Watchdog

For testing we did not kill any VM but the host. So basically we simulated an instantaneous crash by manually turning the machine off via the IPMI interface (not via the operating system!) and pinging the guest(s). What happens then?
1. 2-3 seconds after we press the host's shutdown button we lose ping contact to the VM(s).
2. After another 20s oVirt changes the host's status to "connecting", the VM's status is set to a question mark.
3. After ~1:30 the host is flagged "non responsive".
4. After ~2:10 the host's reboot is initiated by oVirt, 5-10s later the guest is back online.

So, there seems to be one mistake I made in the first mail: the downtime is "only" 2.5 min. But still I think this time can be decreased, as for some services it is still quite a long time.

Best
Daniel

On 06.04.2018 12:49, Michal Skrivanek wrote:
>> On 6 Apr 2018, at 12:45, Daniel Menzel wrote:
>>
>> Hi Michael, thanks for your mail. Sorry, I forgot to write that: yes, we have power management and fencing enabled on all hosts. We also tested this and found that it works perfectly. So this cannot be the reason, I guess.
>
> Hi Daniel,
> ok, then it's worth looking into details. Can you describe in more detail what happens? What exact settings are you using for such a VM? Are you killing the HE VM, other VMs, or both? Would be good to narrow it down a bit and then review the exact flow.
>
> Thanks,
> michal
>
>> Daniel
>>
>> On 06.04.2018 11:11, Michal Skrivanek wrote:
>>>> On 4 Apr 2018, at 15:36, Daniel Menzel wrote:
>>>>
>>>> Hello, we're successfully using a setup with 4 nodes and a replicated Gluster for storage. The engine is self-hosted. What we're dealing with at the moment is high availability: if a node fails (for example simulated by a forced power loss), the engine comes back online within ~2 min. But guests (having the HA option enabled) come back online only after a very long grace time of ~5 min. As we have a reliable network (40 GbE) and reliable servers, I think that the default grace times are way too high for us - is there any possibility to change those values?
>>>
>>> And do you have Power Management (iLO, iDRAC, etc.) configured for your hosts? Otherwise we have to resort to relatively long timeouts to make sure the host is really dead.
>>>
>>> Thanks,
>>> michal
>>>
>>>> Thanks in advance!
>>>> Daniel
Re: [ovirt-users] Decrease downtime for HA
Hi Michael,

thanks for your mail. Sorry, I forgot to write that: yes, we have power management and fencing enabled on all hosts. We also tested this and found that it works perfectly. So this cannot be the reason, I guess.

Daniel

On 06.04.2018 11:11, Michal Skrivanek wrote:
>> On 4 Apr 2018, at 15:36, Daniel Menzel wrote:
>>
>> Hello, we're successfully using a setup with 4 nodes and a replicated Gluster for storage. The engine is self-hosted. What we're dealing with at the moment is high availability: if a node fails (for example simulated by a forced power loss), the engine comes back online within ~2 min. But guests (having the HA option enabled) come back online only after a very long grace time of ~5 min. As we have a reliable network (40 GbE) and reliable servers, I think that the default grace times are way too high for us - is there any possibility to change those values?
>
> And do you have Power Management (iLO, iDRAC, etc.) configured for your hosts? Otherwise we have to resort to relatively long timeouts to make sure the host is really dead.
>
> Thanks,
> michal
>
>> Thanks in advance!
>> Daniel
[ovirt-users] Decrease downtime for HA
Hello, we're successfully using a setup with 4 nodes and a replicated Gluster for storage. The engine is self-hosted. What we're dealing with at the moment is high availability: if a node fails (for example simulated by a forced power loss), the engine comes back online within ~2 min. But guests (having the HA option enabled) come back online only after a very long grace time of ~5 min.

As we have a reliable network (40 GbE) and reliable servers, I think that the default grace times are way too high for us - is there any possibility to change those values?

Thanks in advance!
Daniel