[ovirt-users] Re: 3node HCI fails when HostedEngineLocal is trying to add additional Gluster members

2019-12-04 Thread Strahil
Most probably one of the pre-start tasks of vdsmd or supervdsmd does it (they 
have several, so you can run them manually until you find out which one).
Just try the following:
systemctl stop vdsmd supervdsmd
systemctl start supervdsmd
# check whether the certs are back
systemctl start vdsmd


Keep in mind that the chain of events (at least for me) is:
1. VG activation
2. VDO activation
3. Gluster brick is mounted (I use systemd service due to deps between vdo, 
gluster brick and glusterd)
4. Glusterd and libvirt are started
5. Sanlock is started
6. Supervdsm
7. Vdsm
If this is a host that will host HostedEngine VM:
8. Ovirt-ha-broker
9. Ovirt-ha-agent
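
The ordering above can be turned into a quick check for the first service in the chain that is not running. This is only a sketch: the unit names (glusterd, libvirtd, sanlock, supervdsmd, vdsmd, ovirt-ha-broker, ovirt-ha-agent) are assumed from the services named in this message, the helper names are hypothetical, and the VG, VDO and brick-mount steps are left out because their unit names depend on the local setup.

```python
import subprocess

# Assumed unit names, in the start order described above (steps 4 to 9).
# glusterd and libvirtd share step 4, so their relative order is arbitrary.
START_ORDER = [
    "glusterd", "libvirtd", "sanlock", "supervdsmd", "vdsmd",
    "ovirt-ha-broker", "ovirt-ha-agent",
]


def first_inactive(states, order=START_ORDER):
    """Return the first unit in `order` whose state is not 'active', else None."""
    for unit in order:
        if states.get(unit) != "active":
            return unit
    return None


def query_states(order=START_ORDER):
    """Ask systemd for the state of each unit via 'systemctl is-active'."""
    states = {}
    for unit in order:
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )
        states[unit] = result.stdout.strip()
    return states
```

On a host, first_inactive(query_states()) would name the earliest link in the chain that needs attention; on a host that does not run the HostedEngine VM, the last two entries can be dropped from the list.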


After cleanup, did you reboot?

Best Regards,
Strahil Nikolov

On Dec 4, 2019 17:14, tho...@hoberg.net wrote:
>
> After spending another couple of hours trying to track down the problem, I 
> have found that the "lost connection" seems to be due to KVM shutting down, 
> because it cannot find the certificates for the Spice and VNC connections in 
> /etc/pki/vdsm/*, where 'ovirt-hosted-engine-cleanup' deleted them. 
>
> So now I wonder: Who is supposed to (re-)generate them afterwards? 
>
> Assuming that it was a much earlier step, I proceeded to completely undo the 
> deployment, get rid of the Gluster setup etc. and start from the very 
> beginning, only to find that that didn't change a thing: It was still missing 
> those certificates 
>
> ...while something or someone *did* generate them when I tried a distinct 
> and new set of nodes for counter-testing. 
>
> That setup failed with an Ansible error (reported separately), but I have now 
> grown afraid of using 'ovirt-hosted-engine-cleanup' when I don't know how to 
> get the ciphers/keys for /etc/pki/vdsm/{spice|vnc} regenerated... 
>
> Can anyone shed some light into this darkness?
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/Z7AFFFU6KMDPSBT7TJFIQKBEQHX5RON5/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YXWE6QT3PSULP7RBI3BUXXVQ6E2YQUGE/


[ovirt-users] Re: Possible sources of cpu steal and countermeasures

2019-12-04 Thread Nir Soffer
On Wed, Dec 4, 2019 at 6:15 PM  wrote:
>
> Hi,
>
> I'm having performance issues with an oVirt installation. It is showing
> high steal (5-10%) for a CPU-intensive VM. The hypervisor, however, has
> more than 65% of its resources idle while the steal is seen inside of
> the VM.
>
> Even when placing only a single VM on a hypervisor it still receives
> steal (0-2%), even though the hypervisor is not overcommitted.
>
>
> Hypervisor:
>
> 2 Socket system in total 2*28(56HT) cores
>
>
> VM:
>
> 30 vCPUs (oVirt seems to think it's a good idea to make that 15 sockets *
> 2 cores)

I think you can control this in oVirt.

> My questions are:
>
> a) Could it be that the hypervisor is trying to schedule all 30 cores on
> a single NUMA node, i.e. using the HT cores instead of "real" ones, and
> this shows up as steal?
>
> b) Do I need to make VMs this big NUMA-aware and spread the VM over both
> NUMA nodes?
>
> c) Would using the High Performance VM type help in this kind of situation?
>
> d) General advice: how do I reduce steal in an environment where the
> hypervisor has idle resources?
>
>
> Any advice would be appreciated.

These questions are mainly about qemu, so adding qemu-discuss.

I think it will help if you share your VM's qemu command line, found in:
/var/log/libvirt/qemu/vm-name.log

Nir
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/4RBWYLKGCXAGOSC7FM3UMPE5T3JHOQKV/


[ovirt-users] Re: Ovirt-OSN integration

2019-12-04 Thread siovelrm
Thanks Dominik, I'll try what you recommend
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/UXEVOB6P5UUC2NREZCNJETXKA2S4NYPF/


[ovirt-users] Re: Ovirt-OSN integration

2019-12-04 Thread Dominik Holler
On Tue, Dec 3, 2019 at 4:41 PM  wrote:

> Hi, I want to use external logical networks with OpenStack Neutron. Is it
> necessary to have all of OpenStack installed for this, or is it possible to
> have only Neutron and use it as a service?


Only Neutron is required.
If you want to create a lab install, the following command will do the job:

yum install -y centos-release-openstack-stein && \
yum install -y openstack-packstack && \
packstack --os-glance-install=y --os-cinder-install=y \
  --os-manila-install=n --os-nova-install=y --os-horizon-install=y \
  --os-swift-install=n --os-ceilometer-install=n --os-aodh-install=n \
  --os-panko-install=n --os-sahara-install=n  --os-heat-install=n \
  --os-magnum-install=n --os-trove-install=n --os-ironic-install=n \
  --os-client-install=y --os-neutron-install=y \
  --default-password=123456 \
  --provision-demo=y \
  --install-hosts=$controller_host \
  --os-network-hosts=$controller_host,$network_host0,$network_host1

# add $network_host0,$network_host1 to oVirt Engine

where $controller_host, $network_host0 and $network_host1 are the hostnames of
clean CentOS 7 hosts.


> Sorry if the question is very obvious but I am starting now with this
> topic.
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/EEF72RRIS7CZGJVF3B3QFAMA2WJ2VAJL/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/TTBYDUBBZ2ZSSLUDD7VHDRWQQROJZLJH/


[ovirt-users] Re: Unable to attach ISO domain to Datacenter

2019-12-04 Thread Ivan Apolonio
I've tried commenting out this line in /etc/pam.d/password-auth:

auth requisite pam_succeed_if.so uid >= 1000 quiet_success

Although it has stopped complaining about "requirement uid >= 1000 not 
met by user vdsm" in /var/log/secure, attaching the ISO domain to the 
Datacenter still fails with the same error message, so it looks like 
this was not the cause of the problem :-(

Going back to stage zero.
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/TTD4PH2FIUK23427RSEWYQOGJSM7C7KB/


[ovirt-users] Re: 3node HCI fails when HostedEngineLocal is trying to add additional Gluster members

2019-12-04 Thread Strahil
If you manage to get your VM on the gluster, you are almost done.
I had a similar situation, and using virsh/hosted-engine I managed to reach 
the GUI.
From there we can get a clue about what is going on.
Usually there is a dependency chain:
- The master storage domain should be UP
- This allows the DC to become UP
- Then a host in the cluster can become SPM (which controls storage)
- And only then can you start your other gluster volumes.

So I can only recommend that you try to reach the GUI and check the situation. 
Then it will be moderately easy to bring everything up.

Best Regards,
Strahil Nikolov

On Dec 4, 2019 17:06, tho...@hoberg.net wrote:
>
> Thanks Strahil, for your suggestions. 
>
> Actually, I was far beyond the pick-up point you describe, as the Gluster had 
> all been prepared and was operable, even the local VM was already running and 
> accessible via the GUI. 
>
> But I picked up your hint to try to continue with the scripted variant, and 
> found that it allowed me much better insight into what was going on. 
>
> I am a little worried, though, that it actually works a little differently 
> from the GUI-wizard variant, in any case the failures don't seem identical. 
> That would have implications in terms of test-automation, which I'd rather 
> not have to worry about. 
>
> In a separate test for the same operation on a separate set of hardware the 
> installation got significantly further, up to the point where the local VM 
> had actually been moved onto the Gluster storage and into the Cluster, but 
> then failed a validation step at the very end (while the VM is actually up 
> and running, albeit only with the primary host and that listed as "Unresponsive" 
> while it's hosting the VM) 
>
> I have opened a separate ticket for that...
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/FTCFIP7Q5KWXTMEKWKGOEPEUCI5R5G2M/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/TPKQFB4IUNJPY27MBE5K3ADQ66N3QF76/


[ovirt-users] Re: Unable to attach ISO domain to Datacenter

2019-12-04 Thread Ivan Apolonio
> On Tuesday, December 3, 2019, Ivan Apolonio  
> 
> This line disables logging; it is worth commenting it out while checking.
> Plus, do you have an #includedir setting in your /etc/sudoers file?
> 
> The vdsm.log snippet seems later than the error in the engine.log, could
> you provide one covering the failing attempt?
Hello, Amit.

It looks like commenting out that last line (Defaults:vdsm !syslog) did the 
trick and helped identify the problem. According to the /var/log/secure log 
file, the vdsm uid is being blocked from using sudo due to PAM requirements: 

Dec  4 10:53:36 Rosinha sudo: pam_unix(sudo:auth): authentication failure; logname=root uid=36 euid=0 tty=/dev/pts/0 ruser=vdsm rhost=  user=vdsm
Dec  4 10:53:36 Rosinha sudo: pam_succeed_if(sudo:auth): requirement "uid >= 1000" not met by user "vdsm"
Dec  4 10:58:38 Rosinha sudo: pam_unix(sudo:auth): conversation failed
Dec  4 10:58:38 Rosinha sudo: pam_unix(sudo:auth): auth could not identify password for [vdsm]
Dec  4 10:58:38 Rosinha sudo: pam_succeed_if(sudo:auth): requirement "uid >= 1000" not met by user "vdsm"

This "uid >= 1000" requirement is the CentOS 7 default. What is the best way to 
work around it? I'm asking because if I just comment out this rule in the PAM 
configuration files, other system users will be allowed to sudo, which 
would lead to security issues.
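
To see at a glance which PAM rule rejects which user, the pam_succeed_if entries can be pulled out of the log. A minimal sketch, assuming the log lines have the shape shown above; the function name and the sample list are hypothetical and only reuse two of the lines quoted in this message.

```python
import re

# Matches entries such as:
#   pam_succeed_if(sudo:auth): requirement "uid >= 1000" not met by user "vdsm"
PAM_FAIL = re.compile(
    r'pam_succeed_if\((?P<service>[^)]+)\): '
    r'requirement "(?P<rule>[^"]+)" not met by user "(?P<user>[^"]+)"'
)


def pam_rule_failures(lines):
    """Return (service, rule, user) for every pam_succeed_if rejection."""
    failures = []
    for line in lines:
        match = PAM_FAIL.search(line)
        if match:
            failures.append(
                (match.group("service"), match.group("rule"), match.group("user"))
            )
    return failures


# Two of the lines quoted above, used as sample input.
SAMPLE = [
    "Dec  4 10:53:36 Rosinha sudo: pam_unix(sudo:auth): authentication failure; "
    "logname=root uid=36 euid=0 tty=/dev/pts/0 ruser=vdsm rhost=  user=vdsm",
    'Dec  4 10:53:36 Rosinha sudo: pam_succeed_if(sudo:auth): requirement '
    '"uid >= 1000" not met by user "vdsm"',
]

for service, rule, user in pam_rule_failures(SAMPLE):
    print("%s rejected %s: %s" % (service, user, rule))
```

For the real file, something like pam_rule_failures(open("/var/log/secure")) should work when run as root.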

Thanks,
Ivan
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/7NKVMVBQ5Z746JRV5U6UCVEW4SW2UFOS/


[ovirt-users] Possible sources of cpu steal and countermeasures

2019-12-04 Thread klaasdemter

Hi,

I'm having performance issues with an oVirt installation. It is showing 
high steal (5-10%) for a CPU-intensive VM. The hypervisor, however, has 
more than 65% of its resources idle while the steal is seen inside of 
the VM.

Even when placing only a single VM on a hypervisor it still receives 
steal (0-2%), even though the hypervisor is not overcommitted.


Hypervisor:

2 Socket system in total 2*28(56HT) cores


VM:

30 vCPUs (oVirt seems to think it's a good idea to make that 15 sockets * 
2 cores)


My questions are:

a) Could it be that the hypervisor is trying to schedule all 30 cores on 
a single NUMA node, i.e. using the HT cores instead of "real" ones, and 
this shows up as steal?

b) Do I need to make VMs this big NUMA-aware and spread the VM over both 
NUMA nodes?

c) Would using the High Performance VM type help in this kind of situation?

d) General advice: how do I reduce steal in an environment where the 
hypervisor has idle resources?


Any advice would be appreciated.


Greetings

Klaas
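
For comparing settings (topology, NUMA pinning, VM type) it may help to measure steal inside the guest over a fixed interval rather than reading it off top. The sketch below assumes the standard Linux /proc/stat layout, where the aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq and steal in that order; the function names are hypothetical.

```python
def parse_cpu_line(line):
    """Return the first eight counters of the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("not the aggregate cpu line: %r" % line)
    # user, nice, system, idle, iowait, irq, softirq, steal
    return [int(value) for value in fields[1:9]]


def steal_percent(before, after):
    """Steal time as a percentage of all CPU time between two samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate cpu line (the first line of /proc/stat)."""
    with open(path) as stat_file:
        return stat_file.readline()
```

Inside the VM, take one sample with read_cpu_line(), wait (for example time.sleep(10)), take a second sample, and pass both to steal_percent().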

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/N5N4CBIBYLVT7APEBPCYMFBF57VSJRG7/


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Albl, Oliver
Hi Amit,

it is inactive, but not in maintenance mode.

Thank you,
Oliver

From: Amit Bawer
Sent: Wednesday, December 4, 2019 16:36
To: Albl, Oliver
Cc: users@ovirt.org; Nir Soffer
Subject: Re: [ovirt-users] Re: Cannot activate/deactivate storage domain

In a check we ran here, we got similar warnings when using the "Ignore OVF 
update failure" option, but the SD was set inactive at the end of the process.

What is the SD status in your case after this try?






[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Albl, Oliver


smime.p7m
Description: S/MIME encrypted message
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/4Q4ATO7KOMUF4PKRVPHA3BRMTLWSUWEU/


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Amit Bawer
In a check we ran here, we got similar warnings when using the "Ignore OVF
update failure" option, but the SD was set inactive at the end of the process.
What is the SD status in your case after this try?


On Wed, Dec 4, 2019 at 4:49 PM Albl, Oliver 
wrote:

> Yes.
>
> On 04.12.2019 at 15:47, Amit Bawer <aba...@redhat.com> wrote:
>
>
>
> On Wed, Dec 4, 2019 at 4:42 PM Albl, Oliver wrote:
> Hi Amit,
>
>   unfortunately no success.
>
> Dec 4, 2019, 3:41:36 PM
> Storage Domain HOST_LUN_219 (Data Center xxx) was deactivated by system
> because it's not visible by any of the hosts.
>
> Dec 4, 2019, 3:35:09 PM
> Failed to update VMs/Templates OVF data for Storage Domain HOST_LUN_219 in
> Data Center Production.
>
> Dec 4, 2019, 3:35:09 PM
> Failed to update OVF disks 77c64b39-fe50-4d05-b77f-8131ad1f95f9, OVF data
> isn't updated on those OVF stores (Data Center Production, Storage Domain
> HOST_LUN_219).
>
> Have you selected the checkbox for "Ignore OVF update failure" before
> putting into maintenance?
>
>
> All the best,
> Oliver
>
> From: Amit Bawer <aba...@redhat.com>
> Sent: Wednesday, December 4, 2019 15:20
> To: Albl, Oliver <oliver.a...@fabasoft.com>
> Cc: users@ovirt.org; Nir Soffer <nsof...@redhat.com>
> Subject: Re: [ovirt-users] Re: Cannot activate/deactivate storage domain
>
> Hi Oliver,
>
> For deactivating the unresponsive storage domains, you can use the Compute
> -> Data Centers -> Maintenance option with "Ignore OVF update failure"
> checked.
> This will force deactivation of the SD.
>
> Will provide further details about the issue in the ticket.
>
>
> On Tue, Dec 3, 2019 at 12:02 PM Albl, Oliver wrote:
> Hi,
>
>   does anybody have advice on how to activate or safely remove that
> storage domain?
>
> Thank you!
> Oliver
> -----Original Message-----
> From: Oliver Albl <oliver.a...@fabasoft.com>
> Sent: Tuesday, November 5, 2019 11:20
> To: users@ovirt.org
> Subject: [ovirt-users] Re: Cannot activate/deactivate storage domain
>
> > On Mon, Nov 4, 2019 at 9:18 PM Albl, Oliver wrote:
> >
> > What was the last change in the system? upgrade? network change? storage
> change?
> >
>
> Last change was four weeks ago ovirt upgrade from 4.3.3 to 4.3.6.7
> (including CentOS hosts to 7.7 1908)
>
> >
> > This is expected if some domain is not accessible on all hosts.
> >
> >
> > This means sanlock timed out renewing the lockspace
> >
> >
> > If a host cannot access all storage domains in the DC, the system sets
> > it to non-operational, and will probably try to reconnect it later.
> >
> >
> > This means reading 4k from start of the metadata lv took 9.6 seconds.
> > Something in
> > the way to storage is bad (kernel, network, storage).
> >
> >
> > We have 20 seconds (4 retries, 5 seconds per retry) grace time in multipath
> > when there are no active paths, before I/O fails, pausing the VM. We
> > also resume paused VMs when storage monitoring works again, so maybe
> > the VMs were paused and resumed.
> >
> > However, for storage monitoring we have a strict 10 seconds timeout. If
> > reading from the metadata lv times out or fails and does not operate
> > normally after 5 minutes, the domain will become inactive.
> >
> >
> > This can explain the read timeouts.
> >
> >
> > This looks the right way to troubleshoot this.
> >
> >
> > We need vdsm logs to understand this failure.
> >
> >
> > This does not mean OVF is corrupted, only that we could not store new
> > data. The older data on the other OVFSTORE disk is probably fine.
> > Hopefully the system will not try to write to the other OVFSTORE disk
> > overwriting the last good version.
> >
> >
> > This is normal, the first 2048 bytes are always zeroes. This area was
> > used for domain metadata in older versions.
> >
> >
> > Please share more details:
> >
> > - output of "lsblk"
> > - output of "multipath -ll"
> > - output of "/usr/libexec/vdsm/fc-scan -v"
> > - output of "vgs -o +tags problem-domain-id"
> > - output of "lvs -o +tags problem-domain-id"
> > - contents of /etc/multipath.conf
> > - contents of /etc/multipath.conf.d/*.conf
> > - /var/log/messages since the issue started
> > - /var/log/vdsm/vdsm.log* since the issue started on one of the hosts
> >
> > A bug is probably the best place to keep these logs and make it easy to
> > track.
>
> Please see https://bugzilla.redhat.com/show_bug.cgi?id=1768821
>
> >
> > Thanks,
> > Nir
>
> Thank you!
> Oliver

[ovirt-users] Re: 3node HCI fails when HostedEngineLocal is trying to add additional Gluster members

2019-12-04 Thread thomas
After spending another couple of hours trying to track down the problem, I have 
found that the "lost connection" seems to be due to KVM shutting down, because 
it cannot find the certificates for the Spice and VNC connections in 
/etc/pki/vdsm/*, where 'ovirt-hosted-engine-cleanup' deleted them.

So now I wonder: Who is supposed to (re-)generate them afterwards?

Assuming that it was a much earlier step, I proceeded to completely undo the 
deployment, get rid of the Gluster setup etc. and start from the very 
beginning, only to find that that didn't change a thing: It was still missing 
those certificates

...while something or someone *did* generate them when I tried a distinct and 
new set of nodes for counter-testing.

That setup failed with an Ansible error (reported separately), but I have now 
grown afraid of using 'ovirt-hosted-engine-cleanup' when I don't know how to 
get the ciphers/keys for /etc/pki/vdsm/{spice|vnc} regenerated...

Can anyone shed some light into this darkness?
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/Z7AFFFU6KMDPSBT7TJFIQKBEQHX5RON5/


[ovirt-users] Re: 3node HCI fails when HostedEngineLocal is trying to add additional Gluster members

2019-12-04 Thread thomas
Thanks Strahil, for your suggestions.

Actually, I was far beyond the pick-up point you describe, as the Gluster had 
all been prepared and was operable, even the local VM was already running and 
accessible via the GUI.

But I picked up your hint to try to continue with the scripted variant, and 
found that it allowed me much better insight into what was going on.

I am a little worried, though, that it actually works a little differently from 
the GUI-wizard variant, in any case the failures don't seem identical.
That would have implications in terms of test-automation, which I'd rather not 
have to worry about.

In a separate test for the same operation on a separate set of hardware the 
installation got significantly further, up to the point where the local VM had 
actually been moved onto the Gluster storage and into the Cluster, but then 
failed a validation step at the very end (while the VM is actually up and 
running, albeit only with the primary host and that listed as "Unresponsive" 
while it's hosting the VM)

I have opened a separate ticket for that...
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/FTCFIP7Q5KWXTMEKWKGOEPEUCI5R5G2M/


[ovirt-users] Did a change in Ansible 2.9 in the ovirt_vm_facts module break the hosted-engine-setup?

2019-12-04 Thread thomas
I am having problems installing a 3-node HCI cluster, both on machines that 
used to work fine and on a fresh set of servers.

After a series of setbacks on a set of machines with failed installations and 
potentially failed clean-ups, I am using a fresh set of servers that had never 
run oVirt before.

With the hosts patched to just before today's bigger changes (new kernel..), 
installation first failed during the setup of the local hosted engine. When I 
switched from the GUI to the scripted setup ('hosted-engine --deploy'), 
*without* doing a cleanup this time, it progressed further, to the point where 
the local VM had actually been teleported onto the (Gluster-based) cluster and 
is running there. 

In what seems to be the absolutely final action before adding the other two 
hosts, Ansible does a final inventory of the virtual machine and collects 
facts, or rather information (that's perhaps the breaking point), about that 
first VM before continuing, only the data structure got renamed between Ansible 
2.8 and 2.9 according to this:
https://fossies.org/diffs/ansible/2.8.5_vs_2.9.0rc1/lib/ansible/modules/cloud/ovirt/ovirt_vm_facts.py-diff.html
 

And the resulting error message from the 
/var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-*.log file is:

2019-12-04 13:15:19,232+ ERROR 
otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 
fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_vms": 
[{"affinity_labels": [], "applications": [], "bios": {"boot_menu": {"enabled": 
false}, "type": "i440fx_sea_bios"}, "cdroms": [], "cluster": {"href": 
"/ovirt-engine/api/clusters/6616551e-1695-11ea-a86b-00163e34e004", "id": 
"6616551e-1695-11ea-a86b-00163e34e004"}, "comment": "", "cpu": {"architecture": 
"x86_64", "topology": {"cores": 1, "sockets": 4, "threads": 1}}, "cpu_profile": 
{"href": "/ovirt-engine/api/cpuprofiles/58ca604e-01a7-003f-01de-0250", 
"id": "58ca604e-01a7-003f-01de-0250"}, "cpu_shares": 0, 
"creation_time": "2019-12-04 13:01:12.78+00:00", "delete_protected": false, 
"description": "", "disk_attachments": [], "display": {"address": "127.0.0.1", 
"allow_override": false, "certificate": {"content": "-BEGIN 
CERTIFICATE-(redacted)-END CERTIFICATE-\n", "organization": 
 "***", "subject": "**"}, "copy_paste_enabled": true, "disconnect_action": 
"LOCK_SCREEN", "file_transfer_enabled": true, "monitors": 1, "port": 5900, 
"single_qxl_pci": false, "smartcard_enabled": false, "type": "vnc"}, "fqdn": 
"xdrd1001s.priv.atos.fr", "graphics_consoles": [], "guest_operating_system": 
{"architecture": "x86_64", "codename": "", "distribution": "CentOS Linux", 
"family": "Linux", "kernel": {"version": {"build": 0, "full_version": 
"3.10.0-1062.4.3.el7.x86_64", "major": 3, "minor": 10, "revision": 1062}}, 
"version": {"full_version": "7", "major": 7}}, "guest_time_zone": {"name": 
"GMT", "utc_offset": "+00:00"}, "high_availability": {"enabled": false, 
"priority": 0}, "host": {"href": 
"/ovirt-engine/api/hosts/75d096fd-4a2f-4ba4-b9fb-941f86daf624", "id": 
"75d096fd-4a2f-4ba4-b9fb-941f86daf624"}, "host_devices": [], "href": 
"/ovirt-engine/api/vms/dee6ec3b-5b4a-4063-ade9-12dece0f5fab", "id": 
"dee6ec3b-5b4a-4063-ade9-12dece0f5fab", "io": {"threads": 1}, "katello_errata": 
[], "large_icon": {"href": 
"/ovirt-engine/api/icons/9588ebfc-865a-4969-9829-d170d3654900", "id": 
"9588ebfc-865a-4969-9829-d170d3654900"}, "memory": 17179869184, 
"memory_policy": {"guaranteed": 17179869184, "max": 17179869184}, "migration": 
{"auto_converge": "inherit", "compressed": "inherit"}, "migration_downtime": 
-1, "multi_queues_enabled": true, "name": "external-HostedEngineLocal", 
"next_run_configuration_exists": false, "nics": [], "numa_nodes": [], 
"numa_tune_mode": "interleave", "origin": "external", "original_template": 
{"href": "/ovirt-engine/api/templates/----", 
"id": "----"}, "os": {"boot": {"devices": 
["hd"]}, "type": "other"}, "permissions": [], "placement_policy": {"affinity": 
"migratable"}, "quota": {"id": "7af18f3a-1695-11ea-ab7e-00163e34e004"}, 
"reported_devices": [], "run_once": false, "sessions": [], "small_icon": 
{"href": "/ovirt-engine/api/icons/dec3572e-7465-4527-884b-f7c2eb2ed811", "id": 
"dec3572e-7465-4527-884b-f7c2eb2ed811"}, "snapshots": [], "sso": {"methods": [{"id": 
"guest_agent"}]}, "start_paused": false, "stateless": false, "statistics": [], 
"status": "unknown", "storage_error_resume_behaviour": "auto_resume", "tags": 
[], "template": {"href": 
"/ovirt-engine/api/templates/----", "id": 
"----"}, "time_zone": {"name": "Etc/GMT"}, 
"type": "desktop", "usb": {"enabled": false}, "watchdogs": []}]}, "attempts": 
24, "changed": false, "deprecations": [{"msg": "The 'ovirt_vm_facts' module has 
been renamed to 'ovirt_vm_info', and the renamed one no longer returns 
ansible_facts", "version": "2.13"}]}
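
The interesting part of that result is buried at the very end. A small sketch for pulling the deprecation notices out of such a task result once it has been parsed as JSON; the helper names are hypothetical and the sample is a heavily trimmed stand-in for the output above, not the full structure. Note that the deprecation entry is a warning attached to the result; this snippet only surfaces it and does not show why the task was marked FAILED.

```python
import json
import re

# Pattern for messages like:
#   The 'ovirt_vm_facts' module has been renamed to 'ovirt_vm_info', ...
RENAMED = re.compile(r"The '([^']+)' module has been renamed to '([^']+)'")


def deprecation_messages(result):
    """Return the text of every deprecation notice in a task result dict."""
    return [entry.get("msg", "") for entry in result.get("deprecations", [])]


def renamed_modules(result):
    """Return (old_name, new_name) pairs found in the deprecation notices."""
    pairs = []
    for message in deprecation_messages(result):
        match = RENAMED.search(message)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


# Trimmed stand-in for the logged result; most keys are omitted.
SAMPLE = json.loads("""
{"ansible_facts": {"ovirt_vms": [{"name": "external-HostedEngineLocal",
                                  "status": "unknown"}]},
 "attempts": 24, "changed": false,
 "deprecations": [{"msg": "The 'ovirt_vm_facts' module has been renamed to 'ovirt_vm_info', and the renamed one no longer returns ansible_facts",
                   "version": "2.13"}]}
""")

for old, new in renamed_modules(SAMPLE):
    print("%s -> %s" % (old, new))
```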

If that is the 

[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Albl, Oliver
Yes.

On 04.12.2019 at 15:47, Amit Bawer <aba...@redhat.com> wrote:



On Wed, Dec 4, 2019 at 4:42 PM Albl, Oliver <oliver.a...@fabasoft.com> wrote:
Hi Amit,

  unfortunately no success.

Dec 4, 2019, 3:41:36 PM
Storage Domain HOST_LUN_219 (Data Center xxx) was deactivated by system because 
it's not visible by any of the hosts.

Dec 4, 2019, 3:35:09 PM
Failed to update VMs/Templates OVF data for Storage Domain HOST_LUN_219 in Data 
Center Production.

Dec 4, 2019, 3:35:09 PM
Failed to update OVF disks 77c64b39-fe50-4d05-b77f-8131ad1f95f9, OVF data isn't 
updated on those OVF stores (Data Center Production, Storage Domain 
HOST_LUN_219).

Have you selected the checkbox for "Ignore OVF update failure" before putting 
into maintenance?


All the best,
Oliver

From: Amit Bawer <aba...@redhat.com>
Sent: Wednesday, December 4, 2019 15:20
To: Albl, Oliver <oliver.a...@fabasoft.com>
Cc: users@ovirt.org; Nir Soffer <nsof...@redhat.com>
Subject: Re: [ovirt-users] Re: Cannot activate/deactivate storage domain

Hi Oliver,

For deactivating the unresponsive storage domains, you can use the Compute -> 
Data Centers -> Maintenance option with "Ignore OVF update failure" checked.
This will force deactivation of the SD.

Will provide further details about the issue in the ticket.


On Tue, Dec 3, 2019 at 12:02 PM Albl, Oliver 
<oliver.a...@fabasoft.com> wrote:
Hi,

  does anybody have advice on how to activate or safely remove that storage 
domain?

Thank you!
Oliver
-----Original Message-----
From: Oliver Albl <oliver.a...@fabasoft.com>
Sent: Tuesday, November 5, 2019 11:20
To: users@ovirt.org
Subject: [ovirt-users] Re: Cannot activate/deactivate storage domain

> On Mon, Nov 4, 2019 at 9:18 PM Albl, Oliver wrote:
>
> What was the last change in the system? upgrade? network change? storage 
> change?
>

The last change was an oVirt upgrade from 4.3.3 to 4.3.6.7 four weeks ago 
(including CentOS hosts to 7.7 1908).

>
> This is expected if some domain is not accessible on all hosts.
>
>
> This means sanlock timed out renewing the lockspace
>
>
> If a host cannot access all storage domains in the DC, the system sets
> it to non-operational, and will probably try to reconnect it later.
>
>
> This means reading 4k from start of the metadata lv took 9.6 seconds.
> Something in
> the way to storage is bad (kernel, network, storage).
>
>
> We have a 20-second grace time (4 retries, 5 seconds per retry) in multipath
> when there are no active paths, before I/O fails, pausing the VM. We
> also resume paused VMs when storage monitoring works again, so maybe
> the VMs were paused and resumed.
>
> However, for storage monitoring we have a strict 10-second timeout. If
> reading from the metadata LV times out or fails and is still not operating
> normally after 5 minutes, the domain will become inactive.
>
>
> This can explain the read timeouts.
>
>
> This looks like the right way to troubleshoot this.
>
>
> We need vdsm logs to understand this failure.
>
>
> This does not mean OVF is corrupted, only that we could not store new
> data. The older data on the other OVFSTORE disk is probably fine.
> Hopefully the system will not try to write to the other OVFSTORE disk
> overwriting the last good version.
>
>
> This is normal, the first 2048 bytes are always zeroes. This area was
> used for domain metadata in older versions.
>
>
> Please share more details:
>
> - output of "lsblk"
> - output of "multipath -ll"
> - output of "/usr/libexec/vdsm/fc-scan -v"
> - output of "vgs -o +tags problem-domain-id"
> - output of "lvs -o +tags problem-domain-id"
> - contents of /etc/multipath.conf
> - contents of /etc/multipath.conf.d/*.conf
> - /var/log/messages since the issue started
> - /var/log/vdsm/vdsm.log* since the issue started on one of the hosts
>
> A bug is probably the best place to keep these logs and make it easy to track.

Please see https://bugzilla.redhat.com/show_bug.cgi?id=1768821

>
> Thanks,
> Nir

Thank you!
Oliver
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QZ5ZN2S7N54JYVV3RWOYOHTEAWFQ23Q7/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/H5MDS2RZXPE65CMQEOF6WN7ZVWGCDETO/


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Albl, Oliver
Hi Amit,



  unfortunately no success.



Dec 4, 2019, 3:41:36 PM

Storage Domain HOST_LUN_219 (Data Center xxx) was deactivated by system because 
it's not visible by any of the hosts.



Dec 4, 2019, 3:35:09 PM

Failed to update VMs/Templates OVF data for Storage Domain HOST_LUN_219 in Data 
Center Production.



Dec 4, 2019, 3:35:09 PM

Failed to update OVF disks 77c64b39-fe50-4d05-b77f-8131ad1f95f9, OVF data isn't 
updated on those OVF stores (Data Center Production, Storage Domain 
HOST_LUN_219).



All the best,

Oliver



From: Amit Bawer 
Sent: Wednesday, December 4, 2019 15:20
To: Albl, Oliver 
Cc: users@ovirt.org; Nir Soffer 
Subject: Re: [ovirt-users] Re: Cannot activate/deactivate storage domain



Hi Oliver,



For deactivating the unresponsive storage domains, you can use the Compute -> 
Data Centers -> Maintenance option with "Ignore OVF update failure" checked.

This will force deactivation of the SD.



Will provide further details about the issue in the ticket.





On Tue, Dec 3, 2019 at 12:02 PM Albl, Oliver 
<oliver.a...@fabasoft.com> wrote:

   Hi,

   does anybody have advice on how to activate or safely remove that storage 
   domain?

   Thank you!
   Oliver
   -----Original Message-----
   From: Oliver Albl <oliver.a...@fabasoft.com>
   Sent: Tuesday, November 5, 2019 11:20
   To: users@ovirt.org
   Subject: [ovirt-users] Re: Cannot activate/deactivate storage domain

   > On Mon, Nov 4, 2019 at 9:18 PM Albl, Oliver wrote:
   >
   > What was the last change in the system? upgrade? network change? storage 
   > change?
   >

   The last change was an oVirt upgrade from 4.3.3 to 4.3.6.7 four weeks ago 
   (including CentOS hosts to 7.7 1908).

   >
   > This is expected if some domain is not accessible on all hosts.
   >
   >
   > This means sanlock timed out renewing the lockspace
   >
   >
   > If a host cannot access all storage domains in the DC, the system sets
   > it to non-operational, and will probably try to reconnect it later.
   >
   >
   > This means reading 4k from start of the metadata lv took 9.6 seconds.
   > Something in
   > the way to storage is bad (kernel, network, storage).
   >
   >
   > We have a 20-second grace time (4 retries, 5 seconds per retry) in multipath
   > when there are no active paths, before I/O fails, pausing the VM. We
   > also resume paused VMs when storage monitoring works again, so maybe
   > the VMs were paused and resumed.
   >
   > However, for storage monitoring we have a strict 10-second timeout. If
   > reading from the metadata LV times out or fails and is still not operating
   > normally after 5 minutes, the domain will become inactive.
   >
   >
   > This can explain the read timeouts.
   >
   >
   > This looks like the right way to troubleshoot this.
   >
   >
   > We need vdsm logs to understand this failure.
   >
   >
   > This does not mean OVF is corrupted, only that we could not store new
   > data. The older data on the other OVFSTORE disk is probably fine.
   > Hopefully the system will not try to write to the other OVFSTORE disk
   > overwriting the last good version.
   >
   >
   > This is normal, the first 2048 bytes are always zeroes. This area was
   > used for domain metadata in older versions.
   >
   >
   > Please share more details:
   >
   > - output of "lsblk"
   > - output of "multipath -ll"
   > - output of "/usr/libexec/vdsm/fc-scan -v"
   > - output of "vgs -o +tags problem-domain-id"
   > - output of "lvs -o +tags problem-domain-id"
   > - contents of /etc/multipath.conf
   > - contents of /etc/multipath.conf.d/*.conf
   > - /var/log/messages since the issue started
   > - /var/log/vdsm/vdsm.log* since the issue started on one of the hosts
   >
   > A bug is probably the best place to keep these logs and make it easy to
   > track.

   Please see https://bugzilla.redhat.com/show_bug.cgi?id=1768821

   >
   > Thanks,
   > Nir

   Thank you!
   Oliver

___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/UVYHPVKPV5575BQ4XUYOFGZV4KZ2IF2H/


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Amit Bawer
On Wed, Dec 4, 2019 at 4:42 PM Albl, Oliver 
wrote:

> Hi Amit,
>
>
>
>   unfortunately no success.
>
>
>
> Dec 4, 2019, 3:41:36 PM
>
> Storage Domain HOST_LUN_219 (Data Center xxx) was deactivated by system
> because it's not visible by any of the hosts.
>
>
>
> Dec 4, 2019, 3:35:09 PM
>
> Failed to update VMs/Templates OVF data for Storage Domain HOST_LUN_219 in
> Data Center Production.
>
>
>
> Dec 4, 2019, 3:35:09 PM
>
> Failed to update OVF disks 77c64b39-fe50-4d05-b77f-8131ad1f95f9, OVF data
> isn't updated on those OVF stores (Data Center Production, Storage Domain
> HOST_LUN_219).
>

Have you selected the checkbox for "Ignore OVF update failure" before
putting into maintenance?


>
> All the best,
>
> Oliver
>
>
>
> *From:* Amit Bawer 
> *Sent:* Wednesday, December 4, 2019 15:20
> *To:* Albl, Oliver 
> *Cc:* users@ovirt.org; Nir Soffer 
> *Subject:* Re: [ovirt-users] Re: Cannot activate/deactivate storage domain
>
>
>
> Hi Oliver,
>
>
>
> For deactivating the unresponsive storage domains, you can use the Compute
> -> Data Centers -> Maintenance option with "Ignore OVF update failure"
> checked.
>
> This will force deactivation of the SD.
>
>
>
> Will provide further details about the issue in the ticket.
>
>
>
>
>
> On Tue, Dec 3, 2019 at 12:02 PM Albl, Oliver 
> wrote:
>
> Hi,
>
>   does anybody have advice on how to activate or safely remove that
> storage domain?
>
> Thank you!
> Oliver
> -----Original Message-----
> From: Oliver Albl 
> Sent: Tuesday, November 5, 2019 11:20
> To: users@ovirt.org
> Subject: [ovirt-users] Re: Cannot activate/deactivate storage domain
>
> > On Mon, Nov 4, 2019 at 9:18 PM Albl, Oliver wrote:
> >
> > What was the last change in the system? upgrade? network change? storage
> > change?
> >
>
> The last change was an oVirt upgrade from 4.3.3 to 4.3.6.7 four weeks ago
> (including CentOS hosts to 7.7 1908).
>
> >
> > This is expected if some domain is not accessible on all hosts.
> >
> >
> > This means sanlock timed out renewing the lockspace
> >
> >
> > If a host cannot access all storage domains in the DC, the system sets
> > it to non-operational, and will probably try to reconnect it later.
> >
> >
> > This means reading 4k from start of the metadata lv took 9.6 seconds.
> > Something in
> > the way to storage is bad (kernel, network, storage).
> >
> >
> > We have a 20-second grace time (4 retries, 5 seconds per retry) in multipath
> > when there are no active paths, before I/O fails, pausing the VM. We
> > also resume paused VMs when storage monitoring works again, so maybe
> > the VMs were paused and resumed.
> >
> > However, for storage monitoring we have a strict 10-second timeout. If
> > reading from the metadata LV times out or fails and is still not operating
> > normally after 5 minutes, the domain will become inactive.
> >
> >
> > This can explain the read timeouts.
> >
> >
> > This looks like the right way to troubleshoot this.
> >
> >
> > We need vdsm logs to understand this failure.
> >
> >
> > This does not mean OVF is corrupted, only that we could not store new
> > data. The older data on the other OVFSTORE disk is probably fine.
> > Hopefully the system will not try to write to the other OVFSTORE disk
> > overwriting the last good version.
> >
> >
> > This is normal, the first 2048 bytes are always zeroes. This area was
> > used for domain metadata in older versions.
> >
> >
> > Please share more details:
> >
> > - output of "lsblk"
> > - output of "multipath -ll"
> > - output of "/usr/libexec/vdsm/fc-scan -v"
> > - output of "vgs -o +tags problem-domain-id"
> > - output of "lvs -o +tags problem-domain-id"
> > - contents of /etc/multipath.conf
> > - contents of /etc/multipath.conf.d/*.conf
> > - /var/log/messages since the issue started
> > - /var/log/vdsm/vdsm.log* since the issue started on one of the hosts
> >
> > A bug is probably the best place to keep these logs and make it easy to
> > track.
>
> Please see https://bugzilla.redhat.com/show_bug.cgi?id=1768821
>
> >
> > Thanks,
> > Nir
>
> Thank you!
> Oliver
>
>
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/AF2GBIQKW45QVGJCEN2O3ZYV2BVTI4YU/


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Albl, Oliver


smime.p7m
Description: S/MIME encrypted message
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BWQ6JWCEA2SCQX4YSL3Y5Z5IHONQ7ZH3/


[ovirt-users] Re: Cannot activate/deactivate storage domain

2019-12-04 Thread Amit Bawer
Hi Oliver,

For deactivating the unresponsive storage domains, you can use the Compute
-> Data Centers -> Maintenance option with "Ignore OVF update failure"
checked.
This will force deactivation of the SD.

Will provide further details about the issue in the ticket.


On Tue, Dec 3, 2019 at 12:02 PM Albl, Oliver 
wrote:

> Hi,
>
>   does anybody have advice on how to activate or safely remove that
> storage domain?
>
> Thank you!
> Oliver
> -----Original Message-----
> From: Oliver Albl 
> Sent: Tuesday, November 5, 2019 11:20
> To: users@ovirt.org
> Subject: [ovirt-users] Re: Cannot activate/deactivate storage domain
>
> > On Mon, Nov 4, 2019 at 9:18 PM Albl, Oliver wrote:
> >
> > What was the last change in the system? upgrade? network change? storage
> > change?
> >
>
> The last change was an oVirt upgrade from 4.3.3 to 4.3.6.7 four weeks ago
> (including CentOS hosts to 7.7 1908).
>
> >
> > This is expected if some domain is not accessible on all hosts.
> >
> >
> > This means sanlock timed out renewing the lockspace
> >
> >
> > If a host cannot access all storage domains in the DC, the system sets
> > it to non-operational, and will probably try to reconnect it later.
> >
> >
> > This means reading 4k from start of the metadata lv took 9.6 seconds.
> > Something in
> > the way to storage is bad (kernel, network, storage).
> >
> >
> > We have a 20-second grace time (4 retries, 5 seconds per retry) in multipath
> > when there are no active paths, before I/O fails, pausing the VM. We
> > also resume paused VMs when storage monitoring works again, so maybe
> > the VMs were paused and resumed.
> >
> > However, for storage monitoring we have a strict 10-second timeout. If
> > reading from the metadata LV times out or fails and is still not operating
> > normally after 5 minutes, the domain will become inactive.
> >
> >
> > This can explain the read timeouts.
> >
> >
> > This looks like the right way to troubleshoot this.
> >
> >
> > We need vdsm logs to understand this failure.
> >
> >
> > This does not mean OVF is corrupted, only that we could not store new
> > data. The older data on the other OVFSTORE disk is probably fine.
> > Hopefully the system will not try to write to the other OVFSTORE disk
> > overwriting the last good version.
> >
> >
> > This is normal, the first 2048 bytes are always zeroes. This area was
> > used for domain metadata in older versions.
> >
> >
> > Please share more details:
> >
> > - output of "lsblk"
> > - output of "multipath -ll"
> > - output of "/usr/libexec/vdsm/fc-scan -v"
> > - output of "vgs -o +tags problem-domain-id"
> > - output of "lvs -o +tags problem-domain-id"
> > - contents of /etc/multipath.conf
> > - contents of /etc/multipath.conf.d/*.conf
> > - /var/log/messages since the issue started
> > - /var/log/vdsm/vdsm.log* since the issue started on one of the hosts
> >
> > A bug is probably the best place to keep these logs and make it easy to
> > track.
>
> Please see https://bugzilla.redhat.com/show_bug.cgi?id=1768821
>
> >
> > Thanks,
> > Nir
>
> Thank you!
> Oliver
>
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/E7AMRZVLGZALEKSWOG2SWMSYQNDNHTOU/


[ovirt-users] Re: [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing

2019-12-04 Thread Jiffin Thottan
Hi Krutika,

Apparently the ACL info in the context got corrupted; see the brick logs:

[posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control:
client:
CTX_ID:dae9ffad-6acd-4a43-9372-229a3018fde9-GRAPH_ID:0-PID:11468-HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806,
req(uid:107,gid:107,perm:1,ngrps:4),
ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-)
[Permission denied]

which resulted in this situation. A similar bug was reported in 
https://bugzilla.redhat.com/show_bug.cgi?id=1668286 and, as far as I know, 
it was fixed in the 6.6 release 
(https://review.gluster.org/#/c/glusterfs/+/23233/). But here he mentioned 
that he saw the issue when he upgraded from 6.5 to 6.6.

One workaround is to perform a dummy setfacl (preferably as root) on the 
corrupted files, which will forcefully fetch the ACL info again from the 
backend and update the context. Another approach is to restart the brick 
process (kill it, then vol start force).
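
To find which gfids are affected before trying either workaround, the permit-denied entries in the brick log can be picked apart with a small script. This is only a sketch: the regular expression is derived from the single log entry quoted in this message, and the `looks_stale` heuristic (context permissions `000` together with `updated-fop:INVALID`) is an assumption for illustration, not a documented Gluster rule.

```python
import re

# Sample entry copied from the brick log quoted above, joined onto one line.
SAMPLE = (
    "[posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control: "
    "client: CTX_ID:dae9ffad-6acd-4a43-9372-229a3018fde9-GRAPH_ID:0-PID:11468-"
    "HOST:ovirt2.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0, "
    "gfid: be318638-e8a0-4c6d-977d-7a937aa84806, "
    "req(uid:107,gid:107,perm:1,ngrps:4), "
    "ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) "
    "[Permission denied]"
)

# Pattern built from the entry above; other Gluster versions may log differently.
PATTERN = re.compile(
    r"gfid: (?P<gfid>[0-9a-f-]{36}),\s*"
    r"req\(uid:(?P<req_uid>\d+),gid:(?P<req_gid>\d+),"
    r"perm:(?P<req_perm>\d+),ngrps:(?P<ngrps>\d+)\),\s*"
    r"ctx\(uid:(?P<ctx_uid>\d+),gid:(?P<ctx_gid>\d+),"
    r"in-groups:(?P<in_groups>\d+),perm:(?P<ctx_perm>\d+),"
    r"updated-fop:(?P<fop>\w+),\s*acl:(?P<acl>[^)]*)\)"
)


def parse_permit_denied(line):
    """Return the fields of a permit-denied entry as a dict, or None."""
    match = PATTERN.search(line)
    return match.groupdict() if match else None


def looks_stale(entry):
    """Hypothetical heuristic: an all-zero context that was never updated."""
    return entry["ctx_perm"] == "000" and entry["fop"] == "INVALID"


if __name__ == "__main__":
    entry = parse_permit_denied(SAMPLE)
    print(entry["gfid"], "stale context?", looks_stale(entry))
```

Feeding each line of a brick log through `parse_permit_denied` and collecting the distinct `gfid` values gives a list of candidates for the dummy setfacl.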

Regards,
Jiffin

----- Original Message -----
From: "Krutika Dhananjay"
To: "Strahil Nikolov", "Jiffin Thottan", "raghavendra talur"
Cc: "Nir Soffer", "Rafi Kavungal Chundattu Parambil", "users", "gluster-user"
Sent: Monday, December 2, 2019 11:48:22 AM
Subject: Re: [ovirt-users] Re: [ANN] oVirt 4.3.7 Third Release Candidate is now 
available for testing

Sorry about the late response.

I looked at the logs. These errors originate from the posix-acl
translator:



[2019-11-17 07:55:47.090065] E [MSGID: 115050]
[server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162496:
LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.6
(be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.6),
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
error-xlator: data_fast-access-control [Permission denied]

[2019-11-17 07:55:47.090174] I [MSGID: 139001]
[posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control:
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806,
req(uid:36,gid:36,perm:1,ngrps:3),
ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-)
[Permission denied]

[2019-11-17 07:55:47.090209] E [MSGID: 115050]
[server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162497:
LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.7
(be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.7),
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
error-xlator: data_fast-access-control [Permission denied]

[2019-11-17 07:55:47.090299] I [MSGID: 139001]
[posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control:
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806,
req(uid:36,gid:36,perm:1,ngrps:3),
ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-)
[Permission denied]

Jiffin/Raghavendra Talur,
Can you help?

-Krutika

On Wed, Nov 27, 2019 at 2:11 PM Strahil Nikolov 
wrote:

> Hi Nir,All,
>
> it seems that 4.3.7 RC3 (and even RC4) is not the problem here (attached
> screenshot of oVirt running on gluster v7).
> It seems strange that both of my serious issues with oVirt are related to
> gluster issues (first the gluster v3 to v5 migration and now this one).
>
> I have just updated to gluster v7.0 (CentOS 7 repos), and rebooted all
> nodes.
> Now both the Engine and all my VMs are back online, so if you hit issues
> with 6.6, you should give 7.0 a try (and even 7.1 is coming soon) before
> deciding to wipe everything.
>
> @Krutika,
>
> I guess you will ask for the logs, so let's switch to gluster-users about
> this one ?
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, November 25, 2019, 16:45:48 GMT-5, Strahil Nikolov <
> hunter86...@yahoo.com> wrote:
>
>
> Hi Krutika,
>
> I have enabled the TRACE log level for the volume data_fast,
>
> but the issue is still not very clear.
> FUSE reports:
>
> [2019-11-25 21:31:53.478130] I [MSGID: 133022]
> [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of
> gfid=6d9ed2e5-d4f2-4749-839b-2f13a68ed472 from backend
> [2019-11-25 21:32:43.564694] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (----) [Permission denied]
> [2019-11-25 21:32:43.565653] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (----) [Permission denied]
> [2019-11-25 21:32:43

[ovirt-users] Re: AWX and error using ovirt as an inventory source

2019-12-04 Thread Guillaume Pavese
Could it be a permissions problem, i.e. your awx user cannot access
/opt/my-envs?
You could try creating the ovirt virtualenv in the default path:
/var/lib/awx/venv/
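
One way to test the permissions theory is to run a short check as the same user, and in the same container, that runs the failing inventory update. The sketch below walks each component of the path and reports the first one that would make `os.listdir()` fail. The default path is taken from the traceback in the quoted message; the helper names `check_components` and `first_problem` are made up for this example.

```python
import os


def check_components(path):
    """Yield (component, exists, searchable, readable) for each prefix of path."""
    current = os.sep
    for part in os.path.abspath(path).split(os.sep):
        if not part:
            continue
        current = os.path.join(current, part)
        exists = os.path.exists(current)
        yield (
            current,
            exists,
            exists and os.access(current, os.X_OK),
            exists and os.access(current, os.R_OK),
        )


def first_problem(path):
    """Return the first component that would break os.listdir(path), or None.

    Parent directories only need search (x) permission; the final directory
    also needs read (r) permission to be listed.
    """
    components = list(check_components(path))
    for index, (component, exists, searchable, readable) in enumerate(components):
        is_last = index == len(components) - 1
        if not exists or not searchable or (is_last and not readable):
            return component
    return None


if __name__ == "__main__":
    target = "/opt/my-envs/ovirt/lib"  # path from the traceback
    problem = first_problem(target)
    if problem is None:
        print("OK:", target, "is listable by uid", os.getuid())
    else:
        print("Missing or not accessible for uid", os.getuid(), ":", problem)
```

If the script reports no problem when run as root but fails as the awx user, or succeeds in one container and fails in another, that narrows down where the directory is actually missing.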

Guillaume Pavese
Systems and Network Engineer
Interactiv-Group


On Wed, Dec 4, 2019 at 5:32 PM Gianluca Cecchi 
wrote:

> On Tue, Dec 3, 2019 at 3:02 PM Nathanaël Blanchet 
> wrote:
>
>> Great, I hope Gianluca will be able to sync with the container version!
>>
>>
> Unfortunately not yet.
> When running with the new virtual env I get:
>
> 2.352 INFO Updating inventory 4: MYDC_OVIRT
> 2.825 INFO Reading Ansible inventory source: /var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/plugins/inventory/ovirt4.py
> Traceback (most recent call last):
>   File "/var/lib/awx/venv/awx/bin/awx-manage", line 11, in <module>
>     load_entry_point('awx==9.0.1.0', 'console_scripts', 'awx-manage')()
> ...
>   File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/management/commands/inventory_import.py", line 101, in build_env
>     for version in os.listdir(venv_libdir):
> FileNotFoundError: [Errno 2] No such file or directory: '/opt/my-envs/ovirt/lib'
>
> But I have this:
>
> bash-4.4# ls -l /opt/my-envs/ovirt/
> total 4
> drwxr-xr-x 2 root root 4096 Nov 30 11:20 bin
> drwxr-xr-x 2 root root   23 Nov 30 10:56 include
> drwxr-xr-x 3 root root   23 Nov 30 10:56 lib
> lrwxrwxrwx 1 root root    3 Nov 30 10:56 lib64 -> lib
> drwxr-xr-x 3 root root   17 Nov 30 11:07 share
> bash-4.4# ls -l /opt/my-envs/ovirt/lib
> total 4
> drwxr-xr-x 4 root root 4096 Nov 30 10:56 python2.7
> bash-4.4#
>
> What am I missing, in your opinion?
> I don't understand the FileNotFoundError
> Thanks in advance,
> Gianluca
>

___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/NFGVTUF3I2BT6TCDUI4XENQZADNOCYQN/


[ovirt-users] Re: AWX and error using ovirt as an inventory source

2019-12-04 Thread Gianluca Cecchi
On Tue, Dec 3, 2019 at 3:02 PM Nathanaël Blanchet  wrote:

> Great, I hope Gianluca will be able to sync with the container version!
>
>
Unfortunately not yet.
When running with the new virtual env I get:

2.352 INFO Updating inventory 4: MYDC_OVIRT
2.825 INFO Reading Ansible inventory source: /var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/plugins/inventory/ovirt4.py
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/bin/awx-manage", line 11, in <module>
    load_entry_point('awx==9.0.1.0', 'console_scripts', 'awx-manage')()
...
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/management/commands/inventory_import.py", line 101, in build_env
    for version in os.listdir(venv_libdir):
FileNotFoundError: [Errno 2] No such file or directory: '/opt/my-envs/ovirt/lib'

But I have this:

bash-4.4# ls -l /opt/my-envs/ovirt/
total 4
drwxr-xr-x 2 root root 4096 Nov 30 11:20 bin
drwxr-xr-x 2 root root   23 Nov 30 10:56 include
drwxr-xr-x 3 root root   23 Nov 30 10:56 lib
lrwxrwxrwx 1 root root    3 Nov 30 10:56 lib64 -> lib
drwxr-xr-x 3 root root   17 Nov 30 11:07 share
bash-4.4# ls -l /opt/my-envs/ovirt/lib
total 4
drwxr-xr-x 4 root root 4096 Nov 30 10:56 python2.7
bash-4.4#

What am I missing, in your opinion?
I don't understand the FileNotFoundError
Thanks in advance,
Gianluca
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HXSOCTA2UGTJOMTTJQKQ47TNLNIGMFDU/


[ovirt-users] Re: Certificate of host is invalid

2019-12-04 Thread Jon bae
On Thu, Nov 28, 2019 at 09:00, Jon bae wrote:

>
>
> On Thu, Nov 28, 2019 at 07:42, Milan Zamazal <
> mzama...@redhat.com> wrote:
>
>> Strahil  writes:
>>
>> > Hi ,
>> >
>> > You can try with:
>> > 1. Set the host in maintenance
>> > 2. From Install dropdown , select 'reinstall' and then configure the
>> > necessary info + whether you would like to use the host as Host for
>> > the HostedEngine VM.
>>
>> Rather than full reinstall, Enroll Certificate action (just next to
>> Reinstall in the menu) should be faster and sufficient.  You still need
>> to set the host to maintenance before being allowed to do it.
>>
>>
> Thank you very much! I already thought I would have to get my hands dirty
> in the console, but this was very easy!
>

Hi again,
sadly, enrolling a new certificate did not work. I will try reinstalling
the host later and see if that works.
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ARL4TRAPT53S6A7WG4COCE5Z7HRCUXMB/


[ovirt-users] Zabbix monitoring within the node

2019-12-04 Thread Vinícius Ferrão
Hello,

There is some documentation on the web about using Zabbix with oVirt to 
monitor the hypervisor itself, for example 
https://github.com/hudecof/libzbxovirt and 
https://github.com/jensdepuydt/zabbix-ovirt

But what are you all doing about this? Any recommendations on Zabbix 
monitoring with oVirt Node?

Should I use the Zabbix Agent as a package or use SNMP instead?

Thanks,


___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YZULPRJOUGRXAQXNDOT7NADAASODTZY7/