[ovirt-users] Re: Can't bring upgraded to 4.3 host back to cluster

2019-06-11 Thread Shani Leviim
Hi Artem,
According to oVirt documentation [1], hosts in the same cluster should be
reachable from one another.

Can you please share your vdsm log?
I suppose you can still ssh to that inactive host (correct me if I'm wrong).
While getting the vdsm log, maybe try to restart the network and vdsmd
services on the host.
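For example, on an EL7 oVirt host that would typically be something along the
lines of (adjust service names to your setup):

systemctl restart network
systemctl restart supervdsmd vdsmd
journalctl -u vdsmd -b

and the log worth sharing is usually /var/log/vdsm/vdsm.log.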

Another thing you can try in the UI is putting the host into maintenance and
then activating it.

[1]
https://www.ovirt.org/documentation/admin-guide/chap-Clusters.html#introduction-to-clusters


*Regards,*

*Shani Leviim*


On Mon, Jun 10, 2019 at 4:42 PM Artem Tambovskiy 
wrote:

> Hello,
>
> May I ask you for advice?
> I'm running a small oVirt cluster, and a couple of months ago I decided to do
> an upgrade from oVirt 4.2.8 to 4.3; I have been having issues since then. I
> can only guess what I did wrong - probably one of the problems is that I
> haven't switched the cluster from iptables to firewalld. But this is just
> my guess.
>
> The problem is that I upgraded the engine and one host, but when I then
> upgraded the second host I couldn't bring it to an active state. It looks like
> VDSM can't detect the network and fails to start. I even tried to reinstall
> the host from the UI (I have seen the packages being installed) but
> again, VDSM doesn't start up at the end and the reinstallation fails.
>
> Looking at the host's process list I see the script *wait_for_ipv4s* hanging
> forever.
>
> vdsm    8603     1  6 16:26 ?  00:00:00 /usr/bin/python
> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
> root    8630     1  0 16:26 ?  00:00:00 /bin/sh
> /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
> root    8645  8630  6 16:26 ?  00:00:00 /usr/bin/python2
> /usr/libexec/vdsm/wait_for_ipv4s
> root    8688     1 30 16:27 ?  00:00:00 /usr/bin/python2
> /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock
> vdsm    8715     1  0 16:27 ?  00:00:00 /usr/bin/python
> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>
> All hosts in the cluster are reachable from each other ... Could that be
> the issue?
>
> Thank you in advance!
> --
> Regards,
> Artem
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/TQX3LN2TEM4DECKKUMMRCWXTRM6BGIAB/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/K5IHSDIGFOYGU5KUHA7ITP362YOME7OJ/


[ovirt-users] Re: Failed to activate Storage Domain --- ovirt 4.2

2019-06-11 Thread Aminur Rahman
Hi Nir

Yes, the metadata was corrupted, but the VMs were running OK. This master
storage domain increased its allocation significantly overnight, ran out of
space, and went completely offline. The cluster was online and the VMs were
running OK, but the affected storage domain went offline. I tried to increase
the storage domain, but oVirt wasn't allowing me to expand the storage.

Due to time constraints, I had to restore the storage domain using a Compellent
snapshot. However, we need to prevent this from happening again when the master
storage domain fills up. Currently, we have the following parameters set on
the 5TB storage domain.

ID: 0e1f2a5d-a548-476c-94bd-3ab3fe239926
Size: 5119 GiB
Available: 2361 GiB
Used: 2758 GiB
Allocated: 3104 GiB
Over Allocation Ratio: 14%
Images: 13
Warning Low Space Indicator: 10% (511 GiB)
Critical Space Action Blocker: 5 GiB
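
As a side note, the actual free space of this domain's volume group can be
checked directly on the SPM host with something like the following (a rough
sketch, assuming a block/iSCSI domain, whose VG is named after the ID above):

vgs -o vg_name,vg_size,vg_free 0e1f2a5d-a548-476c-94bd-3ab3fe239926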

Please kindly advise what actions need to be implemented, so we can prevent
this from occurring again in the future.

Thanks
Aminur Rahman
aminur.rah...@iongroup.com
t +44 20 7398 0243
m +44 7825 780697
iongroup.com

From: Nir Soffer 
Sent: 10 June 2019 22:07
To: David Teigland 
Cc: Aminur Rahman ; users 
Subject: Re: [ovirt-users] Failed to activate Storage Domain --- ovirt 4.2

On Mon, Jun 10, 2019 at 11:22 PM David Teigland <teigl...@redhat.com> wrote:
> On Mon, Jun 10, 2019 at 10:59:43PM +0300, Nir Soffer wrote:
> > > [root@uk1-ion-ovm-18  pvscan
> > >   /dev/mapper/36000d31005697814: Checksum error at offset
> > > 4397954425856
> > >   Couldn't read volume group metadata from
> > > /dev/mapper/36000d31005697814.
> > >   Metadata location on /dev/mapper/36000d31005697814 at
> > > 4397954425856 has invalid summary for VG.
> > >   Failed to read metadata summary from
> > > /dev/mapper/36000d31005697814
> > >   Failed to scan VG from /dev/mapper/36000d31005697814
> >
> > This looks like corrupted vg metadata.
>
> Yes, the second metadata area, at the end of the device, is corrupted; the
> first metadata area is probably ok.  That version of lvm is not able to
> continue by just using the one good copy.

Can we copy the first metadata area into the second metadata area?

> Last week I pushed out major changes to LVM upstream to be able to handle
> and repair most of these cases.  So, one option is to build lvm from the
> upstream master branch, and check if that can read and repair this
> metadata.

This sounds pretty risky for production.

> > David, we keep 2 metadata copies on the first PV. Can we use one of the
> > copies on the PV to restore the metadata to the least good state?
>
> pvcreate with --restorefile and --uuid, and with the right backup metadata

What would be the right backup metadata?

> could probably correct things, but experiment with some temporary PVs
> first.
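
A rough sketch of what that restore sequence could look like (VG name and PV
UUID below are placeholders; the backup normally lives under
/etc/lvm/backup/<vgname> or /etc/lvm/archive/, and as David says this should
be rehearsed on scratch PVs first):

pvcreate --uuid <pv-uuid-from-backup> \
    --restorefile /etc/lvm/backup/<vgname> /dev/mapper/36000d31005697814
vgcfgrestore -f /etc/lvm/backup/<vgname> <vgname>
vgs <vgname>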

Aminur, can you copy and compress the metadata areas, and share them somewhere?

To copy the first metadata area, use:

dd if=/dev/mapper/360014058ccaab4857eb40f393aaf0351 of=md1 bs=128M count=1 
skip=4096 iflag=skip_bytes

To copy the second metadata area, you need to know the size of the PV. On my 
setup with 100G
PV, I have 800 extents (128M each), and this works:

dd if=/dev/mapper/360014058ccaab4857eb40f393aaf0351 of=md2 bs=128M count=1 
skip=799

gzip md1 md2

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZCD5K4UTMZ3QVS7OC2KWBSXWCHWTXQLV/


[ovirt-users] Re: Can't bring upgraded to 4.3 host back to cluster

2019-06-11 Thread Artem Tambovskiy
Hi Shani,

yes, you are right - I can ssh from any host to any other host in the cluster.
vdsm.log attached.
I have tried to restart vdsm manually and even did a host restart several
times, with no success.
Host activation fails all the time ...

Thank you in advance for your help!
Regards,
Artem

On Tue, Jun 11, 2019 at 10:51 AM Shani Leviim  wrote:

> Hi Artem,
> According to oVirt documentation [1], hosts on the same cluster should be
> reachable from one to each other.
>
> Can you please share your vdsm log?
> I suppose you do manage to ssh that inactive host (correct me if I'm
> wrong).
> While getting the vdsm log, maybe try to restart the network and vdsmd
> services on the host.
>
> Another thing you can try on the UI is putting the host on maintenance and
> then activate it.
>
> [1]
> https://www.ovirt.org/documentation/admin-guide/chap-Clusters.html#introduction-to-clusters
>
>
> *Regards,*
>
> *Shani Leviim*
>
>
> On Mon, Jun 10, 2019 at 4:42 PM Artem Tambovskiy <
> artem.tambovs...@gmail.com> wrote:
>
>> Hello,
>>
>> May I ask you for and advise?
>> I'm running a small oVirt cluster and couple of months ago I decided to
>> do an upgrade from oVirt 4.2.8 to 4.3 and having an issues since that time.
>> I can only guess what I did wrong - probably one of the problems that I
>> haven't switched the cluster from iptables to firewalld. But this is just
>> my guess.
>>
>> The problem is that I have upgraded the engine and one host, and then I
>> done an upgrade of second host I can't bring it to active state. Looks like
>> VDSM can't detect the network and fails to start. I even tried to reinstall
>> the hosts from UI (I have seen that the packages being installed) but
>> again, VDSM doesn't startup at the end and reinstallation fails.
>>
>> Looking at hosts process list I see  script *wait_for_ipv4s*  hanging
>> forever.
>>
>> vdsm   8603  1  6 16:26 ?00:00:00 /usr/bin/python
>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
>>
>> *root   8630  1  0 16:26 ?00:00:00 /bin/sh
>> /usr/libexec/vdsm/vdsmd_init_common.sh --pre-startroot   8645   8630  6
>> 16:26 ?00:00:00 /usr/bin/python2 /usr/libexec/vdsm/wait_for_ipv4s*
>> root   8688  1 30 16:27 ?00:00:00 /usr/bin/python2
>> /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock
>> vdsm   8715  1  0 16:27 ?00:00:00 /usr/bin/python
>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>>
>> The all hosts in cluster are reachable from each other ...  That could be
>> the issue?
>>
>> Thank you in advance!
>> --
>> Regards,
>> Artem
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/TQX3LN2TEM4DECKKUMMRCWXTRM6BGIAB/
>>
>

-- 
Regards,
Artem


vdsm.tar.bzip2
Description: Binary data
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/U65YDKV4P6IFXENCQOCGNR23KXTM6HHD/


[ovirt-users] Re: Can't bring upgraded to 4.3 host back to cluster

2019-06-11 Thread Shani Leviim
+Dan Kenigsberg 

Hi Artem,
Thanks for the log.

It seems that this error message appears quite a lot:
2019-06-11 12:10:35,283+0300 ERROR (MainThread) [root] Panic: Connect to
supervdsm service failed: [Errno 2] No such file or directory (panic:29)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line
86, in _connect
self._manager.connect, Exception, timeout=60, tries=3)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 58,
in retry
return func()
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 500, in
connect
conn = Client(self._address, authkey=self._authkey)
  File "/usr/lib64/python2.7/multiprocessing/connection.py", line 173, in
Client
c = SocketClient(address)
  File "/usr/lib64/python2.7/multiprocessing/connection.py", line 308, in
SocketClient
s.connect(address)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory

Can you please verify that the 'supervdsmd.service' is running?
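
For example:

systemctl status supervdsmd
journalctl -u supervdsmd --since today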


*Regards,*

*Shani Leviim*


On Tue, Jun 11, 2019 at 3:04 PM Artem Tambovskiy 
wrote:

> Hi Shani,
>
> yes, you are right - I can do ssh form aby to any hosts in the cluster.
> vdsm.log attached.
> I have tried to restart vdsm manually and even done a host restart several
> times with no success.
> Host activation fails all the time ...
>
> Thank you in advance for your help!
> Regard,
> Artem
>
> On Tue, Jun 11, 2019 at 10:51 AM Shani Leviim  wrote:
>
>> Hi Artem,
>> According to oVirt documentation [1], hosts on the same cluster should be
>> reachable from one to each other.
>>
>> Can you please share your vdsm log?
>> I suppose you do manage to ssh that inactive host (correct me if I'm
>> wrong).
>> While getting the vdsm log, maybe try to restart the network and vdsmd
>> services on the host.
>>
>> Another thing you can try on the UI is putting the host on maintenance
>> and then activate it.
>>
>> [1]
>> https://www.ovirt.org/documentation/admin-guide/chap-Clusters.html#introduction-to-clusters
>>
>>
>> *Regards,*
>>
>> *Shani Leviim*
>>
>>
>> On Mon, Jun 10, 2019 at 4:42 PM Artem Tambovskiy <
>> artem.tambovs...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> May I ask you for and advise?
>>> I'm running a small oVirt cluster and couple of months ago I decided to
>>> do an upgrade from oVirt 4.2.8 to 4.3 and having an issues since that time.
>>> I can only guess what I did wrong - probably one of the problems that I
>>> haven't switched the cluster from iptables to firewalld. But this is just
>>> my guess.
>>>
>>> The problem is that I have upgraded the engine and one host, and then I
>>> done an upgrade of second host I can't bring it to active state. Looks like
>>> VDSM can't detect the network and fails to start. I even tried to reinstall
>>> the hosts from UI (I have seen that the packages being installed) but
>>> again, VDSM doesn't startup at the end and reinstallation fails.
>>>
>>> Looking at hosts process list I see  script *wait_for_ipv4s*  hanging
>>> forever.
>>>
>>> vdsm   8603  1  6 16:26 ?00:00:00 /usr/bin/python
>>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
>>>
>>> *root   8630  1  0 16:26 ?00:00:00 /bin/sh
>>> /usr/libexec/vdsm/vdsmd_init_common.sh --pre-startroot   8645   8630  6
>>> 16:26 ?00:00:00 /usr/bin/python2 /usr/libexec/vdsm/wait_for_ipv4s*
>>> root   8688  1 30 16:27 ?00:00:00 /usr/bin/python2
>>> /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock
>>> vdsm   8715  1  0 16:27 ?00:00:00 /usr/bin/python
>>> /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>>>
>>> The all hosts in cluster are reachable from each other ...  That could
>>> be the issue?
>>>
>>> Thank you in advance!
>>> --
>>> Regards,
>>> Artem
>>> ___
>>> Users mailing list -- users@ovirt.org
>>> To unsubscribe send an email to users-le...@ovirt.org
>>> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>>> oVirt Code of Conduct:
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/TQX3LN2TEM4DECKKUMMRCWXTRM6BGIAB/
>>>
>>
>
> --
> Regards,
> Artem
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/C6TR46UNW2GWXOA32NT7FOIA4KJDVTSK/


[ovirt-users] Re: Can't bring upgraded to 4.3 host back to cluster

2019-06-11 Thread Artem Tambovskiy
Shani,

supervdsmd is failing too.

[root@ovirt1 vdsm]# systemctl status supervdsmd
● supervdsmd.service - Auxiliary vdsm service for running helper functions
as root
   Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static;
vendor preset: enabled)
   Active: failed (Result: start-limit) since Tue 2019-06-11 16:18:16 MSK;
5s ago
  Process: 176025 ExecStart=/usr/share/vdsm/daemonAdapter
/usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock
(code=exited, status=1/FAILURE)
 Main PID: 176025 (code=exited, status=1/FAILURE)

Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Unit supervdsmd.service entered
failed state.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: supervdsmd.service failed.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: supervdsmd.service holdoff time
over, scheduling restart.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Stopped Auxiliary vdsm service
for running helper functions as root.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: start request repeated too
quickly for supervdsmd.service
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Failed to start Auxiliary vdsm
service for running helper functions as root.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Unit supervdsmd.service entered
failed state.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: supervdsmd.service failed.
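
One way to get past that start-limit rate limiting and see the underlying
error (a generic sketch, nothing oVirt-specific):

systemctl reset-failed supervdsmd
systemctl start supervdsmd
journalctl -u supervdsmd -b --no-pager | tail -n 50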


supervdsm.log is full of messages like
logfile::DEBUG::2019-06-11 16:18:46,379::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:04,401::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:06,289::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:17,535::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:21,528::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:24,541::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:42,543::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:57,442::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:20:18,539::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:20:32,041::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:20:41,051::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})

Regards,
Artem


On Tue, Jun 11, 2019 at 3:59 PM Shani Leviim  wrote:

> +Dan Kenigsberg 
>
> Hi Artem,
> Thanks for the log.
>
> It seems that this error message appears quite a lot:
> 2019-06-11 12:10:35,283+0300 ERROR (MainThread) [root] Panic: Connect to
> supervdsm service failed: [Errno 2] No such file or directory (panic:29)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line
> 86, in _connect
> self._manager.connect, Exception, timeout=60, tries=3)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line
> 58, in retry
> return func()
>   File "/usr/lib64/python2.7/multiprocessing/managers.py", line 500, in
> connect
> conn = Client(self._address, authkey=self._authkey)
>   File "/usr/lib64/python2.7/multiprocessing/connection.py", line 173, in
> Client
> c = SocketClient(address)
>   File "/usr/lib64/python2.7/multiprocessing/connection.py", line 308, in
> SocketClient
> s.connect(address)
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
> return getattr(self._sock,name)(*args)
> error: [Errno 2] No such file or directory
>
> Can you please verify that the 'supervdsmd.service' is running?
>
>
> *Regards,*
>
> *Shani Leviim*
>
>
> On Tue, Jun 11, 2019 at 3:04 PM Artem Tambovskiy <
> artem.tambovs...@gmail.com> wrote:
>
>> Hi Shani,
>>
>> yes, you are right - I can do ssh form aby to any hosts in the cluster.
>> vdsm.log attached.
>> I have tried to restart vdsm manually and even done a host restart
>> several times with no success.
>> Host activation fails all the time ...
>>
>> Thank you in advance for your help!
>> Regard,
>> Artem
>>
>> On Tue, Jun 11, 2019 at 10:51 AM Shani Leviim  wrote:
>>
>>> Hi Artem,
>>> According to oVirt documentation [1], hosts on the same cluster should
>>> be reachable from one to each other.
>>>
>>> Can you please share your vdsm log?
>>> I suppose you do manage to ssh that inactive host (correct me if I'm
>>> wrong).
>>> While getting the vdsm log, maybe try to restart the network and vdsmd
>>> services on the host.
>>>
>>> Another thing you can try on the UI is putting the host on maintenance
>>> and then activate it.
>>>
>>> [1]
>>> https://www.ovirt.org/documentation/admin-guide/chap-Clusters.html#introduction-to-clusters
>>>
>>>
>>> *Regards,*
>>>
>>> *Shani Leviim*
>>>
>>>
>>> On Mon, Jun 10, 2019 at 4:42 PM Art

[ovirt-users] VM Disk Performance metrics?

2019-06-11 Thread Wesley Stewart
Is there any way to get ovirt disk performance metrics into the web
interface?  It would be nice to see some type of IOPs data, so we can see
which VMs are hitting our data stores the most.

It seems you can run virt-top on a host to get some of these metrics, but
it would be nice to get some sort of data in the gui.
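
As a stop-gap (a sketch, not an official feature), the raw per-disk counters
can be pulled from vdsm on each host and fed into whatever graphing is already
in place; the per-VM 'disks' section of the output carries read/write rate and
latency figures:

vdsm-client Host getAllVmStats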

Thanks!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LMOOJ6JVZYAM74PWYPBCQ4FCNYTCY5KQ/


[ovirt-users] Re: Can't bring upgraded to 4.3 host back to cluster

2019-06-11 Thread Andreas Elvers
> probably one of the problems that I
> haven't switched the cluster from iptables to firewalld. But this is just
> my guess.
> 

When switching from 4.2.8 to 4.3.3 I also did not change one host from iptables
to firewalld. I was still able to change it later, even though the
documentation somewhere said iptables support is to be removed in 4.3.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QMJGVX4LFSMGXKRN3ZS7UOAYMFBWC6XS/


[ovirt-users] Re: Replace bad Host from a 9 Node hyperconverged setup 4.3.3

2019-06-11 Thread Adrian Quintero
Strahil,

Looking at your suggestions I think I need to provide a bit more info on my
current setup.



   1. I have 9 hosts in total.

   2. I have 5 storage domains:
      - hosted_storage (Data Master)
      - vmstore1 (Data)
      - data1 (Data)
      - data2 (Data)
      - ISO (NFS) //had to create this one because oVirt 4.3.3.1 would not
        let me upload disk images to a data domain without an ISO (I think
        this is due to a bug)

   3. Each volume is of the type "Distributed Replicate" and each one is
      composed of 9 bricks.
      I started with 3 bricks per volume due to the initial Hyperconverged
      setup, then I expanded the cluster and the gluster cluster by 3 hosts
      at a time until I got to a total of 9 hosts.

   Disks, bricks and sizes used per volume:
      /dev/sdb  engine    100GB
      /dev/sdb  vmstore1  2600GB
      /dev/sdc  data1     2600GB
      /dev/sdd  data2     2600GB
      /dev/sde  400GB SSD used for caching purposes

   From the above layout a few questions came up:

   1. Using the web UI, how can I create a 100GB brick and a 2600GB brick to
      replace the bad bricks for "engine" and "vmstore1" within the same
      block device (sdb)? What about /dev/sde (caching disk)? When I tried
      creating a new brick thru the UI I saw that I could use /dev/sde for
      caching, but only for 1 brick (i.e. vmstore1), so if I try to create
      another brick how would I specify it is the same /dev/sde device to be
      used for caching?

   1. If I want to remove a brick and it being a replica 3, I go to Storage >
      Volumes > select the volume > Bricks; once in there I can select the 3
      servers that compose the replicated bricks and click remove; this gives
      a pop-up window with the following info:

      Are you sure you want to remove the following Brick(s)?
      - vmm11:/gluster_bricks/vmstore1/vmstore1
      - vmm12.virt.iad3p:/gluster_bricks/vmstore1/vmstore1
      - 192.168.0.100:/gluster-bricks/vmstore1/vmstore1
      - Migrate Data from the bricks?

      If I proceed with this, that means I will have to do this for all the 4
      volumes, and that is just not very efficient; but if that is the only
      way, then I am hesitant to put this into a real production environment,
      as there is no way I can take that kind of a hit for +500 vms :) and
      also I won't have that much storage or extra volumes to play with in a
      real scenario.

   2. After modifying /etc/vdsm/vdsm.id yesterday by following
      (https://stijn.tintel.eu/blog/2013/03/02/ovirt-problem-duplicate-uuids)
      I was able to add the server back to the cluster using a new fqdn and
      a new IP, and tested replacing one of the bricks, and this is my
      mistake: as mentioned in #3 above I used /dev/sdb entirely for 1 brick,
      because thru the UI I could not separate the block device and have it
      used for 2 bricks (one for the engine and one for vmstore1). So in the
      "gluster vol info" you might see vmm102.mydomain.com, but in reality it
      is myhost1.mydomain.com.

   3. I am also attaching gluster_peer_status.txt, and in the last 2 entries
      of that file you will see an entry vmm10.mydomain.com (old/bad entry)
      and vmm102.mydomain.com (new entry, same server vmm10, but renamed to
      vmm102). Also please find the gluster_vol_info.txt file.

   4. I am ready to redeploy this environment if needed, but I am also ready
      to test any other suggestion. If I can get a good understanding on how
      to recover from this I will be ready to move to production.

   5. Wondering if you’d be willing to have a look at my setup through a
      shared screen?

Thanks

Adrian

On Mon, Jun 10, 2019 at 11:41 PM Strahil  wrote:

> Hi Adrian,
>
> You have several options:
> A) If you have space on another gluster volume (or volumes) or on
> NFS-based storage, you can migrate all VMs live . Once you do it,  the
> simple way will be to stop and remove the storage domain (from UI) and
> gluster volume that correspond to the problematic brick. Once gone, you
> can  remove the entry in oVirt for the old host and add the newly built
> one.Then you can recreate your volume and migrate the data back.
>
> B)  If you don't have space you have to use a more riskier approach
> (usually it shouldn't be risky, but I had bad experience in gluster v3):
> - New server has same IP and hostname:
> Use command line and run the 'gluster volume reset-brick VOLNAME
> HOSTNAME:BRICKPATH HOSTNAME:BRICKPATH commit'
> Replace VOLNAME with your volume name.
> A more practical example would be:
> 'gluster volume reset-brick data ovirt3:/gluster_bricks/data/brick
> ovirt3:/gluster_bricks/data/brick commit'
>
> If it refuses, then you have to cleanup '/gluster_bricks/data' 

[ovirt-users] Re: Replace bad Host from a 9 Node hyperconverged setup 4.3.3

2019-06-11 Thread Adrian Quintero
adding gluster pool list:
UUID                                  Hostname              State
2c86fa95-67a2-492d-abf0-54da625417f8  vmm12.mydomain.com    Connected
ab099e72-0f56-4d33-a16b-ba67d67bdf9d  vmm13.mydomain.com    Connected
c35ad74d-1f83-4032-a459-079a27175ee4  vmm14.mydomain.com    Connected
aeb7712a-e74e-4492-b6af-9c266d69bfd3  vmm17.mydomain.com    Connected
4476d434-d6ff-480f-b3f1-d976f642df9c  vmm16.mydomain.com    Connected
22ec0c0a-a5fc-431c-9f32-8b17fcd80298  vmm15.mydomain.com    Connected
caf84e9f-3e03-4e6f-b0f8-4c5ecec4bef6  vmm18.mydomain.com    Connected
18385970-aba6-4fd1-85a6-1b13f663e60b  vmm10.mydomain.com    Disconnected   //server that went bad
b152fd82-8213-451f-93c6-353e96aa3be9  vmm102.mydomain.com   Connected      //vmm10 but with a different name
228a9282-c04e-4229-96a6-67cb47629892  localhost             Connected
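
(Side note: once no volume references the old vmm10 entry any more, the stale
peer can usually be dropped with something like the command below; gluster
will refuse it while the host still holds bricks, which is a useful safety
check.)

gluster peer detach vmm10.mydomain.com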

On Tue, Jun 11, 2019 at 11:24 AM Adrian Quintero 
wrote:

> Strahil,
>
> Looking at your suggestions I think I need to provide a bit more info on
> my current setup.
>
>
>
>1.
>
>I have 9 hosts in total
>2.
>
>I have 5 storage domains:
>-
>
>   hosted_storage (Data Master)
>   -
>
>   vmstore1 (Data)
>   -
>
>   data1 (Data)
>   -
>
>   data2 (Data)
>   -
>
>   ISO (NFS) //had to create this one because oVirt 4.3.3.1 would not
>   let me upload disk images to a data domain without an ISO (I think this 
> is
>   due to a bug)
>
>   3.
>
>Each volume is of the type “Distributed Replicate” and each one is
>composed of 9 bricks.
>I started with 3 bricks per volume due to the initial Hyperconverged
>setup, then I expanded the cluster and the gluster cluster by 3 hosts at a
>time until I got to a total of 9 hosts.
>
>
>-
>
>
>
>
>
>
>
>
> *Disks, bricks and sizes used per volume / dev/sdb engine 100GB / dev/sdb
>   vmstore1 2600GB / dev/sdc data1 2600GB / dev/sdd data2 2600GB / dev/sde
>    400GB SSD Used for caching purposes From the above layout a few
>   questions came up:*
>   1.
>
>
>
> *Using the web UI, How can I create a 100GB brick and a 2600GB brick to
>  replace the bad bricks for “engine” and “vmstore1” within the same 
> block
>  device (sdb) ? What about / dev/sde (caching disk), When I tried 
> creating a
>  new brick thru the UI I saw that I could use / dev/sde for caching 
> but only
>  for 1 brick (i.e. vmstore1) so if I try to create another brick how 
> would I
>  specify it is the same / dev/sde device to be used for caching?*
>
>
>
>1.
>
>If I want to remove a brick and it being a replica 3, I go to storage
>> Volumes > select the volume > bricks once in there I can select the 3
>servers that compose the replicated bricks and click remove, this gives a
>pop-up window with the following info:
>
>Are you sure you want to remove the following Brick(s)?
>- vmm11:/gluster_bricks/vmstore1/vmstore1
>- vmm12.virt.iad3p:/gluster_bricks/vmstore1/vmstore1
>- 192.168.0.100:/gluster-bricks/vmstore1/vmstore1
>- Migrate Data from the bricks?
>
>If I proceed with this that means I will have to do this for all the 4
>volumes, that is just not very efficient, but if that is the only way, then
>I am hesitant to put this into a real production environment as there is no
>way I can take that kind of a hit for +500 vms :) and also I wont have
>that much storage or extra volumes to play with in a real sceneario.
>
>2.
>
>After modifying yesterday */ etc/vdsm/vdsm.id  by
>following
>(https://stijn.tintel.eu/blog/2013/03/02/ovirt-problem-duplicate-uuids
>) I
>was able to add the server **back **to the cluster using a new fqdn
>and a new IP, and tested replacing one of the bricks and this is my mistake
>as mentioned in #3 above I used / dev/sdb entirely for 1 brick because thru
>the UI I could not separate the block device and be used for 2 bricks (one
>for the engine and one for vmstore1). **So in the “gluster vol info”
>you might see vmm102.mydomain.com  *
> *but in reality it is myhost1.mydomain.com  *
>3.
>
>*I am also attaching gluster_peer_status.txt * *and in the last 2
>entries of that file you will see and entry vmm10.mydomain.com
> (old/bad entry) and vmm102.mydomain.com
> (new entry, same server vmm10, but renamed to
>vmm102). *
> *Also please find gluster_vol_info.txt file. *
>4.
>
>*I am ready *
> *to redeploy this environment if needed, but I am also ready to test any
>other suggestion. If I can get a good understanding on how to recover from
>this I will be ready to move to production. *
>5.
>
>
>
> *Wondering if you’d be willing to have a look at my setup through a shared
>screen? *
>
> *Thanks *
>
>
> *Adrian*
>
> On Mon, Jun 10, 2019

[ovirt-users] Re: VM Disk Performance metrics?

2019-06-11 Thread Jayme
Have you looked at installing ovirt metrics store?

On Tue, Jun 11, 2019 at 12:56 PM Wesley Stewart  wrote:

> Is there any way to get ovirt disk performance metrics into the web
> interface?  It would be nice to see some type of IOPs data, so we can see
> which VMs are hitting our data stores the most.
>
> It seems you can run virt-top on a host to get some of these metrics, but
> it would be nice to get some sort of data in the gui.
>
> Thanks!
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/LMOOJ6JVZYAM74PWYPBCQ4FCNYTCY5KQ/
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JYODX7ILXRM7DZDVCKC5UGWXULVEHJB5/


[ovirt-users] RFE: HostedEngine to use boom by default

2019-06-11 Thread Strahil Nikolov
Hello All,
I have seen a lot of cases where the HostedEngine gets corrupted/broken and 
beyond repair.
I think that BOOM is a good option for our HostedEngine appliances, due to the
fact that it supports booting from LVM snapshots and thus makes it easy to
recover after upgrades or other exceptional situations.
Sadly, BOOM has one drawback - everything should be under a single snapshot -
thus no separation of /var, /log or /audit.
Do you think that changing the appliance layout is worth it?
Note: I might have an unsupported layout that could cause my confusion. Is your
layout a single root LV?
Best Regards,
Strahil Nikolov
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5OTOIAI4BXMVRFN5MCDGXNZHYB46XWLF/


[ovirt-users] Re: Replace bad Host from a 9 Node hyperconverged setup 4.3.3

2019-06-11 Thread Strahil Nikolov
Do you have empty space to store the VMs? If yes, you can always script the
migration of the disks via the API. Even a bash script and curl can do the
trick.
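
A rough sketch of the kind of call that means (engine URL, credentials and
UUIDs are placeholders - check the REST API reference for your exact version):

curl -k -u 'admin@internal:PASSWORD' \
    -H 'Content-Type: application/xml' \
    -d '<action><storage_domain id="TARGET-SD-UUID"/></action>' \
    'https://engine.example.com/ovirt-engine/api/disks/DISK-UUID/move'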
About the /dev/sdb, I still don't get it. A pure "df -hT" from a node will
make it way clearer. I guess '/dev/sdb' is a PV and you have got 2 LVs on top
of it.
Note: I should admit that as an admin I don't use the UI for gluster management.
For now do not try to remove the brick. The approach is either to migrate the
qemu disks to another storage or to reset-brick/replace-brick in order to
restore the replica count. I will check the file and I will try to figure it out.
Redeployment never fixes the issue, it just speeds up the recovery. If you can
afford the time to spend on fixing the issue - then do not redeploy.
I would be able to take a look next week, but keep in mind that I'm not so deep
into oVirt - I only started playing with it when I deployed my lab.
Best Regards,
Strahil Nikolov

[ovirt-users] Re: VM Disk Performance metrics?

2019-06-11 Thread Strahil Nikolov
+1 vote from me.

Best Regards,
Strahil Nikolov

On Tuesday, June 11, 2019 at 18:54:54 GMT+3, Wesley Stewart wrote:

Is there any way to get ovirt disk performance metrics into the web interface?
It would be nice to see some type of IOPs data, so we can see which VMs are
hitting our data stores the most.
It seems you can run virt-top on a host to get some of these metrics, but it
would be nice to get some sort of data in the gui.
Thanks!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LMOOJ6JVZYAM74PWYPBCQ4FCNYTCY5KQ/
  ___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PLTFNTEJF26IFTT65XZNRR4MFVDOM4NR/


[ovirt-users] Re: Can't bring upgraded to 4.3 host back to cluster

2019-06-11 Thread Shani Leviim
Np :)

On Tuesday, June 11, 2019, Artem Tambovskiy 
wrote:

> Actually, this one wasn't stupid. The host was running version 4.3.3 and
> was upgraded to 4.3.4 after yum update.
> And this solved the issue ... thanks a lot!
>
> Looking at the bugtracker (https://bugzilla.redhat.com/
> buglist.cgi?classification=oVirt&f1=flagtypes.name&o1=
> substring&query_format=advanced&target_milestone=ovirt-4.3.4&v1=blocker)
> I don't see a suitable bug for this. Perhaps vdsmd reconfiguration + yum
> upgrade + host reboot did the trick.
>
> Thank you very much for spending a time on this!
> Regards,
> Artem
>
> On Tue, Jun 11, 2019 at 5:19 PM Shani Leviim  wrote:
>
>> A stupid one: did you try to yum update?
>>
>>
>>
>> *Regards,*
>>
>> *Shani Leviim*
>>
>>
>> On Tue, Jun 11, 2019 at 5:11 PM Artem Tambovskiy <
>> artem.tambovs...@gmail.com> wrote:
>>
>>>
>>> Just tried this:
>>>
>>> [root@ovirt1 vdsm]# vdsm-tool configure --force
>>>
>>> Checking configuration status...
>>>
>>> abrt is already configured for vdsm
>>> Managed volume database is already configured
>>> lvm is configured for vdsm
>>> libvirt is already configured for vdsm
>>> SUCCESS: ssl configured to true. No conflicts
>>> Current revision of multipath.conf detected, preserving
>>>
>>> Running configure...
>>> Reconfiguration of abrt is done.
>>> Reconfiguration of passwd is done.
>>> Reconfiguration of libvirt is done.
>>>
>>> Done configuring modules to VDSM.
>>>
>>> And tried to restart vdsmd - it failed again.
>>>
>>> [root@ovirt1 vdsm]# journalctl -xe
>>> Jun 11 17:07:58 ovirt1.telia.ru systemd[1]: ovirt-ha-broker.service
>>> failed.
>>> Jun 11 17:07:58 ovirt1.telia.ru systemd[1]: ovirt-ha-broker.service
>>> holdoff time over, scheduling restart.
>>> Jun 11 17:07:58 ovirt1.telia.ru systemd[1]: Cannot add dependency job
>>> for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
>>> Jun 11 17:07:58 ovirt1.telia.ru systemd[1]: Stopped oVirt Hosted Engine
>>> High Availability Communications Broker.
>>> -- Subject: Unit ovirt-ha-broker.service has finished shutting down
>>> -- Defined-By: systemd
>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>> --
>>> -- Unit ovirt-ha-broker.service has finished shutting down.
>>> Jun 11 17:07:58 ovirt1.telia.ru systemd[1]: Started oVirt Hosted Engine
>>> High Availability Communications Broker.
>>> -- Subject: Unit ovirt-ha-broker.service has finished start-up
>>> -- Defined-By: systemd
>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>> --
>>> -- Unit ovirt-ha-broker.service has finished starting up.
>>> --
>>> -- The start-up result is done.
>>> Jun 11 17:07:59 ovirt1.telia.ru systemd[1]: mom-vdsm.service holdoff
>>> time over, scheduling restart.
>>> Jun 11 17:07:59 ovirt1.telia.ru systemd[1]: Cannot add dependency job
>>> for unit lvm2-lvmetad.socket, ignoring: Unit is masked.
>>> Jun 11 17:07:59 ovirt1.telia.ru systemd[1]: Stopped MOM instance
>>> configured for VDSM purposes.
>>> -- Subject: Unit mom-vdsm.service has finished shutting down
>>> -- Defined-By: systemd
>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>> --
>>> -- Unit mom-vdsm.service has finished shutting down.
>>> Jun 11 17:07:59 ovirt1.telia.ru systemd[1]: start request repeated too
>>> quickly for supervdsmd.service
>>> Jun 11 17:07:59 ovirt1.telia.ru systemd[1]: Failed to start Auxiliary
>>> vdsm service for running helper functions as root.
>>> -- Subject: Unit supervdsmd.service has failed
>>> -- Defined-By: systemd
>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>> --
>>> -- Unit supervdsmd.service has failed.
>>> --
>>> -- The result is failed.
>>> Jun 11 17:07:59 ovirt1.telia.ru systemd[1]: Dependency failed for
>>> Virtual Desktop Server Manager.
>>> -- Subject: Unit vdsmd.service has failed
>>> -- Defined-By: systemd
>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>> --
>>> -- Unit vdsmd.service has failed.
>>> --
>>> -- The result is dependency.
>>> Jun 11 17:07:59 ovirt1.telia.ru systemd[1]: Dependency failed for MOM
>>> instance configured for VDSM purposes.
>>> -- Subject: Unit mom-vdsm.service has failed
>>> -- Defined-By: systemd
>>> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
>>> --
>>> -- Unit mom-vdsm.service has failed.
>>> --
>>> -- The result is dependency.
>>> Jun 11 17:07:59 ovirt1.telia.ru systemd[1]: Job mom-vdsm.service/start
>>> failed with result 'dependency'.
>>> Jun 11 17:07:59 ovirt1.telia.ru systemd[1]: Job vdsmd.service/start
>>> failed with result 'dependency'.
>>> Jun 11 17:07:59 ovirt1.telia.ru systemd[1]: supervdsmd.service failed.
>>>
>>> I also tried to enable DEBUG level but failed.
>>>
>>> [root@ovirt1 vdsm]# vdsm-client Host setLogLevel level=DEBUG
>>> vdsm-client: Connection to localhost:54321 with use_tls=True, timeout=60
>>> failed: [Errno 111] Connection refused
>>>
>>>
>>> Regards,
>>> Artem
>>>
>>>
>>>
>>> On Tue, Jun 11, 2019 at