[ovirt-users] Re: VM Disk Performance metrics?

2019-06-11 Thread Strahil Nikolov
 +1 vote from me.

Best Regards,
Strahil Nikolov
On Tuesday, June 11, 2019 at 18:54:54 GMT+3, Wesley Stewart wrote:
 
 Is there any way to get oVirt disk performance metrics into the web interface?
 It would be nice to see some type of IOPS data, so we can see which VMs are
hitting our data stores the most.
It seems you can run virt-top on a host to get some of these metrics, but it
would be nice to get some sort of data in the GUI.
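
For reference, virt-top can already give a rough per-VM disk view from the host
side. A minimal sketch, assuming virt-top is installed on the hypervisor
(double-check the flags against the man page of your version):

    virt-top -3 -d 5                               # block-device view, 5 second refresh
    virt-top -b -n 2 -d 5 > disk-stats.txt         # batch mode, two samples, dumped to a file
    virt-top --script --csv vm-io.csv -d 5 -n 60   # non-interactive CSV output for graphing elsewhere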
Thanks!


[ovirt-users] Re: Replace bad Host from a 9 Node hyperconverged setup 4.3.3

2019-06-11 Thread Strahil Nikolov
 Do you have empty space to store the VMs? If yes, you can always script the
migration of the disks via the API. Even a bash script and curl can do the
trick.
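
As a rough sketch of that idea (engine URL, credentials, disk ID and target
storage domain ID are placeholders; the move action is part of the v4 REST API
as I recall it, but verify the exact endpoint against /ovirt-engine/apidoc on
your engine):

    # move one disk to another storage domain via the engine API
    curl -s -k --user 'admin@internal:PASSWORD' \
         -H 'Content-Type: application/xml' -H 'Accept: application/xml' \
         -X POST \
         -d '<action><storage_domain id="TARGET_SD_UUID"/></action>' \
         'https://engine.example.com/ovirt-engine/api/disks/DISK_ID/move'

Looping that over the disk IDs returned by GET /ovirt-engine/api/disks would be
the "bash script and curl" approach.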
About /dev/sdb, I still don't get it. A plain "df -hT" from a node would make it
much clearer. I guess '/dev/sdb' is a PV and you have 2 LVs on top of it.
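
For example, something along these lines from one of the nodes would show the
layout (standard LVM/util-linux tools, nothing oVirt-specific):

    df -hT | grep -i gluster
    lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/sdb
    pvs -o pv_name,vg_name,pv_size
    lvs -o lv_name,vg_name,lv_size,devices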
Note: I should admit that, as an admin, I don't use the UI for gluster management.
For now, do not try to remove the brick. The approach is either to migrate the
qemu disks to other storage, or to reset-brick/replace-brick in order to
restore the replica count. I will check the file and try to figure it out.
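
If the replacement brick ends up on a host with a new name (as it does later in
this thread), the replace-brick form would look roughly like this - the
hostnames and paths are taken from the thread only as examples:

    gluster volume replace-brick vmstore1 \
        vmm10.mydomain.com:/gluster_bricks/vmstore1/vmstore1 \
        vmm102.mydomain.com:/gluster_bricks/vmstore1/vmstore1 \
        commit force

reset-brick is the equivalent when the hostname and brick path stay the same.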
Redeployment never fixes the issue, it just speeds up the recovery. If you can
afford the time to spend on fixing the issue - then do not redeploy.
I would be able to take a look next week, but keep in mind that I'm not that
deep into oVirt - I only started playing with it when I deployed my lab.
Best Regards,
Strahil Nikolov
 Strahil,

Looking at your suggestions I think I need to provide a bit more info on my
current setup.

   1. I have 9 hosts in total.

   2. I have 5 storage domains:
      - hosted_storage (Data Master)
      - vmstore1 (Data)
      - data1 (Data)
      - data2 (Data)
      - ISO (NFS) // had to create this one because oVirt 4.3.3.1 would not let
        me upload disk images to a data domain without an ISO domain (I think
        this is due to a bug)

   3. Each volume is of the type "Distributed Replicate" and each one is
      composed of 9 bricks.
      I started with 3 bricks per volume due to the initial hyperconverged
      setup, then I expanded the cluster and the gluster cluster by 3 hosts at
      a time until I got to a total of 9 hosts.

   Disks, bricks and sizes used per volume:
      /dev/sdb  engine    100GB
      /dev/sdb  vmstore1  2600GB
      /dev/sdc  data1     2600GB
      /dev/sdd  data2     2600GB
      /dev/sde  400GB SSD, used for caching purposes

   From the above layout a few questions came up:
      - Using the web UI, how can I create a 100GB brick and a 2600GB brick to
        replace the bad bricks for "engine" and "vmstore1" within the same
        block device (sdb)?
      - What about /dev/sde (the caching disk)? When I tried creating a new
        brick through the UI I saw that I could use /dev/sde for caching, but
        only for 1 brick (i.e. vmstore1), so if I try to create another brick,
        how would I specify that the same /dev/sde device is to be used for
        caching?

   1. If I want to remove a brick, it being a replica 3, I go to Storage >
      Volumes > select the volume > Bricks; once there I can select the 3
      servers that compose the replicated bricks and click remove. This gives a
      pop-up window with the following info:

      Are you sure you want to remove the following Brick(s)?
      - vmm11:/gluster_bricks/vmstore1/vmstore1
      - vmm12.virt.iad3p:/gluster_bricks/vmstore1/vmstore1
      - 192.168.0.100:/gluster-bricks/vmstore1/vmstore1
      - Migrate Data from the bricks?

      If I proceed with this, it means I will have to do it for all 4 volumes,
      which is just not very efficient; but if that is the only way, then I am
      hesitant to put this into a real production environment, as there is no
      way I can take that kind of a hit for 500+ VMs :) and I also won't have
      that much storage or extra volumes to play with in a real scenario.

   2. After modifying /etc/vdsm/vdsm.id yesterday by following
      https://stijn.tintel.eu/blog/2013/03/02/ovirt-problem-duplicate-uuids I
      was able to add the server back to the cluster using a new FQDN and a new
      IP, and tested replacing one of the bricks. This was my mistake: as
      mentioned in #3 above, I used /dev/sdb entirely for 1 brick, because
      through the UI I could not split the block device to be used for 2 bricks
      (one for the engine and one for vmstore1). So in the "gluster vol info"
      output you might see vmm102.mydomain.com, but in reality it is
      myhost1.mydomain.com.

   3. I am also attaching gluster_peer_status.txt; in the last 2 entries of
      that file you will see an entry for vmm10.mydomain.com (old/bad entry)
      and vmm102.mydomain.com (new entry, same server vmm10, but renamed to
      vmm102). Also please find the gluster_vol_info.txt file.

   4. I am ready to redeploy this environment if needed, but I am also ready to
      test any other suggestion. If I can get a good understanding of how to
      recover from this, I will be ready to move to production.

   5. Wondering if you'd be willing to have a look at my setup through a shared
      screen?

Thanks




Adrian

On Mon, Jun 10, 2019 at 11:41 PM Strahil  wrote:


Hi Adrian,

You have several options:
A) If you have space on another gluster volume (or volumes) or on NFS-based 
storage, you can migrate all VMs live . Once you do it,  the simple way will be 
to stop and remove the storage domain (from UI) and gluster volume that 

[ovirt-users] RFE: HostedEngine to use boom by default

2019-06-11 Thread Strahil Nikolov
Hello All,
I have seen a lot of cases where the HostedEngine gets corrupted/broken and 
beyond repair.
I think that BOOM is a good option for our HostedEngine appliances, because it
supports booting from LVM snapshots and thus makes it easy to recover after
upgrades or other unexpected situations.
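
As a rough sketch of that workflow (assuming the boom-boot package is installed
and the appliance root is a single LV; the VG/LV names are only placeholders,
and depending on the setup an OS profile may first be needed via
'boom profile create --from-host'):

    # snapshot the root LV before an engine upgrade
    lvcreate -s -n root_pre_upgrade -L 5G ovirt/root

    # add a boot entry that boots from the snapshot
    boom create --title "HostedEngine before upgrade" --rootlv ovirt/root_pre_upgrade

If the upgrade goes wrong, the snapshot entry can be picked from the GRUB menu
and, once verified, the snapshot can be merged back with lvconvert --merge.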
Sadly, BOOM has one drawback: everything has to be under a single snapshot, so
there is no separation of /var, /var/log or /var/log/audit.
Do you think that changing the appliance layout is worth it?
Note: I might have an unsupported layout that could be causing my confusion. Is
your layout a single root LV?
Best Regards,
Strahil Nikolov


[ovirt-users] Re: VM Disk Performance metrics?

2019-06-11 Thread Jayme
Have you looked at installing ovirt metrics store?

On Tue, Jun 11, 2019 at 12:56 PM Wesley Stewart  wrote:

> Is there any way to get ovirt disk performance metrics into the web
> interface?  It would be nice to see some type of IOPs data, so we can see
> which VMs are hitting our data stores the most.
>
> It seems you can run virt-top on a host to get some of these metrics, but
> it would be nice to get some sort of data in the gui.
>
> Thanks!


[ovirt-users] Re: Replace bad Host from a 9 Node hyperconverged setup 4.3.3

2019-06-11 Thread Adrian Quintero
adding gluster pool list:
UUID                                  Hostname             State
2c86fa95-67a2-492d-abf0-54da625417f8  vmm12.mydomain.com   Connected
ab099e72-0f56-4d33-a16b-ba67d67bdf9d  vmm13.mydomain.com   Connected
c35ad74d-1f83-4032-a459-079a27175ee4  vmm14.mydomain.com   Connected
aeb7712a-e74e-4492-b6af-9c266d69bfd3  vmm17.mydomain.com   Connected
4476d434-d6ff-480f-b3f1-d976f642df9c  vmm16.mydomain.com   Connected
22ec0c0a-a5fc-431c-9f32-8b17fcd80298  vmm15.mydomain.com   Connected
caf84e9f-3e03-4e6f-b0f8-4c5ecec4bef6  vmm18.mydomain.com   Connected
18385970-aba6-4fd1-85a6-1b13f663e60b  vmm10.mydomain.com   Disconnected  // server that went bad
b152fd82-8213-451f-93c6-353e96aa3be9  vmm102.mydomain.com  Connected     // vmm10, but with a different name
228a9282-c04e-4229-96a6-67cb47629892  localhost            Connected


[ovirt-users] Re: Replace bad Host from a 9 Node hyperconverged setup 4.3.3

2019-06-11 Thread Adrian Quintero
Strahil,

Looking at your suggestions I think I need to provide a bit more info on my
current setup.



   1. I have 9 hosts in total.

   2. I have 5 storage domains:
      - hosted_storage (Data Master)
      - vmstore1 (Data)
      - data1 (Data)
      - data2 (Data)
      - ISO (NFS) // had to create this one because oVirt 4.3.3.1 would not let
        me upload disk images to a data domain without an ISO domain (I think
        this is due to a bug)

   3. Each volume is of the type "Distributed Replicate" and each one is
      composed of 9 bricks.
      I started with 3 bricks per volume due to the initial hyperconverged
      setup, then I expanded the cluster and the gluster cluster by 3 hosts at
      a time until I got to a total of 9 hosts.

   Disks, bricks and sizes used per volume:
      /dev/sdb  engine    100GB
      /dev/sdb  vmstore1  2600GB
      /dev/sdc  data1     2600GB
      /dev/sdd  data2     2600GB
      /dev/sde  400GB SSD, used for caching purposes

   From the above layout a few questions came up:
      - Using the web UI, how can I create a 100GB brick and a 2600GB brick to
        replace the bad bricks for "engine" and "vmstore1" within the same
        block device (sdb)?
      - What about /dev/sde (the caching disk)? When I tried creating a new
        brick through the UI I saw that I could use /dev/sde for caching, but
        only for 1 brick (i.e. vmstore1), so if I try to create another brick,
        how would I specify that the same /dev/sde device is to be used for
        caching? (A rough manual sketch follows after this list.)

   1. If I want to remove a brick, it being a replica 3, I go to Storage >
      Volumes > select the volume > Bricks; once there I can select the 3
      servers that compose the replicated bricks and click remove. This gives a
      pop-up window with the following info:

      Are you sure you want to remove the following Brick(s)?
      - vmm11:/gluster_bricks/vmstore1/vmstore1
      - vmm12.virt.iad3p:/gluster_bricks/vmstore1/vmstore1
      - 192.168.0.100:/gluster-bricks/vmstore1/vmstore1
      - Migrate Data from the bricks?

      If I proceed with this, it means I will have to do it for all 4 volumes,
      which is just not very efficient; but if that is the only way, then I am
      hesitant to put this into a real production environment, as there is no
      way I can take that kind of a hit for 500+ VMs :) and I also won't have
      that much storage or extra volumes to play with in a real scenario.

   2. After modifying /etc/vdsm/vdsm.id yesterday by following
      https://stijn.tintel.eu/blog/2013/03/02/ovirt-problem-duplicate-uuids I
      was able to add the server back to the cluster using a new FQDN and a new
      IP, and tested replacing one of the bricks. This was my mistake: as
      mentioned in #3 above, I used /dev/sdb entirely for 1 brick, because
      through the UI I could not split the block device to be used for 2 bricks
      (one for the engine and one for vmstore1). So in the "gluster vol info"
      output you might see vmm102.mydomain.com, but in reality it is
      myhost1.mydomain.com.

   3. I am also attaching gluster_peer_status.txt; in the last 2 entries of
      that file you will see an entry for vmm10.mydomain.com (old/bad entry)
      and vmm102.mydomain.com (new entry, same server vmm10, but renamed to
      vmm102). Also please find the gluster_vol_info.txt file.

   4. I am ready to redeploy this environment if needed, but I am also ready to
      test any other suggestion. If I can get a good understanding of how to
      recover from this, I will be ready to move to production.

   5. Wondering if you'd be willing to have a look at my setup through a shared
      screen?
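
For what it's worth, a manual sketch of carving two bricks out of the same
block device outside the UI (the device, VG and LV names below are only
placeholders, not necessarily what the installer would generate):

    pvcreate /dev/sdb
    vgcreate gluster_vg_sdb /dev/sdb
    lvcreate -L 100G  -n gluster_lv_engine   gluster_vg_sdb
    lvcreate -L 2600G -n gluster_lv_vmstore1 gluster_vg_sdb
    mkfs.xfs /dev/gluster_vg_sdb/gluster_lv_engine
    mkfs.xfs /dev/gluster_vg_sdb/gluster_lv_vmstore1
    mkdir -p /gluster_bricks/engine /gluster_bricks/vmstore1
    mount /dev/gluster_vg_sdb/gluster_lv_engine   /gluster_bricks/engine
    mount /dev/gluster_vg_sdb/gluster_lv_vmstore1 /gluster_bricks/vmstore1

The SSD (/dev/sde) could then be attached to either LV with lvmcache, although
the exact steps depend on the LVM version in use.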

Thanks


Adrian

On Mon, Jun 10, 2019 at 11:41 PM Strahil  wrote:

> Hi Adrian,
>
> You have several options:
> A) If you have space on another gluster volume (or volumes) or on
> NFS-based storage, you can migrate all VMs live . Once you do it,  the
> simple way will be to stop and remove the storage domain (from UI) and
> gluster volume that correspond to the problematic brick. Once gone, you
> can  remove the entry in oVirt for the old host and add the newly built
> one.Then you can recreate your volume and migrate the data back.
>
> B)  If you don't have space you have to use a more riskier approach
> (usually it shouldn't be risky, but I had bad experience in gluster v3):
> - New server has same IP and hostname:
> Use command line and run the 'gluster volume reset-brick VOLNAME
> HOSTNAME:BRICKPATH HOSTNAME:BRICKPATH commit'
> Replace VOLNAME with your volume name.
> A more practical example would be:
> 'gluster volume reset-brick data ovirt3:/gluster_bricks/data/brick
> ovirt3:/gluster_bricks/data/brick commit'
>
> If it refuses, then you have to cleanup '/gluster_bricks/data' 

[ovirt-users] Re: Can't bring upgraded to 4.3 host back to cluster

2019-06-11 Thread Andreas Elvers
> probably one of the problems that I
> haven't switched the cluster from iptables to firewalld. But this is just
> my guess.
> 

When switching from 4.2.8 to 4.3.3 I also did not change one of my hosts from
iptables to firewalld. I was still able to change it later, even though the
documentation says iptables support is to be removed in 4.3.


[ovirt-users] VM Disk Performance metrics?

2019-06-11 Thread Wesley Stewart
Is there any way to get oVirt disk performance metrics into the web
interface?  It would be nice to see some type of IOPS data, so we can see
which VMs are hitting our data stores the most.

It seems you can run virt-top on a host to get some of these metrics, but
it would be nice to get some sort of data in the GUI.

Thanks!


[ovirt-users] Re: Can't bring upgraded to 4.3 host back to cluster

2019-06-11 Thread Artem Tambovskiy
Shani,

supervdsm is failing too.

[root@ovirt1 vdsm]# systemctl status supervdsmd
● supervdsmd.service - Auxiliary vdsm service for running helper functions
as root
   Loaded: loaded (/usr/lib/systemd/system/supervdsmd.service; static;
vendor preset: enabled)
   Active: failed (Result: start-limit) since Tue 2019-06-11 16:18:16 MSK;
5s ago
  Process: 176025 ExecStart=/usr/share/vdsm/daemonAdapter
/usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock
(code=exited, status=1/FAILURE)
 Main PID: 176025 (code=exited, status=1/FAILURE)

Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Unit supervdsmd.service entered
failed state.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: supervdsmd.service failed.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: supervdsmd.service holdoff time
over, scheduling restart.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Stopped Auxiliary vdsm service
for running helper functions as root.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: start request repeated too
quickly for supervdsmd.service
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Failed to start Auxiliary vdsm
service for running helper functions as root.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: Unit supervdsmd.service entered
failed state.
Jun 11 16:18:16 ovirt1.telia.ru systemd[1]: supervdsmd.service failed.


supervdsm.log is full of messages like
logfile::DEBUG::2019-06-11 16:18:46,379::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:04,401::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:06,289::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:17,535::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:21,528::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:24,541::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:42,543::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:19:57,442::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:20:18,539::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:20:32,041::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})
logfile::DEBUG::2019-06-11 16:20:41,051::concurrent::193::root::(run) START
thread  (func=>, args=(), kwargs={})

Regards,
Artem


On Tue, Jun 11, 2019 at 3:59 PM Shani Leviim  wrote:

> +Dan Kenigsberg 
>
> Hi Artem,
> Thanks for the log.
>
> It seems that this error message appears quite a lot:
> 2019-06-11 12:10:35,283+0300 ERROR (MainThread) [root] Panic: Connect to
> supervdsm service failed: [Errno 2] No such file or directory (panic:29)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line
> 86, in _connect
> self._manager.connect, Exception, timeout=60, tries=3)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line
> 58, in retry
> return func()
>   File "/usr/lib64/python2.7/multiprocessing/managers.py", line 500, in
> connect
> conn = Client(self._address, authkey=self._authkey)
>   File "/usr/lib64/python2.7/multiprocessing/connection.py", line 173, in
> Client
> c = SocketClient(address)
>   File "/usr/lib64/python2.7/multiprocessing/connection.py", line 308, in
> SocketClient
> s.connect(address)
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
> return getattr(self._sock,name)(*args)
> error: [Errno 2] No such file or directory
>
> Can you please verify that the 'supervdsmd.service' is running?
>
>
> *Regards,*
>
> *Shani Leviim*
>
>
> On Tue, Jun 11, 2019 at 3:04 PM Artem Tambovskiy <
> artem.tambovs...@gmail.com> wrote:
>
>> Hi Shani,
>>
>> yes, you are right - I can ssh from any host to any other host in the cluster.
>> vdsm.log attached.
>> I have tried to restart vdsm manually and even restarted the host several
>> times with no success.
>> Host activation fails all the time ...
>>
>> Thank you in advance for your help!
>> Regard,
>> Artem
>>
>> On Tue, Jun 11, 2019 at 10:51 AM Shani Leviim  wrote:
>>
>>> Hi Artem,
>>> According to oVirt documentation [1], hosts on the same cluster should
>>> be reachable from one to each other.
>>>
>>> Can you please share your vdsm log?
>>> I suppose you do manage to ssh that inactive host (correct me if I'm
>>> wrong).
>>> While getting the vdsm log, maybe try to restart the network and vdsmd
>>> services on the host.
>>>
>>> Another thing you can try on the UI is putting the host on maintenance
>>> and then activate it.
>>>
>>> [1]
>>> https://www.ovirt.org/documentation/admin-guide/chap-Clusters.html#introduction-to-clusters
>>>
>>>
>>> *Regards,*
>>>
>>> *Shani Leviim*
>>>
>>>
>>> On Mon, Jun 10, 2019 at 4:42 PM 

[ovirt-users] Re: Can't bring upgraded to 4.3 host back to cluster

2019-06-11 Thread Shani Leviim
+Dan Kenigsberg 

Hi Artem,
Thanks for the log.

It seems that this error message appears quite a lot:
2019-06-11 12:10:35,283+0300 ERROR (MainThread) [root] Panic: Connect to
supervdsm service failed: [Errno 2] No such file or directory (panic:29)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line
86, in _connect
self._manager.connect, Exception, timeout=60, tries=3)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 58,
in retry
return func()
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 500, in
connect
conn = Client(self._address, authkey=self._authkey)
  File "/usr/lib64/python2.7/multiprocessing/connection.py", line 173, in
Client
c = SocketClient(address)
  File "/usr/lib64/python2.7/multiprocessing/connection.py", line 308, in
SocketClient
s.connect(address)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 2] No such file or directory

Can you please verify that the 'supervdsmd.service' is running?
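
To verify it and capture the underlying failure, something along these lines
(plain systemd tooling) should be enough:

    systemctl status vdsmd supervdsmd
    journalctl -u supervdsmd -b --no-pager | tail -n 50
    systemctl reset-failed supervdsmd && systemctl start supervdsmd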


Regards,

Shani Leviim


On Tue, Jun 11, 2019 at 3:04 PM Artem Tambovskiy 
wrote:

> Hi Shani,
>
> yes, you are right - I can ssh from any host to any other host in the cluster.
> vdsm.log attached.
> I have tried to restart vdsm manually and even restarted the host several
> times with no success.
> Host activation fails all the time ...
>
> Thank you in advance for your help!
> Regard,
> Artem
>
> On Tue, Jun 11, 2019 at 10:51 AM Shani Leviim  wrote:
>
>> Hi Artem,
>> According to oVirt documentation [1], hosts on the same cluster should be
>> reachable from one to each other.
>>
>> Can you please share your vdsm log?
>> I suppose you do manage to ssh that inactive host (correct me if I'm
>> wrong).
>> While getting the vdsm log, maybe try to restart the network and vdsmd
>> services on the host.
>>
>> Another thing you can try on the UI is putting the host on maintenance
>> and then activate it.
>>
>> [1]
>> https://www.ovirt.org/documentation/admin-guide/chap-Clusters.html#introduction-to-clusters
>>
>>
>> *Regards,*
>>
>> *Shani Leviim*
>>
>>
>> On Mon, Jun 10, 2019 at 4:42 PM Artem Tambovskiy <
>> artem.tambovs...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> May I ask you for some advice?
>>> I'm running a small oVirt cluster, and a couple of months ago I decided to
>>> do an upgrade from oVirt 4.2.8 to 4.3; I have been having issues since
>>> then. I can only guess what I did wrong - probably one of the problems is
>>> that I haven't switched the cluster from iptables to firewalld. But this
>>> is just my guess.
>>>
>>> The problem is that I upgraded the engine and one host, and then, when I
>>> upgraded the second host, I could not bring it back to an active state. It
>>> looks like VDSM can't detect the network and fails to start. I even tried
>>> to reinstall the hosts from the UI (I saw the packages being installed)
>>> but again, VDSM doesn't start up at the end and the reinstallation fails.
>>>
>>> Looking at the host's process list I see the script wait_for_ipv4s hanging
>>> forever.
>>>
>>> vdsm   8603     1  6 16:26 ?  00:00:00 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
>>> root   8630     1  0 16:26 ?  00:00:00 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
>>> root   8645  8630  6 16:26 ?  00:00:00 /usr/bin/python2 /usr/libexec/vdsm/wait_for_ipv4s
>>> root   8688     1 30 16:27 ?  00:00:00 /usr/bin/python2 /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock
>>> vdsm   8715     1  0 16:27 ?  00:00:00 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>>>
>>> All hosts in the cluster are reachable from each other ... Could that
>>> be the issue?
>>>
>>> Thank you in advance!
>>> --
>>> Regards,
>>> Artem
>>
>
> --
> Regards,
> Artem
>


[ovirt-users] Re: Can't bring upgraded to 4.3 host back to cluster

2019-06-11 Thread Artem Tambovskiy
Hi Shani,

yes, you are right - I can ssh from any host to any other host in the cluster.
vdsm.log attached.
I have tried to restart vdsm manually and even restarted the host several
times with no success.
Host activation fails all the time ...

Thank you in advance for your help!
Regards,
Artem

On Tue, Jun 11, 2019 at 10:51 AM Shani Leviim  wrote:

> Hi Artem,
> According to oVirt documentation [1], hosts on the same cluster should be
> reachable from one to each other.
>
> Can you please share your vdsm log?
> I suppose you do manage to ssh that inactive host (correct me if I'm
> wrong).
> While getting the vdsm log, maybe try to restart the network and vdsmd
> services on the host.
>
> Another thing you can try on the UI is putting the host on maintenance and
> then activate it.
>
> [1]
> https://www.ovirt.org/documentation/admin-guide/chap-Clusters.html#introduction-to-clusters
>
>
> *Regards,*
>
> *Shani Leviim*
>
>
> On Mon, Jun 10, 2019 at 4:42 PM Artem Tambovskiy <
> artem.tambovs...@gmail.com> wrote:
>
>> Hello,
>>
>> May I ask you for some advice?
>> I'm running a small oVirt cluster, and a couple of months ago I decided to
>> do an upgrade from oVirt 4.2.8 to 4.3; I have been having issues since
>> then. I can only guess what I did wrong - probably one of the problems is
>> that I haven't switched the cluster from iptables to firewalld. But this
>> is just my guess.
>>
>> The problem is that I upgraded the engine and one host, and then, when I
>> upgraded the second host, I could not bring it back to an active state. It
>> looks like VDSM can't detect the network and fails to start. I even tried
>> to reinstall the hosts from the UI (I saw the packages being installed)
>> but again, VDSM doesn't start up at the end and the reinstallation fails.
>>
>> Looking at the host's process list I see the script wait_for_ipv4s hanging
>> forever.
>>
>> vdsm   8603     1  6 16:26 ?  00:00:00 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
>> root   8630     1  0 16:26 ?  00:00:00 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
>> root   8645  8630  6 16:26 ?  00:00:00 /usr/bin/python2 /usr/libexec/vdsm/wait_for_ipv4s
>> root   8688     1 30 16:27 ?  00:00:00 /usr/bin/python2 /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock
>> vdsm   8715     1  0 16:27 ?  00:00:00 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>>
>> All hosts in the cluster are reachable from each other ... Could that be
>> the issue?
>>
>> Thank you in advance!
>> --
>> Regards,
>> Artem
>

-- 
Regards,
Artem


vdsm.tar.bzip2
Description: Binary data


[ovirt-users] Re: Failed to activate Storage Domain --- ovirt 4.2

2019-06-11 Thread Aminur Rahman
Hi Nir

Yes, the metadata was corrupted but the VMs were running OK. This master
storage domain increased its allocation significantly overnight, ran out of
space and went completely offline. The cluster was online and the VMs were
running OK, but the affected storage domain went offline. I tried to increase
the storage domain, but oVirt would not let me extend the storage.

Due to time constraints, I had to restore the storage domain using a Compellent
snapshot. However, we need to prevent this from happening again when the master
storage domain fills up. Currently, we have the following parameters set on the
5TB storage domain:

ID: 0e1f2a5d-a548-476c-94bd-3ab3fe239926
Size: 5119 GiB
Available: 2361 GiB
Used: 2758 GiB
Allocated: 3104 GiB
Over Allocation Ratio: 14%
Images: 13
Warning Low Space Indicator: 10% (511 GiB)
Critical Space Action Blocker: 5 GiB

Please advise what actions we need to implement so we can prevent this from
occurring again in the future.

Thanks
Aminur Rahman
aminur.rah...@iongroup.com
t +44 20 7398 0243
m +44 7825 780697
iongroup.com

From: Nir Soffer 
Sent: 10 June 2019 22:07
To: David Teigland 
Cc: Aminur Rahman ; users 
Subject: Re: [ovirt-users] Failed to activate Storage Domain --- ovirt 4.2

On Mon, Jun 10, 2019 at 11:22 PM David Teigland <teigl...@redhat.com> wrote:
On Mon, Jun 10, 2019 at 10:59:43PM +0300, Nir Soffer wrote:
> > [root@uk1-ion-ovm-18 ~]# pvscan
> >   /dev/mapper/36000d31005697814: Checksum error at offset
> > 4397954425856
> >   Couldn't read volume group metadata from
> > /dev/mapper/36000d31005697814.
> >   Metadata location on /dev/mapper/36000d31005697814 at
> > 4397954425856 has invalid summary for VG.
> >   Failed to read metadata summary from
> > /dev/mapper/36000d31005697814
> >   Failed to scan VG from /dev/mapper/36000d31005697814
>
> This looks like corrupted vg metadata.

Yes, the second metadata area, at the end of the device is corrupted; the
first metadata area is probably ok.  That version of lvm is not able to
continue by just using the one good copy.

Can we copy the first metadata area into the second metadata area?

Last week I pushed out major changes to LVM upstream to be able to handle
and repair most of these cases.  So, one option is to build lvm from the
upstream master branch, and check if that can read and repair this
metadata.

This sounds pretty risky for production.

> David, we keep 2 metadata copies on the first PV. Can we use one of the
> copies on the PV to restore the metadata to the last good state?

pvcreate with --restorefile and --uuid, and with the right backup metadata

What would be the right backup metadata?

could probably correct things, but experiment with some temporary PVs
first.

Aminur, can you copy and compress the metadata areas, and share them somewhere?

To copy the first metadata area, use:

dd if=/dev/mapper/360014058ccaab4857eb40f393aaf0351 of=md1 bs=128M count=1 
skip=4096 iflag=skip_bytes

To copy the second metadata area, you need to know the size of the PV. On my 
setup with 100G
PV, I have 800 extents (128M each), and this works:

dd if=/dev/mapper/360014058ccaab4857eb40f393aaf0351 of=md2 bs=128M count=1 
skip=799

gzip md1 md2
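
If the PV size is not known offhand, something like the following (standard LVM
reporting fields, with the multipath device from the example above substituted
for the affected PV) should give the extent count to plug into the skip=
calculation:

    pvs -o pv_name,pv_size,pv_pe_count /dev/mapper/360014058ccaab4857eb40f393aaf0351
    blockdev --getsize64 /dev/mapper/360014058ccaab4857eb40f393aaf0351   # raw size in bytes if pvs cannot read the PV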

Nir


[ovirt-users] Re: Can't bring upgraded to 4.3 host back to cluster

2019-06-11 Thread Shani Leviim
Hi Artem,
According to the oVirt documentation [1], hosts in the same cluster should be
reachable from one another.

Can you please share your vdsm log?
I suppose you do manage to ssh to that inactive host (correct me if I'm wrong).
While getting the vdsm log, maybe try to restart the network and vdsmd
services on the host.
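
On an EL7 host that would typically be something like (service names can differ
depending on whether legacy network scripts or NetworkManager are in use):

    systemctl restart network        # or NetworkManager
    systemctl restart vdsmd supervdsmd
    systemctl status vdsmd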

Another thing you can try on the UI is putting the host on maintenance and
then activate it.

[1]
https://www.ovirt.org/documentation/admin-guide/chap-Clusters.html#introduction-to-clusters


Regards,

Shani Leviim


On Mon, Jun 10, 2019 at 4:42 PM Artem Tambovskiy 
wrote:

> Hello,
>
> May I ask you for some advice?
> I'm running a small oVirt cluster, and a couple of months ago I decided to
> do an upgrade from oVirt 4.2.8 to 4.3; I have been having issues since
> then. I can only guess what I did wrong - probably one of the problems is
> that I haven't switched the cluster from iptables to firewalld. But this
> is just my guess.
>
> The problem is that I upgraded the engine and one host, and then, when I
> upgraded the second host, I could not bring it back to an active state. It
> looks like VDSM can't detect the network and fails to start. I even tried
> to reinstall the hosts from the UI (I saw the packages being installed)
> but again, VDSM doesn't start up at the end and the reinstallation fails.
>
> Looking at the host's process list I see the script wait_for_ipv4s hanging
> forever.
>
> vdsm   8603     1  6 16:26 ?  00:00:00 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
> root   8630     1  0 16:26 ?  00:00:00 /bin/sh /usr/libexec/vdsm/vdsmd_init_common.sh --pre-start
> root   8645  8630  6 16:26 ?  00:00:00 /usr/bin/python2 /usr/libexec/vdsm/wait_for_ipv4s
> root   8688     1 30 16:27 ?  00:00:00 /usr/bin/python2 /usr/share/vdsm/supervdsmd --sockfile /var/run/vdsm/svdsm.sock
> vdsm   8715     1  0 16:27 ?  00:00:00 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker
>
> All hosts in the cluster are reachable from each other ... Could that be
> the issue?
>
> Thank you in advance!
> --
> Regards,
> Artem


[ovirt-users] [ANN] oVirt 4.3.4 is now generally available

2019-06-11 Thread Sandro Bonazzola
The oVirt Project is pleased to announce the general availability of oVirt
4.3.4 as of June 11th, 2019.

This update is the fourth in a series of stabilization updates to the 4.3
series.

This release is available now on x86_64 architecture for:

* Red Hat Enterprise Linux 7.6 or later

* CentOS Linux (or similar) 7.6 or later

This release supports Hypervisor Hosts on x86_64 and ppc64le architectures
for:

* Red Hat Enterprise Linux 7.6 or later

* CentOS Linux (or similar) 7.6 or later

* oVirt Node 4.3 (available for x86_64 only)

Because Fedora 28 has now reached end of life, this release is missing the
experimental tech preview for the x86_64 and s390x architectures on Fedora 28.

We are working on Fedora 29 and 30 support and we may re-introduce
experimental support for Fedora in a future release.

See the release notes [1] for installation / upgrade instructions and a
list of new features and bugs fixed.

Notes:

- oVirt Appliance is already available

- oVirt Node is already available[2]

- oVirt Windows Guest Tools is already available[2]

oVirt Guest Tools ISO has been updated including new virtio-win-prewhql
drivers collection version 171, providing updated qxl-wddm-dod driver
version 0.19 (see
https://gitlab.freedesktop.org/spice/win32/qxl-wddm-dod/blob/master/Changelog
for more information).

oVirt Node and Appliance have been updated including:

- oVirt 4.3.4: http://www.ovirt.org/release/4.3.4/

- Latest CentOS updates including:

   - CEBA-2019:1015 CentOS 7 glibc BugFix Update
   - CESA-2019:0809 Important CentOS 7 ovmf Security Update
   - CEBA-2019:0814 CentOS 7 lvm2 BugFix Update
   - CEBA-2019:0808 CentOS 7 iproute BugFix Update
   - CESA-2019:1168 Important CentOS 7 kernel Security Update
   - CEBA-2019:0807 CentOS 7 gcc BugFix Update
   - CEBA-2019:0820 CentOS 7 systemd BugFix Update
   - CEBA-2019:0816 CentOS 7 sssd BugFix Update
   - CESA-2019:1264 Important CentOS 7 libvirt Security Update
   - CEEA-2019:1210 CentOS 7 microcode_ctl Enhancement Update
   - CESA-2019:1022 Important CentOS 7 python-jinja2 Security Update
   - CEEA-2019:0045 CentOS 7 rsync Enhancement Update
   - CEBA-2019:0826 CentOS 7 scap-security-guide BugFix Update
   - CEBA-2019:0819 CentOS 7 sos BugFix Update


- Latest CentOS Virt and Storage SIG updates:

   - CESA-2019:1179 CentOS Virt SIG - Errata and Security Advisory 2019:1179 Important
     Upstream details at: https://access.redhat.com/errata/RHSA-2019:1179
   - ansible-2.8.1: https://github.com/ansible/ansible/blob/stable-2.8/changelogs/CHANGELOG-v2.8.rst#v2-8-1
   - glusterfs-5.6: https://docs.gluster.org/en/latest/release-notes/5.6/
   - cockpit 193: https://cockpit-project.org/blog/cockpit-193.html



Additional Resources:

* Read more about the oVirt 4.3.4 release highlights:
http://www.ovirt.org/release/4.3.4/

* Get more oVirt Project updates on Twitter: https://twitter.com/ovirt

* Check out the latest project news on the oVirt blog:
http://www.ovirt.org/blog/

[1] http://www.ovirt.org/release/4.3.4/
[2] http://resources.ovirt.org/pub/ovirt-4.3/iso/

-- 

Sandro Bonazzola

MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV

Red Hat EMEA 

sbona...@redhat.com
