[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

2018-07-09 Thread Johan Bernhardsson
One more thing: Gluster writes everything synchronously, so it will queue writes 
until the host with the broken drive has acknowledged them. This creates I/O wait 
and high load.


/Johan

On July 10, 2018 04:21:33 Jim Kusznir  wrote:

Thank you for your help.

After more troubleshooting and host reboots, I accidentally discovered that 
the backing disk on ovirt2 (host) had suffered a failure.  On reboot, the 
raid card refused to see it at all.  It said it had cache waiting to be 
written to disk, and in the end, as it couldn't (wouldn't) see that disk, I 
had no choice but to discard that cache and boot up without the physical 
disk.  Since doing so (and running a gluster volume remove for the affected 
host), things are running like normal, although it appears it corrupted two 
disks (I've now lost 5 VMs to gluster-induced disk failures during poorly 
handled failures).


I don't understand why one bad disk wasn't simply failed, or, if one 
underlying process was having such a problem, why the other hosts didn't take 
it offline and continue (much like RAID would have done).  Instead, 
everything was broken (including gluster volumes on unaffected disks that 
are fully functional across all hosts), along with very poor performance of 
the affected machine AND no diagnostic reports that would point to a failing 
hard drive.  Is this expected behavior?


--Jim

On Sun, Jul 8, 2018 at 3:54 AM, Yaniv Kaul  wrote:



On Sat, Jul 7, 2018 at 8:45 AM, Jim Kusznir  wrote:

So, I'm still at a loss... It sounds like it's either insufficient RAM/swap, 
or insufficient network.  It seems to be neither now.  At this point, it 
appears that gluster is just "broke" and killing my systems for no 
discernible reason.  Here are the details, all from the same system (currently 
running 3 VMs):


[root@ovirt3 ~]# w
22:26:53 up 36 days,  4:34,  1 user,  load average: 42.78, 55.98, 53.31
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    192.168.8.90     22:26    2.00s  0.12s  0.11s w

bwm-ng reports the highest data usage was about 6MB/s during this test (and 
that was combined; I have two different gig networks.  One gluster network 
(primary VM storage) runs on one, the other network handles everything else).


[root@ovirt3 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          31996       13236         232          18       18526       18195
Swap:         16383        1475       14908

top - 22:32:56 up 36 days,  4:41,  1 user,  load average: 17.99, 39.69, 47.66

That is indeed a high load average. How many CPUs do you have, btw?

Tasks: 407 total,   1 running, 405 sleeping,   1 stopped,   0 zombie
%Cpu(s):  8.6 us,  2.1 sy,  0.0 ni, 87.6 id,  1.6 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 32764284 total,   228296 free, 13541952 used, 18994036 buff/cache
KiB Swap: 16777212 total, 15246200 free,  1531012 used. 18643960 avail Mem

Can you check what's swapping here? (a tweak to top output will show that)
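(If it helps, one quick way to see per-process swap usage without fiddling with top's 
columns - just a sketch reading /proc:)

# grep VmSwap /proc/[0-9]*/status | sort -t: -k3 -nr | head
(or press 'f' inside top and enable the SWAP field)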


  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
30036 qemu  20   0 6872324   5.2g  13532 S 144.6 16.5 216:14.55 
/usr/libexec/qemu-kvm -name guest=BillingWin,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/v+
28501 qemu  20   0 5034968   3.6g  12880 S  16.2 11.7  73:44.99 
/usr/libexec/qemu-kvm -name guest=FusionPBX,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/va+
2694 root  20   0 2169224  12164   3108 S   5.0  0.0   3290:42 
/usr/sbin/glusterfsd -s ovirt3.nwfiber.com --volfile-id 
data.ovirt3.nwfiber.com.gluster-brick2-data -p /var/run/+


This one's certainly taking quite a bit of your CPU usage overall.

14293 root  15  -5  944700  13356   4436 S   4.0  0.0  16:32.15 
/usr/sbin/glusterfs --volfile-server=192.168.8.11 
--volfile-server=192.168.8.12 --volfile-server=192.168.8.13 --+


I'm not sure what the sorting order is, but doesn't look like Gluster is 
taking a lot of memory?


25100 vdsm   0 -20 6747440 107868  12836 S   2.3  0.3  21:35.20 
/usr/bin/python2 /usr/share/vdsm/vdsmd
28971 qemu  20   0 2842592   1.5g  13548 S   1.7  4.7 241:46.49 
/usr/libexec/qemu-kvm -name guest=unifi.palousetech.com,debug-threads=on -S 
-object secret,id=masterKey0,format=+

12095 root  20   0  162276   2836   1868 R   1.3  0.0   0:00.25 top
2708 root  20   0 1906040  12404   3080 S   1.0  0.0   1083:33 
/usr/sbin/glusterfsd -s ovirt3.nwfiber.com --volfile-id 
engine.ovirt3.nwfiber.com.gluster-brick1-engine -p /var/+
28623 qemu  20   0 4749536   1.7g  12896 S   0.7  5.5   4:30.64 
/usr/libexec/qemu-kvm -name guest=billing.nwfiber.com,debug-threads=on -S 
-object secret,id=masterKey0,format=ra+


The VMs I see here and above together account for most? (5.2+3.6+1.5+1.7 = 
12GB) - still plenty of memory left.


10 root  20   0   0  0  0 S   0.3  0.0 215:54.72 [rcu_sched]
1030 sanlock   rt   0  773804  27908   2744 S   0.3  0.1  35:55.61 
/usr/sbin/sanlock daemon
1890 

[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

2018-07-09 Thread Johan Bernhardsson
In some cases Linux does not reject the broken SATA drive; it just gets 
horribly slow. In my experience it depends on how the drive fails.


It might have shown signs in SMART, and it might have shown some signs in 
syslog with write errors and drive queue errors.


For gluster to notice that the drive is gone, the drive needs to be rejected 
and marked as failed in Linux; only then would gluster have reported it as dead.


This is one reason it's good practice in gluster to run a brick on a RAID 
volume instead of a single drive.
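(A couple of places that would typically show such a slowly-failing drive; the device 
name is a placeholder:)

# smartctl -a /dev/sdX | grep -iE 'reallocat|pending|uncorrect'
# journalctl -k | grep -iE 'ata[0-9]+|blk_update_request|I/O error'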


/Johan

On July 10, 2018 04:21:33 Jim Kusznir  wrote:

Thank you for your help.

After more troubleshooting and host reboots, I accidentally discovered that 
the backing disk on ovirt2 (host) had suffered a failure.  On reboot, the 
raid card refused to see it at all.  It said it had cache waiting to be 
written to disk, and in the end, as it couldn't (wouldn't) see that disk, I 
had no choice but to discard that cache and boot up without the physical 
disk.  Since doing so (and running a gluster volume remove for the affected 
host), things are running like normal, although it appears it corrupted two 
disks (I've now lost 5 VMs to gluster-induced disk failures during poorly 
handled failures).


I don't understand why one bad disk wasn't simply failed, or, if one 
underlying process was having such a problem, why the other hosts didn't take 
it offline and continue (much like RAID would have done).  Instead, 
everything was broken (including gluster volumes on unaffected disks that 
are fully functional across all hosts), along with very poor performance of 
the affected machine AND no diagnostic reports that would point to a failing 
hard drive.  Is this expected behavior?


--Jim

On Sun, Jul 8, 2018 at 3:54 AM, Yaniv Kaul  wrote:



On Sat, Jul 7, 2018 at 8:45 AM, Jim Kusznir  wrote:

So, I'm still at a loss... It sounds like it's either insufficient RAM/swap, 
or insufficient network.  It seems to be neither now.  At this point, it 
appears that gluster is just "broke" and killing my systems for no 
discernible reason.  Here are the details, all from the same system (currently 
running 3 VMs):


[root@ovirt3 ~]# w
22:26:53 up 36 days,  4:34,  1 user,  load average: 42.78, 55.98, 53.31
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    192.168.8.90     22:26    2.00s  0.12s  0.11s w

bwm-ng reports the highest data usage was about 6MB/s during this test (and 
that was combined; I have two different gig networks.  One gluster network 
(primary VM storage) runs on one, the other network handles everything else).


[root@ovirt3 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          31996       13236         232          18       18526       18195
Swap:         16383        1475       14908

top - 22:32:56 up 36 days,  4:41,  1 user,  load average: 17.99, 39.69, 47.66

That is indeed a high load average. How many CPUs do you have, btw?

Tasks: 407 total,   1 running, 405 sleeping,   1 stopped,   0 zombie
%Cpu(s):  8.6 us,  2.1 sy,  0.0 ni, 87.6 id,  1.6 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 32764284 total,   228296 free, 13541952 used, 18994036 buff/cache
KiB Swap: 16777212 total, 15246200 free,  1531012 used. 18643960 avail Mem

Can you check what's swapping here? (a tweak to top output will show that)


  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
30036 qemu  20   0 6872324   5.2g  13532 S 144.6 16.5 216:14.55 
/usr/libexec/qemu-kvm -name guest=BillingWin,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/v+
28501 qemu  20   0 5034968   3.6g  12880 S  16.2 11.7  73:44.99 
/usr/libexec/qemu-kvm -name guest=FusionPBX,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/va+
2694 root  20   0 2169224  12164   3108 S   5.0  0.0   3290:42 
/usr/sbin/glusterfsd -s ovirt3.nwfiber.com --volfile-id 
data.ovirt3.nwfiber.com.gluster-brick2-data -p /var/run/+


This one's certainly taking quite a bit of your CPU usage overall.

14293 root  15  -5  944700  13356   4436 S   4.0  0.0  16:32.15 
/usr/sbin/glusterfs --volfile-server=192.168.8.11 
--volfile-server=192.168.8.12 --volfile-server=192.168.8.13 --+


I'm not sure what the sorting order is, but doesn't look like Gluster is 
taking a lot of memory?


25100 vdsm   0 -20 6747440 107868  12836 S   2.3  0.3  21:35.20 
/usr/bin/python2 /usr/share/vdsm/vdsmd
28971 qemu  20   0 2842592   1.5g  13548 S   1.7  4.7 241:46.49 
/usr/libexec/qemu-kvm -name guest=unifi.palousetech.com,debug-threads=on -S 
-object secret,id=masterKey0,format=+

12095 root  20   0  162276   2836   1868 R   1.3  0.0   0:00.25 top
2708 root  20   0 1906040  12404   3080 S   1.0  0.0   1083:33 
/usr/sbin/glusterfsd -s ovirt3.nwfiber.com --volfile-id 
engine.ovirt3.nwfiber.com.gluster-brick1-engine -p /var/+
28623 qemu  20   0 4749536   1.7g  12896 S   0.7  5.5   4:30.64 
/usr/libexec/qemu-kvm -name 

[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

2018-07-09 Thread Jim Kusznir
Thank you for your help.

After more troubleshooting and host reboots, I accidentally discovered that
the backing disk on ovirt2 (host) had suffered a failure.  On reboot, the
raid card refused to see it at all.  It said it had cache waiting to be
written to disk, and in the end, as it couldn't (wouldn't) see that disk, I
had no choice but to discard that cache and boot up without the physical
disk.  Since doing so (and running a gluster volume remove for the affected
host), things are running like normal, although it appears it corrupted two
disks (I've now lost 5 VMs to gluster-induced disk failures during poorly
handled failures).

I don't understand why one bad disk wasn't simply failed, or, if one
underlying process was having such a problem, why the other hosts didn't take
it offline and continue (much like RAID would have done).  Instead,
everything was broken (including gluster volumes on unaffected disks that
are fully functional across all hosts), along with very poor performance of
the affected machine AND no diagnostic reports that would point to a failing
hard drive.  Is this expected behavior?

--Jim

On Sun, Jul 8, 2018 at 3:54 AM, Yaniv Kaul  wrote:

>
>
> On Sat, Jul 7, 2018 at 8:45 AM, Jim Kusznir  wrote:
>
>> So, I'm still at a loss... It sounds like it's either insufficient
>> RAM/swap, or insufficient network.  It seems to be neither now.  At this
>> point, it appears that gluster is just "broke" and killing my systems for
>> no discernible reason.  Here are the details, all from the same system (currently
>> running 3 VMs):
>>
>> [root@ovirt3 ~]# w
>>  22:26:53 up 36 days,  4:34,  1 user,  load average: 42.78, 55.98, 53.31
>> USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
>> root     pts/0    192.168.8.90     22:26    2.00s  0.12s  0.11s w
>>
>> bwm-ng reports the highest data usage was about 6MB/s during this test
>> (and that was combined; I have two different gig networks.  One gluster
>> network (primary VM storage) runs on one, the other network handles
>> everything else).
>>
>> [root@ovirt3 ~]# free -m
>>               total        used        free      shared  buff/cache   available
>> Mem:          31996       13236         232          18       18526       18195
>> Swap:         16383        1475       14908
>>
>> top - 22:32:56 up 36 days,  4:41,  1 user,  load average: 17.99, 39.69,
>> 47.66
>>
>
> That is indeed a high load average. How many CPUs do you have, btw?
>
>
>> Tasks: 407 total,   1 running, 405 sleeping,   1 stopped,   0 zombie
>> %Cpu(s):  8.6 us,  2.1 sy,  0.0 ni, 87.6 id,  1.6 wa,  0.0 hi,  0.1 si,
>> 0.0 st
>> KiB Mem : 32764284 total,   228296 free, 13541952 used, 18994036
>> buff/cache
>> KiB Swap: 16777212 total, 15246200 free,  1531012 used. 18643960 avail
>> Mem
>>
>
> Can you check what's swapping here? (a tweak to top output will show that)
>
>
>>
>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+
>> COMMAND
>>
>> 30036 qemu  20   0 6872324   5.2g  13532 S 144.6 16.5 216:14.55
>> /usr/libexec/qemu-kvm -name guest=BillingWin,debug-threads=on -S -object
>> secret,id=masterKey0,format=raw,file=/v+
>> 28501 qemu  20   0 5034968   3.6g  12880 S  16.2 11.7  73:44.99
>> /usr/libexec/qemu-kvm -name guest=FusionPBX,debug-threads=on -S -object
>> secret,id=masterKey0,format=raw,file=/va+
>>  2694 root  20   0 2169224  12164   3108 S   5.0  0.0   3290:42
>> /usr/sbin/glusterfsd -s ovirt3.nwfiber.com --volfile-id
>> data.ovirt3.nwfiber.com.gluster-brick2-data -p /var/run/+
>>
>
> This one's certainly taking quite a bit of your CPU usage overall.
>
>
>> 14293 root  15  -5  944700  13356   4436 S   4.0  0.0  16:32.15
>> /usr/sbin/glusterfs --volfile-server=192.168.8.11
>> --volfile-server=192.168.8.12 --volfile-server=192.168.8.13 --+
>>
>
> I'm not sure what the sorting order is, but doesn't look like Gluster is
> taking a lot of memory?
>
>
>> 25100 vdsm   0 -20 6747440 107868  12836 S   2.3  0.3  21:35.20
>> /usr/bin/python2 /usr/share/vdsm/vdsmd
>>
>> 28971 qemu  20   0 2842592   1.5g  13548 S   1.7  4.7 241:46.49
>> /usr/libexec/qemu-kvm -name guest=unifi.palousetech.com,debug-threads=on
>> -S -object secret,id=masterKey0,format=+
>> 12095 root  20   0  162276   2836   1868 R   1.3  0.0   0:00.25 top
>>
>>
>>  2708 root  20   0 1906040  12404   3080 S   1.0  0.0   1083:33
>> /usr/sbin/glusterfsd -s ovirt3.nwfiber.com --volfile-id
>> engine.ovirt3.nwfiber.com.gluster-brick1-engine -p /var/+
>> 28623 qemu  20   0 4749536   1.7g  12896 S   0.7  5.5   4:30.64
>> /usr/libexec/qemu-kvm -name guest=billing.nwfiber.com,debug-threads=on
>> -S -object secret,id=masterKey0,format=ra+
>>
>
> The VMs I see here and above together account for most? (5.2+3.6+1.5+1.7 =
> 12GB) - still plenty of memory left.
>
>
>>10 root  20   0   0  0  0 S   0.3  0.0 215:54.72
>> [rcu_sched]
>>
>>  1030 sanlock   rt   0  773804  27908   2744 S   0.3  0.1  35:55.61
>> /usr/sbin/sanlock daemon
>>
>>  1890 zabbix

[ovirt-users] update cluster 4.1 to 4.2

2018-07-09 Thread p . staniforth
Hello,
  When I try to update my default cluster to 4.2 it fails with
Failed to update Host cluster (User: admin@internal-authz)

I get a lot of errors 

ERROR [org.ovirt.engine.core.bll.UpdateVmCommand] (Transaction Reaper Worker 0) 
[] Transaction rolled-back for command 
'org.ovirt.engine.core.bll.UpdateVmCommand'

Thanks,
 Paul S.


[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

2018-07-09 Thread Darrell Budic
I encountered this after upgrading clients to 3.12.9 as well. It’s not present 
in 3.12.8 or 3.12.6. I’ve added some data I had to that bug, and can produce more 
if needed. I forgot to mention my server cluster is at 3.12.9, and is not showing 
any problems, it’s just the clients.

A test cluster on 3.12.11 also shows it, just slower because it’s got fewer 
clients on it.


> From: Sahina Bose 
> Subject: [ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)
> Date: July 9, 2018 at 10:42:15 AM CDT
> To: Edward Clay; Jim Kusznir
> Cc: users
> 
> see response about bug at 
> https://lists.ovirt.org/archives/list/users@ovirt.org/thread/WRYEBOLNHJZGKKJUNF77TJ7WMBS66ZYK/
>  
> 
>  which seems to indicate the referenced bug is fixed at 3.12.2 and higher.
> 
> Could you attach the statedump of the process to the bug 
> https://bugzilla.redhat.com/show_bug.cgi?id=1593826 
>  as requested?
> 
> 
> 
> On Mon, Jul 9, 2018 at 8:38 PM, Edward Clay  > wrote:
> Just to add my .02 here.  I've opened a bug on this issue where HV/host 
> connected to glusterfs volumes are running out of RAM.  This seemed to be a 
> bug fixed in gluster 3.13 but that patch doesn't seem to be available any 
> longer and 3.12 is what ovirt is using.  For example I have a host that was 
> showing 72% of memory consumption with 3 VMs running on it.  If I migrate 
> those VMs to another host, memory consumption drops to 52%.  If I put this 
> host into maintenance and then activate it, it drops down to 2% or so.  Since 
> I ran into this issue I've been manually watching memory consumption on each 
> host and migrating VMs from it to others to keep things from dying.  I'm 
> hoping with the announcement of gluster 3.12 end of life and the move to 
> gluster 4.1 that this will get fixed or that the patch from 3.13 can get 
> backported so this problem will go away.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1593826 
> 
> 
> On 07/07/2018 11:49 AM, Jim Kusznir wrote:
>> 
>> This host has NO VMs running on it, only 3 running cluster-wide (including 
>> the engine, which is on its own storage):
>> 
>> top - 10:44:41 up 1 day, 17:10,  1 user,  load average: 15.86, 14.33, 13.39
>> Tasks: 381 total,   1 running, 379 sleeping,   1 stopped,   0 zombie
>> %Cpu(s):  2.7 us,  2.1 sy,  0.0 ni, 89.0 id,  6.1 wa,  0.0 hi,  0.2 si,  0.0 
>> st
>> KiB Mem : 32764284 total,   338232 free,   842324 used, 31583728 buff/cache
>> KiB Swap: 12582908 total, 12258660 free,   324248 used. 31076748 avail Mem 
>> 
>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>>  
>>
>> 13279 root  20   0 2380708  37628   4396 S  51.7  0.1   3768:03 
>> glusterfsd   
>>  
>> 13273 root  20   0 2233212  20460   4380 S  17.2  0.1 105:50.44 
>> glusterfsd   
>> 
>> 13287 root  20   0 2233212  20608   4340 S   4.3  0.1  34:27.20 
>> glusterfsd   
>> 
>> 16205 vdsm   0 -20 5048672  88940  13364 S   1.3  0.3   0:32.69 vdsmd
>>  
>>
>> 16300 vdsm  20   0  608488  25096   5404 S   1.3  0.1   0:05.78 python   
>>  
>>
>>  1109 vdsm  20   0 3127696  44228   8552 S   0.7  0.1  18:49.76 
>> ovirt-ha-broker  
>>  
>> 2 root  20   0   0  0  0 S   0.7  0.0   0:00.13 
>> kworker/u64:3
>>  
>>10 root  20   0   0  0  0 S   0.3  0.0   4:22.36 
>> rcu_sched
>>  
>>   572 root   0 -20   0  0  0 S   0.3  0.0   0:12.02 
>> kworker/1:1H 
>>  
>>   797 root  20   0   0  0  0 S   0.3  0.0   1:59.59 
>> kdmwork-253:2
>>  
>>   877 root   0 -20   0  0  0 S   0.3  0.0 

[ovirt-users] Re: Device Mapper Timeout when using Gluster Snapshots

2018-07-09 Thread Hesham Ahmed
On Mon, Jul 9, 2018 at 3:52 PM Sahina Bose  wrote:
>
>
>
> On Mon, Jul 9, 2018 at 5:41 PM, Hesham Ahmed  wrote:
>>
>> Thanks Sahina for the update,
>>
>> I am using gluster geo-replication for DR in a different installation,
>> however I was not aware that Gluster snapshots are not recommended in
>> a hyperconverged setup. A warning on the Gluster snapshot UI
>> would be helpful. Are gluster volume snapshots for volumes hosting VM
>> images a work in progress with a bug tracker, or is it something not
>> expected to change?
>
>
> Agreed on the warning - can you log a bz?
>
> There's no specific bz tracking support for volume snapshots w.r.t VM store 
> use case. If you have a specific scenario where the geo-rep based DR is not 
> sufficient, please log a bug.
>
>> On Mon, Jul 9, 2018 at 2:58 PM Sahina Bose  wrote:
>> >
>> >
>> >
>> > On Sun, Jul 8, 2018 at 3:29 PM, Hesham Ahmed  wrote:
>> >>
>> >> I also noticed that Gluster Snapshots have the SAME UUID as the main
>> >> LV and if using UUID in fstab, the snapshot device is sometimes
>> >> mounted instead of the primary LV
>> >>
>> >> For instance:
>> >> /etc/fstab contains the following line:
>> >>
>> >> UUID=a0b85d33-7150-448a-9a70-6391750b90ad /gluster_bricks/gv01_data01
>> >> auto 
>> >> inode64,noatime,nodiratime,x-parent=dMeNGb-34lY-wFVL-WF42-hlpE-TteI-lMhvvt
>> >> 0 0
>> >>
>> >> # lvdisplay gluster00/lv01_data01
>> >>   --- Logical volume ---
>> >>   LV Path/dev/gluster00/lv01_data01
>> >>   LV Namelv01_data01
>> >>   VG Namegluster00
>> >>
>> >> # mount
>> >> /dev/mapper/gluster00-55e97e7412bf48db99bb389bb708edb8_0 on
>> >> /gluster_bricks/gv01_data01 type xfs
>> >> (rw,noatime,nodiratime,seclabel,attr2,inode64,sunit=1024,swidth=2048,noquota)
>> >>
>> >> Notice above the device mounted at the brick mountpoint is not
>> >> /dev/gluster00/lv01_data01 and instead is one of the snapshot devices
>> >> of that LV
>> >>
>> >> # blkid
>> >> /dev/mapper/gluster00-lv01_shaker_com_sa:
>> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> >> /dev/mapper/gluster00-55e97e7412bf48db99bb389bb708edb8_0:
>> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> >> /dev/mapper/gluster00-4ca8eef409ec4932828279efb91339de_0:
>> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> >> /dev/mapper/gluster00-59992b6c14644f13b5531a054d2aa75c_0:
>> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> >> /dev/mapper/gluster00-362b50c994b04284b1664b2e2eb49d09_0:
>> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> >> /dev/mapper/gluster00-0b3cc414f4cb4cddb6e81f162cdb7efe_0:
>> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> >> /dev/mapper/gluster00-da98ce5efda549039cf45a18e4eacbaf_0:
>> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> >> /dev/mapper/gluster00-4ea5cce4be704dd7b29986ae6698a666_0:
>> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> >>
>> >> Notice the UUID of LV and its snapshots is the same causing systemd to
>> >> mount one of the snapshot devices instead of LV which results in the
>> >> following gluster error:
>> >>
>> >> gluster> volume start gv01_data01 force
>> >> volume start: gv01_data01: failed: Volume id mismatch for brick
>> >> vhost03:/gluster_bricks/gv01_data01/gv. Expected volume id
>> >> be6bc69b-c6ed-4329-b300-3b9044f375e1, volume id
>> >> 55e97e74-12bf-48db-99bb-389bb708edb8 found
>> >
>> >
>> >
>> > We do not recommend gluster volume snapshots for volumes hosting VM 
>> > images. Please look at the 
>> > https://ovirt.org/develop/release-management/features/gluster/gluster-dr/ 
>> > as an alternative.
>> >
>> >>
>> >> On Sun, Jul 8, 2018 at 12:32 PM  wrote:
>> >> >
>> >> > I am facing this trouble since version 4.1 up to the latest 4.2.4, once
>> >> > we enable gluster snapshots and accumulate some snapshots (as few as 15
>> >> > snapshots per server) we start having trouble booting the server. The
>> >> > server enters emergency shell upon boot after timing out waiting for
>> >> > snapshot devices. Waiting a few minutes and pressing Control-D then 
>> >> > boots the server normally. In case of very large number of snapshots 
>> >> > (600+) it can take days before the sever will boot. Attaching journal
>> >> > log, let me know if you need any other logs.
>> >> >
>> >> > Details of the setup:
>> >> >
>> >> > 3 node hyperconverged oVirt setup (64GB RAM, 8-Core E5 Xeon)
>> >> > oVirt 4.2.4
>> >> > oVirt Node 4.2.4
>> >> > 10Gb Interface
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Hesham S. Ahmed
>> >
>> >
>
>

Bug report created for the warning:
https://bugzilla.redhat.com/show_bug.cgi?id=1599365

We were not using gluster snapshots for DR, rather as a quick way to go
back in time (although we never planned how to use the snapshots).
Maybe scheduling ability should be added for VM snapshots as well.
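As a side note on the duplicate-UUID symptom quoted above: one workaround is to mount
the brick by LV path instead of filesystem UUID, so systemd cannot pick a snapshot
device by mistake. The fstab line below reuses the device and mountpoint from the
earlier output and is only a sketch:

/dev/gluster00/lv01_data01  /gluster_bricks/gv01_data01  xfs  inode64,noatime,nodiratime  0 0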

[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

2018-07-09 Thread Sahina Bose
see response about bug at
https://lists.ovirt.org/archives/list/users@ovirt.org/thread/WRYEBOLNHJZGKKJUNF77TJ7WMBS66ZYK/
which seems to indicate the referenced bug is fixed at 3.12.2 and higher.

Could you attach the statedump of the process to the bug
https://bugzilla.redhat.com/show_bug.cgi?id=1593826 as requested?
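(For reference, a statedump of the fuse client can usually be generated by sending
SIGUSR1 to the glusterfs mount process; the dump files land under /var/run/gluster
by default. The PID below is a placeholder:)

# kill -USR1 <pid-of-glusterfs-mount-process>
# ls /var/run/gluster/glusterdump.*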



On Mon, Jul 9, 2018 at 8:38 PM, Edward Clay 
wrote:

> Just to add my .02 here.  I've opened a bug on this issue where HV/host
> connected to glusterfs volumes are running out of RAM.  This seemed to be a
> bug fixed in gluster 3.13 but that patch doesn't seem to be available any
> longer and 3.12 is what ovirt is using.  For example I have a host that was
> showing 72% of memory consumption with 3 VMs running on it.  If I migrate
> those VMs to another host, memory consumption drops to 52%.  If I put this
> host into maintenance and then activate it, it drops down to 2% or so.
> Since I ran into this issue I've been manually watching memory consumption
> on each host and migrating VMs from it to others to keep things from
> dying.  I'm hoping with the announcement of gluster 3.12 end of life and
> the move to gluster 4.1 that this will get fixed or that the patch from
> 3.13 can get backported so this problem will go away.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1593826
>
> On 07/07/2018 11:49 AM, Jim Kusznir wrote:
>
>
> This host has NO VMs running on it, only 3 running cluster-wide (including
> the engine, which is on its own storage):
>
> top - 10:44:41 up 1 day, 17:10,  1 user,  load average: 15.86, 14.33, 13.39
> Tasks: 381 total,   1 running, 379 sleeping,   1 stopped,   0 zombie
> %Cpu(s):  2.7 us,  2.1 sy,  0.0 ni, 89.0 id,  6.1 wa,  0.0 hi,  0.2 si,
> 0.0 st
> KiB Mem : 32764284 total,   338232 free,   842324 used, 31583728 buff/cache
> KiB Swap: 12582908 total, 12258660 free,   324248 used. 31076748 avail Mem
>
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+
> COMMAND
>
> 13279 root  20   0 2380708  37628   4396 S  51.7  0.1   3768:03
> glusterfsd
>
> 13273 root  20   0 2233212  20460   4380 S  17.2  0.1 105:50.44
> glusterfsd
>
> 13287 root  20   0 2233212  20608   4340 S   4.3  0.1  34:27.20
> glusterfsd
>
> 16205 vdsm   0 -20 5048672  88940  13364 S   1.3  0.3   0:32.69 vdsmd
>
>
> 16300 vdsm  20   0  608488  25096   5404 S   1.3  0.1   0:05.78
> python
>
>  1109 vdsm  20   0 3127696  44228   8552 S   0.7  0.1  18:49.76
> ovirt-ha-broker
>
> 2 root  20   0   0  0  0 S   0.7  0.0   0:00.13
> kworker/u64:3
>
>10 root  20   0   0  0  0 S   0.3  0.0   4:22.36
> rcu_sched
>
>   572 root   0 -20   0  0  0 S   0.3  0.0   0:12.02
> kworker/1:1H
>
>   797 root  20   0   0  0  0 S   0.3  0.0   1:59.59
> kdmwork-253:2
>
>   877 root   0 -20   0  0  0 S   0.3  0.0   0:11.34
> kworker/3:1H
>
>  1028 root  20   0   0  0  0 S   0.3  0.0   0:35.35
> xfsaild/dm-10
>
>  1869 root  20   0 1496472  10540   6564 S   0.3  0.0   2:15.46
> python
>
>  3747 root  20   0   0  0  0 D   0.3  0.0   0:01.21
> kworker/u64:1
>
> 10979 root  15  -5  723504  15644   3920 S   0.3  0.0  22:46.27
> glusterfs
>
> 15085 root  20   0  680884  10792   4328 S   0.3  0.0   0:01.13
> glusterd
>
> 16102 root  15  -5 1204216  44948  11160 S   0.3  0.1   0:18.61
> supervdsmd
>
> At the moment, the engine is barely usable, my other VMs appear to be
> unresponsive.  Two on one host, one on another, and none on the third.
>
>
>
> On Sat, Jul 7, 2018 at 10:38 AM, Jim Kusznir  wrote:
>
>> I run 4-7 VMs, and most of them are 2GB ram.  I have 2 VMs with 4GB.
>>
>> Ram hasn't been an issue until recent ovirt/gluster upgrades.  Storage
>> has always been slow, especially with these drives.  However, even watching
>> network utilization on my switch, the gig-e links never max out.
>>
>> The loadavg issues and unresponsive behavior started with yesterday's
>> ovirt updates.  I now have one VM with low I/O that lives on a separate
>> storage volume (data, fully SSD backed instead of data-hdd, which was
>> having the issues).  I moved it to an oVirt host with no other VMs on it,
>> and that had freshly been rebooted.  Before it had this one VM on it,
>> loadavg was >0.5.  Now its up in the 20's, with only one low Disk I/O, 4GB
>> ram VM on the host.
>>
>> This to me says there's now a new problem separate from Gluster.  I don't
>> have any non-gluster storage available to test with.  I did notice that the
>> last update included a new kernel, and it appears its the qemu-kvm
>> processes that are consuming way more CPU than they used to now.
>>
>> Are there any known issues?  I'm going to reboot into my previous kernel
>> to see if its kernel-caused.
>>
>> --Jim
>>
>>
>>
>> On Fri, Jul 6, 2018 at 11:07 PM, Johan Bernhardsson 
>> wrote:
>>
>>> That is a single sata drive that is slow 

[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

2018-07-09 Thread Edward Clay

Just to add my .02 here.  I've opened a bug on this issue where HV/host 
connected to glusterfs volumes are running out of RAM.  This seemed to be a bug 
fixed in gluster 3.13 but that patch doesn't seem to be available any longer and 
3.12 is what ovirt is using.  For example I have a host that was showing 72% of 
memory consumption with 3 VMs running on it.  If I migrate those VMs to another 
host, memory consumption drops to 52%.  If I put this host into maintenance and 
then activate it, it drops down to 2% or so.  Since I ran into this issue I've 
been manually watching memory consumption on each host and migrating VMs from 
it to others to keep things from dying.  I'm hoping with the announcement of 
gluster 3.12 end of life and the move to gluster 4.1 that this will get fixed 
or that the patch from 3.13 can get backported so this problem will go away.

https://bugzilla.redhat.com/show_bug.cgi?id=1593826
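In case it helps anyone watching for the same leak, the fuse client memory can be
tracked over time with something as simple as (a sketch only):

# ps -o pid,rss,vsz,etime,cmd -C glusterfs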

On 07/07/2018 11:49 AM, Jim Kusznir wrote:

This host has NO VMs running on it, only 3 running cluster-wide (including the 
engine, which is on its own storage):

top - 10:44:41 up 1 day, 17:10,  1 user,  load average: 15.86, 14.33, 13.39
Tasks: 381 total,   1 running, 379 sleeping,   1 stopped,   0 zombie
%Cpu(s):  2.7 us,  2.1 sy,  0.0 ni, 89.0 id,  6.1 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem : 32764284 total,   338232 free,   842324 used, 31583728 buff/cache
KiB Swap: 12582908 total, 12258660 free,   324248 used. 31076748 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
13279 root  20   0 2380708  37628   4396 S  51.7  0.1   3768:03 glusterfsd
13273 root  20   0 2233212  20460   4380 S  17.2  0.1 105:50.44 glusterfsd
13287 root  20   0 2233212  20608   4340 S   4.3  0.1  34:27.20 glusterfsd
16205 vdsm   0 -20 5048672  88940  13364 S   1.3  0.3   0:32.69 vdsmd
16300 vdsm  20   0  608488  25096   5404 S   1.3  0.1   0:05.78 python
1109 vdsm  20   0 3127696  44228   8552 S   0.7  0.1  18:49.76 
ovirt-ha-broker
2 root  20   0   0  0  0 S   0.7  0.0   0:00.13 
kworker/u64:3
  10 root  20   0   0  0  0 S   0.3  0.0   4:22.36 rcu_sched
 572 root   0 -20   0  0  0 S   0.3  0.0   0:12.02 kworker/1:1H
 797 root  20   0   0  0  0 S   0.3  0.0   1:59.59 kdmwork-253:2
 877 root   0 -20   0  0  0 S   0.3  0.0   0:11.34 kworker/3:1H
1028 root  20   0   0  0  0 S   0.3  0.0   0:35.35 xfsaild/dm-10
1869 root  20   0 1496472  10540   6564 S   0.3  0.0   2:15.46 python
3747 root  20   0   0  0  0 D   0.3  0.0   0:01.21 kworker/u64:1
10979 root  15  -5  723504  15644   3920 S   0.3  0.0  22:46.27 glusterfs
15085 root  20   0  680884  10792   4328 S   0.3  0.0   0:01.13 glusterd
16102 root  15  -5 1204216  44948  11160 S   0.3  0.1   0:18.61 supervdsmd

At the moment, the engine is barely usable, my other VMs appear to be 
unresponsive.  Two on one host, one on another, and none on the third.



On Sat, Jul 7, 2018 at 10:38 AM, Jim Kusznir 
mailto:j...@palousetech.com>> wrote:
I run 4-7 VMs, and most of them are 2GB ram.  I have 2 VMs with 4GB.

Ram hasn't been an issue until recent ovirt/gluster upgrades.  Storage has 
always been slow, especially with these drives.  However, even watching network 
utilization on my switch, the gig-e links never max out.

The loadavg issues and unresponsive behavior started with yesterday's ovirt 
updates.  I now have one VM with low I/O that lives on a separate storage volume 
(data, fully SSD backed instead of data-hdd, which was having the issues).  I 
moved it to an oVirt host with no other VMs on it, and that had freshly been 
rebooted.  Before it had this one VM on it, loadavg was >0.5.  Now it's up in 
the 20's, with only one low Disk I/O, 4GB ram VM on the host.

This to me says there's now a new problem separate from Gluster.  I don't have 
any non-gluster storage available to test with.  I did notice that the last 
update included a new kernel, and it appears its the qemu-kvm processes that 
are consuming way more CPU than they used to now.

Are there any known issues?  I'm going to reboot into my previous kernel to see 
if its kernel-caused.

--Jim



On Fri, Jul 6, 2018 at 11:07 PM, Johan Bernhardsson 
mailto:jo...@kafit.se>> wrote:
That is a single SATA drive that is slow on random I/O and that has to be 
synced with 2 other servers. Gluster works synchronously, so one write has to be 
written and acknowledged on all three nodes.

So you have a bottleneck in I/O on the drives and one on the network, and depending on 
how many virtual servers you have and how much RAM they take, you might have a memory 
bottleneck as well.

Load spikes when you have a wait somewhere and are overusing capacity. But it's 
not only CPU that load is counted on: it is waiting for resources, so it can be 
memory or network or drives.

How many virtual servers do you run and 

[ovirt-users] Re: hyperconverged cluster - how to change the mount path?

2018-07-09 Thread Sahina Bose
On Fri, Jul 6, 2018 at 1:55 PM, Liebe, André-Sebastian <
andre.li...@gematik.de> wrote:

> > To edit the entry of storage connection in database, you can use
> > hosted-engine --set-shared-config mnt_options backup-volfile-servers=<
> server2>:
>
> I tried editing shared configuration but it failed.
> 
> # hosted-engine --get-shared-config mnt_options --type=vm
> Invalid configuration key mnt_options.
> Available keys are:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
> "__main__", fname, loader, pkg_name)
>   File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
> exec code in run_globals
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/get_shared_config.py",
> line 71, in 
> value_and_type = get_shared_config.get_shared_config(*sys.argv)
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/get_shared_config.py",
> line 55, in get_shared_config
> for c_type in config_keys_for_type:
> TypeError: 'NoneType' object is not iterable
>
> # hosted-engine --get-shared-config mnt_options --type=broker
> Invalid configuration key mnt_options.
> Available keys are:
> broker : ['email.smtp-server', 'email.smtp-port', 'email.source-email',
> 'email.destination-emails', 'notify.state_transition']
>
> # hosted-engine --set-shared-config mnt_options
> backup-volfile-servers=lvh2:lvh4
> Duplicate key mnt_options, please specify the key type
> #  hosted-engine --set-shared-config mnt_options
> backup-volfile-servers=lvh2:lvh4 --type=broker
> Invalid configuration key mnt_options.
> Available keys are:
> broker : ['email.smtp-server', 'email.smtp-port', 'email.source-email',
> 'email.destination-emails', 'notify.state_transition']
>
> # hosted-engine --set-shared-config mnt_options
> backup-volfile-servers=lvh2.lab.gematik.de:lvh4.lab.gematik.de --type=vm
> Invalid configuration key mnt_options.
> Available keys are:
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
> "__main__", fname, loader, pkg_name)
>   File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
> exec code in run_globals
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_shared_config.py",
> line 71, in 
> if not set_shared_config.set_shared_config(*sys.argv):
>   File 
> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/set_shared_config.py",
> line 54, in set_shared_config
> for c_type in config_keys_for_type:
> TypeError: 'NoneType' object is not iterable
>


Can you try

#  hosted-engine --get-shared-config mnt_options --type=he_shared
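(And if that returns the key, presumably the matching set call would be the one below -
untested here, so treat it as a sketch; the server names are the ones from your earlier
attempt:)

#  hosted-engine --set-shared-config mnt_options backup-volfile-servers=lvh2:lvh4 --type=he_shared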



>
>
> André
>
>
>


[ovirt-users] Re: Device Mapper Timeout when using Gluster Snapshots

2018-07-09 Thread Sahina Bose
On Mon, Jul 9, 2018 at 5:41 PM, Hesham Ahmed  wrote:

> Thanks Sahina for the update,
>
> I am using gluster geo-replication for DR in a different installation,
> however I was not aware that Gluster snapshots are not recommended in
> a hyperconverged setup. A warning on the Gluster snapshot UI
> would be helpful. Are gluster volume snapshots for volumes hosting VM
> images a work in progress with a bug tracker, or is it something not
> expected to change?
>

Agreed on the warning - can you log a bz?

There's no specific bz tracking support for volume snapshots w.r.t VM store
use case. If you have a specific scenario where the geo-rep based DR is not
sufficient, please log a bug.

On Mon, Jul 9, 2018 at 2:58 PM Sahina Bose  wrote:
> >
> >
> >
> > On Sun, Jul 8, 2018 at 3:29 PM, Hesham Ahmed  wrote:
> >>
> >> I also noticed that Gluster Snapshots have the SAME UUID as the main
> >> LV and if using UUID in fstab, the snapshot device is sometimes
> >> mounted instead of the primary LV
> >>
> >> For instance:
> >> /etc/fstab contains the following line:
> >>
> >> UUID=a0b85d33-7150-448a-9a70-6391750b90ad /gluster_bricks/gv01_data01
> >> auto inode64,noatime,nodiratime,x-parent=dMeNGb-34lY-wFVL-WF42-
> hlpE-TteI-lMhvvt
> >> 0 0
> >>
> >> # lvdisplay gluster00/lv01_data01
> >>   --- Logical volume ---
> >>   LV Path/dev/gluster00/lv01_data01
> >>   LV Namelv01_data01
> >>   VG Namegluster00
> >>
> >> # mount
> >> /dev/mapper/gluster00-55e97e7412bf48db99bb389bb708edb8_0 on
> >> /gluster_bricks/gv01_data01 type xfs
> >> (rw,noatime,nodiratime,seclabel,attr2,inode64,sunit=
> 1024,swidth=2048,noquota)
> >>
> >> Notice above the device mounted at the brick mountpoint is not
> >> /dev/gluster00/lv01_data01 and instead is one of the snapshot devices
> >> of that LV
> >>
> >> # blkid
> >> /dev/mapper/gluster00-lv01_shaker_com_sa:
> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> >> /dev/mapper/gluster00-55e97e7412bf48db99bb389bb708edb8_0:
> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> >> /dev/mapper/gluster00-4ca8eef409ec4932828279efb91339de_0:
> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> >> /dev/mapper/gluster00-59992b6c14644f13b5531a054d2aa75c_0:
> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> >> /dev/mapper/gluster00-362b50c994b04284b1664b2e2eb49d09_0:
> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> >> /dev/mapper/gluster00-0b3cc414f4cb4cddb6e81f162cdb7efe_0:
> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> >> /dev/mapper/gluster00-da98ce5efda549039cf45a18e4eacbaf_0:
> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> >> /dev/mapper/gluster00-4ea5cce4be704dd7b29986ae6698a666_0:
> >> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> >>
> >> Notice the UUID of LV and its snapshots is the same causing systemd to
> >> mount one of the snapshot devices instead of LV which results in the
> >> following gluster error:
> >>
> >> gluster> volume start gv01_data01 force
> >> volume start: gv01_data01: failed: Volume id mismatch for brick
> >> vhost03:/gluster_bricks/gv01_data01/gv. Expected volume id
> >> be6bc69b-c6ed-4329-b300-3b9044f375e1, volume id
> >> 55e97e74-12bf-48db-99bb-389bb708edb8 found
> >
> >
> >
> > We do not recommend gluster volume snapshots for volumes hosting VM
> images. Please look at the https://ovirt.org/develop/
> release-management/features/gluster/gluster-dr/ as an alternative.
> >
> >>
> >> On Sun, Jul 8, 2018 at 12:32 PM  wrote:
> >> >
> >> > I am facing this trouble since version 4.1 up to the latest 4.2.4,
> once
> >> > we enable gluster snapshots and accumulate some snapshots (as few as
> 15
> >> > snapshots per server) we start having trouble booting the server. The
> >> > server enters emergency shell upon boot after timing out waiting for
> >> > snapshot devices. Waiting a few minutes and pressing Control-D then
> boots the server normally. In case of very large number of snapshots (600+)
> it can take days before the sever will boot. Attaching journal
> >> > log, let me know if you need any other logs.
> >> >
> >> > Details of the setup:
> >> >
> >> > 3 node hyperconverged oVirt setup (64GB RAM, 8-Core E5 Xeon)
> >> > oVirt 4.2.4
> >> > oVirt Node 4.2.4
> >> > 10Gb Interface
> >> >
> >> > Thanks,
> >> >
> >> > Hesham S. Ahmed
> >
> >
>


[ovirt-users] Re: Device Mapper Timeout when using Gluster Snapshots

2018-07-09 Thread Hesham Ahmed
Thanks Sahina for the update,

I am using gluster geo-replication for DR in a different installation,
however I was not aware that Gluster snapshots are not recommended in
a hyperconverged setup. A warning on the Gluster snapshot UI
would be helpful. Are gluster volume snapshots for volumes hosting VM
images a work in progress with a bug tracker, or is it something not
expected to change?
On Mon, Jul 9, 2018 at 2:58 PM Sahina Bose  wrote:
>
>
>
> On Sun, Jul 8, 2018 at 3:29 PM, Hesham Ahmed  wrote:
>>
>> I also noticed that Gluster Snapshots have the SAME UUID as the main
>> LV and if using UUID in fstab, the snapshot device is sometimes
>> mounted instead of the primary LV
>>
>> For instance:
>> /etc/fstab contains the following line:
>>
>> UUID=a0b85d33-7150-448a-9a70-6391750b90ad /gluster_bricks/gv01_data01
>> auto 
>> inode64,noatime,nodiratime,x-parent=dMeNGb-34lY-wFVL-WF42-hlpE-TteI-lMhvvt
>> 0 0
>>
>> # lvdisplay gluster00/lv01_data01
>>   --- Logical volume ---
>>   LV Path/dev/gluster00/lv01_data01
>>   LV Namelv01_data01
>>   VG Namegluster00
>>
>> # mount
>> /dev/mapper/gluster00-55e97e7412bf48db99bb389bb708edb8_0 on
>> /gluster_bricks/gv01_data01 type xfs
>> (rw,noatime,nodiratime,seclabel,attr2,inode64,sunit=1024,swidth=2048,noquota)
>>
>> Notice above the device mounted at the brick mountpoint is not
>> /dev/gluster00/lv01_data01 and instead is one of the snapshot devices
>> of that LV
>>
>> # blkid
>> /dev/mapper/gluster00-lv01_shaker_com_sa:
>> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> /dev/mapper/gluster00-55e97e7412bf48db99bb389bb708edb8_0:
>> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> /dev/mapper/gluster00-4ca8eef409ec4932828279efb91339de_0:
>> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> /dev/mapper/gluster00-59992b6c14644f13b5531a054d2aa75c_0:
>> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> /dev/mapper/gluster00-362b50c994b04284b1664b2e2eb49d09_0:
>> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> /dev/mapper/gluster00-0b3cc414f4cb4cddb6e81f162cdb7efe_0:
>> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> /dev/mapper/gluster00-da98ce5efda549039cf45a18e4eacbaf_0:
>> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>> /dev/mapper/gluster00-4ea5cce4be704dd7b29986ae6698a666_0:
>> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>>
>> Notice the UUID of LV and its snapshots is the same causing systemd to
>> mount one of the snapshot devices instead of LV which results in the
>> following gluster error:
>>
>> gluster> volume start gv01_data01 force
>> volume start: gv01_data01: failed: Volume id mismatch for brick
>> vhost03:/gluster_bricks/gv01_data01/gv. Expected volume id
>> be6bc69b-c6ed-4329-b300-3b9044f375e1, volume id
>> 55e97e74-12bf-48db-99bb-389bb708edb8 found
>
>
>
> We do not recommend gluster volume snapshots for volumes hosting VM images. 
> Please look at the 
> https://ovirt.org/develop/release-management/features/gluster/gluster-dr/ as 
> an alternative.
>
>>
>> On Sun, Jul 8, 2018 at 12:32 PM  wrote:
>> >
>> > I am facing this trouble since version 4.1 up to the latest 4.2.4, once
>> > we enable gluster snapshots and accumulate some snapshots (as few as 15
>> > snapshots per server) we start having trouble booting the server. The
>> > server enters emergency shell upon boot after timing out waiting for
>> > snapshot devices. Waiting a few minutes and pressing Control-D then boots 
>> > the server normally. In case of very large number of snapshots (600+) it 
>> > can take days before the sever will boot. Attaching journal
>> > log, let me know if you need any other logs.
>> >
>> > Details of the setup:
>> >
>> > 3 node hyperconverged oVirt setup (64GB RAM, 8-Core E5 Xeon)
>> > oVirt 4.2.4
>> > oVirt Node 4.2.4
>> > 10Gb Interface
>> >
>> > Thanks,
>> >
>> > Hesham S. Ahmed
>
>


[ovirt-users] Re: Advice on deploying oVirt hyperconverged environment Node based

2018-07-09 Thread Sahina Bose
On Fri, Jul 6, 2018 at 12:17 PM, Tal Bar-Or  wrote:

> Hello All,
> I am about deploying a new Ovirt system for our developers that we plan to
> be hyperconverged environment Node based.
>
> The system workload would mostly be builders compiling our
> code, which involves lots of small files and intensive IO.
> I plan to build two gluster volume "layers": one based on SAS drives for
> the OS to spin on, and a second, NVMe-based, for intensive IO.
> I would expect the system to be resilient/highly available and at the
> same time give good enough IO for the VM builders, of which there will be at
> least 6 to 8 VM guests.
> The system hardware would be as follows:
> *chassis*: 4x HP DL380 gen8
> *each server hardware:*
> *cpu*: 2x e5-2690v2
> *memory*:256GB
> *Disks*: 12x 1.2TB SAS 10k disks, 2 mirrored for the OS (or using a Kingston
> 2x 128GB mirror), rest for the VM OS volume.
> *Nvme*: 2x 960GB Kingston KC1000 for builders compiling source code
> *Network*: 4 ports Intel 10Gbit/s SFP+
>
> Given the above configuration and theory, my question would be: what would be
> the best practice in terms of Gluster configuration - *Distributed, Replicated,
> Distributed Replicated, Dispersed, Distributed Dispersed*?
> What is the suggestion for hardware RAID: type 5 or 6, or use ZFS?
>

Replicated or distributed-replicated with replica 3 or replica 3 with
arbiter. Plain distribute will not provide HA for the data. Dispersed
volume types are not fully integrated and supported with oVirt.
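For example, a replica 3 arbiter 1 volume would be created roughly like this (host
names and brick paths are placeholders):

# gluster volume create vmstore replica 3 arbiter 1 \
      host1:/gluster_bricks/vmstore/brick \
      host2:/gluster_bricks/vmstore/brick \
      host3:/gluster_bricks/vmstore/brick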

> Regarding node network communication, I intend to use 3 ports for storage
> communication and one port for the guest network. My question regarding
> Gluster inter-communication: what is better, would I gain more from 3x 10G LACP
> or one network for each gluster volume?
>

3x10G - since we do not really have a way to separate networks by each
gluster volume.


> Please advice
> Thanks
>
> --
> Tal Bar-or
>


[ovirt-users] Re: Device Mapper Timeout when using Gluster Snapshots

2018-07-09 Thread Sahina Bose
On Sun, Jul 8, 2018 at 3:29 PM, Hesham Ahmed  wrote:

> I also noticed that Gluster Snapshots have the SAME UUID as the main
> LV and if using UUID in fstab, the snapshot device is sometimes
> mounted instead of the primary LV
>
> For instance:
> /etc/fstab contains the following line:
>
> UUID=a0b85d33-7150-448a-9a70-6391750b90ad /gluster_bricks/gv01_data01
> auto inode64,noatime,nodiratime,x-parent=dMeNGb-34lY-wFVL-WF42-
> hlpE-TteI-lMhvvt
> 0 0
>
> # lvdisplay gluster00/lv01_data01
>   --- Logical volume ---
>   LV Path/dev/gluster00/lv01_data01
>   LV Namelv01_data01
>   VG Namegluster00
>
> # mount
> /dev/mapper/gluster00-55e97e7412bf48db99bb389bb708edb8_0 on
> /gluster_bricks/gv01_data01 type xfs
> (rw,noatime,nodiratime,seclabel,attr2,inode64,sunit=
> 1024,swidth=2048,noquota)
>
> Notice above the device mounted at the brick mountpoint is not
> /dev/gluster00/lv01_data01 and instead is one of the snapshot devices
> of that LV
>
> # blkid
> /dev/mapper/gluster00-lv01_shaker_com_sa:
> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> /dev/mapper/gluster00-55e97e7412bf48db99bb389bb708edb8_0:
> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> /dev/mapper/gluster00-4ca8eef409ec4932828279efb91339de_0:
> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> /dev/mapper/gluster00-59992b6c14644f13b5531a054d2aa75c_0:
> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> /dev/mapper/gluster00-362b50c994b04284b1664b2e2eb49d09_0:
> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> /dev/mapper/gluster00-0b3cc414f4cb4cddb6e81f162cdb7efe_0:
> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> /dev/mapper/gluster00-da98ce5efda549039cf45a18e4eacbaf_0:
> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
> /dev/mapper/gluster00-4ea5cce4be704dd7b29986ae6698a666_0:
> UUID="a0b85d33-7150-448a-9a70-6391750b90ad" TYPE="xfs"
>
> Notice the UUID of LV and its snapshots is the same causing systemd to
> mount one of the snapshot devices instead of LV which results in the
> following gluster error:
>
> gluster> volume start gv01_data01 force
> volume start: gv01_data01: failed: Volume id mismatch for brick
> vhost03:/gluster_bricks/gv01_data01/gv. Expected volume id
> be6bc69b-c6ed-4329-b300-3b9044f375e1, volume id
> 55e97e74-12bf-48db-99bb-389bb708edb8 found
>


We do not recommend gluster volume snapshots for volumes hosting VM images.
Please look at the
https://ovirt.org/develop/release-management/features/gluster/gluster-dr/
as an alternative.
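(A rough outline of the geo-replication based DR referenced above, assuming passwordless
ssh to the slave cluster; volume and host names are placeholders:)

# gluster volume geo-replication mastervol slavehost::slavevol create push-pem
# gluster volume geo-replication mastervol slavehost::slavevol start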


> On Sun, Jul 8, 2018 at 12:32 PM  wrote:
> >
> > I am facing this trouble since version 4.1 up to the latest 4.2.4, once
> > we enable gluster snapshots and accumulate some snapshots (as few as 15
> > snapshots per server) we start having trouble booting the server. The
> > server enters emergency shell upon boot after timing out waiting for
> > snapshot devices. Waiting a few minutes and pressing Control-D then
> boots the server normally. In case of very large number of snapshots (600+)
> it can take days before the sever will boot. Attaching journal
> > log, let me know if you need any other logs.
> >
> > Details of the setup:
> >
> > 3 node hyperconverged oVirt setup (64GB RAM, 8-Core E5 Xeon)
> > oVirt 4.2.4
> > oVirt Node 4.2.4
> > 10Gb Interface
> >
> > Thanks,
> >
> > Hesham S. Ahmed
>


[ovirt-users] Re: oVirt - Scaling out from one to many

2018-07-09 Thread Sahina Bose
On Fri, Jul 6, 2018 at 5:37 PM, Sandro Bonazzola 
wrote:

>
> 2018-07-05 13:16 GMT+02:00 Leo David :
>
>> Hello everyone,
>> I have two things that i really need to understand regarding storage
>> scaling using gluster.
>>
>> Basically, I am trying to figure out a way to go from 1 single instance
>> to multiple nodes cluster.
>>
>> 1. Single node SelfHosted Engine HyperConverged setup - scale instance up
>> to 3 nodes:
>> - the gluster volumes are created as distributed type.
>> - is there a procedure to migate this single-host scenario to multiple
>> nodes, considering that the 3 nodes setup is using replica 3 gluster
>> volumes ?
>>
>
> Sahina, a quick check on the oVirt website didn't help me finding the
> documentation for Leo.
> Can you please assist?
>

To move from a single host to multiple hosts, you can add additional hosts
to your cluster via oVirt. A distribute volume can be changed to a replica 3
volume - but ensure there's no ongoing I/O. The easier process is to add
new replica 3 volumes and move your VM disks to them - for the VMs that
require HA.
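A sketch of that conversion, assuming the existing single brick plus bricks on two
newly added hosts (names are placeholders):

# gluster volume add-brick data replica 3 \
      host2:/gluster_bricks/data/brick host3:/gluster_bricks/data/brick
# gluster volume heal data full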


>
>
>>
>> 2. 3 Nodes SelfHosted Engine - add more storage nodes
>> - how should I increase the size of the already present replica 3 volumes
>> ?
>> - should the volumes be as distribute-replicated in an environment larger
>> than 3 nodes ?
>>
>
> Couldn't find documentation on ovirt.org site, but this should apply to
> your case: https://access.redhat.com/documentation/en-us/red_
> hat_hyperconverged_infrastructure/1.1/html/maintaining_red_hat_
> hyperconverged_infrastructure/scaling#scaling_rhhi_by_
> adding_additional_volumes_on_new_nodes
> Sahina, can you help preparing some documentation for ovirt.org as well?
>

Sure, contributions are welcome though - Leo, if you feel like pitching in


>
>>
>> 3. Is there a limit as a maximum number of compute nodes per cluster ?
>>
>
> I'm not aware of limits but according to above document I won't go over
> the 9 nodes in this configuration.
>
>
>
>
>
>>
>> Thank you very much !
>>
>> Leo
>>
>> --
>> Best regards, Leo David
>>
>>
>
>
> --
>
> SANDRO BONAZZOLA
>
> MANAGER, SOFTWARE ENGINEERING, EMEA R&D RHV
>
> Red Hat EMEA 
>
> sbona...@redhat.com
> 
>


[ovirt-users] Re: Changing resolution of console on Mint guest running on Ovirt 4.2

2018-07-09 Thread Oliver Riesener

Same problem here on Debian STRETCH (stable). gnome-3.14

Debian Jessie (oldstable) gnome-3.12 works with manual/gui monitor settings.

No autoscale on both at the moment.

On 17.05.2018 21:59, pas...@butterflyit.com wrote:

I keep getting an error:  Could not set configuration for CRTC 63.

I checked that I have spice-agent and spice-agentd running. Also 
qemu-guest-agent. lspci shows that I have a QXL adapter. xrandr -q shows that I 
have many different configs available, but none work.

What am I missing?

Screen 0: minimum 320 x 200, current 1024 x 768, maximum 8192 x 8192
Virtual-0 connected primary 1024x768+0+0 0mm x 0mm
1024x768  59.95*+
1920x1200 59.95
1920x1080 60.00
1600x1200 59.95
1680x1050 60.00
1400x1050 60.00
1280x1024 59.95
1440x900  59.99
1280x960  59.99
1280x854  59.95
1280x800  59.96
1280x720  59.97
1152x768  59.95
800x600   59.96
848x480   59.94
720x480   59.94
640x480   59.94
Virtual-1 disconnected
V


[ovirt-users] Re: OVN ACLs

2018-07-09 Thread Niyazi Elvan
Hi Greg,

I appreciate that, thanks !

Kind regards,


On 7 Jul 2018 Sat at 01:25 Greg Sheremeta  wrote:

> Hi Niyazi!
>
> cc'ing some people who may be able to assist.
>
> Best wishes,
> Greg
>
> On Thu, Jul 5, 2018 at 3:20 PM Niyazi Elvan  wrote:
>
>> Hi All,
>>
>> I have started testing oVirt 4.2 and have focused on OVN recently. I was
>> wondering whether there is a plan to manage L2->L7 ACLs through the oVirt
>> web UI.
>> If not, how can ACLs be managed other than with command line tools?
>> Using OpenDaylight?
>>
>> All the best !
>> Niyazi
>>
>> --
>> Niyazi Elvan
>>
>>
>
>
> --
>
> GREG SHEREMETA
>
> SENIOR SOFTWARE ENGINEER - TEAM LEAD - RHV UX
>
> Red Hat NA
>
> 
>
> gsher...@redhat.com   IRC: gshereme
> 
>
-- 
Niyazi Elvan
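
For reference, until there is web UI support, ACLs can be managed with OVN's
northbound CLI on the provider host; a minimal sketch, where the logical switch
name and the match expressions are made-up examples:

ovn-nbctl acl-add ovn-net1 to-lport 1000 'tcp.dst == 22' allow-related
ovn-nbctl acl-add ovn-net1 to-lport 900 'ip4' drop
ovn-nbctl acl-list ovn-net1

The higher-priority rule admits SSH, and the lower-priority one drops the rest
of the IPv4 traffic into the switch.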
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/M7FRWC2XZZKOWTGZMR3H6JTOFVUMEWXC/


[ovirt-users] Re: OVIRT 4.2: MountError on Posix Compliant FS

2018-07-09 Thread Jose E Torres



Hi, welcome to oVirt community!


Hi Sandro, Nir and Tal, thanks in advance for your help,


Adding Nir and Tal for all your questions.
In the meantime, can you please provide a sos report from the host?
Did dmesg provide any useful information?
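
A sketch of how to gather both on the host, in case it helps (the flags shown
are just the usual ones):

dmesg -T | grep -iE 'error|fail|mount' | tail -n 50
sosreport --batch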


I searched the dmesg output without results; I can't find any relevant
messages at the moment I try to add the new storage domain, or later. As you
say, I attached a sosreport on this webpage (due to limits on the mail
server); maybe it's useful for finding the cause of the problem. If you need
more logs, configs, etc., I can add them without problem because we are on a
testing environment.


SOSREPORT - https://jirafeau.net/f.php?h=3MT_XlnI=1

Kind regards!

--
Jose E Torres
Operations - Systems Administrator
Barcelona Supercomputing Center - Centro Nacional de Supercomputación
jose.tor...@bsc.es
www.bsc.es



http://bsc.es/disclaimer
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/Y7HINDBA6VHYAL25EGPWH4L3FVS7HAND/


[ovirt-users] Re: change from LDAP to AD authentication

2018-07-09 Thread Staniforth, Paul
Thanks Martin,

Not really; my initial attempts were held up because I got confused with the
new username format, which is now UPN@domain and so equates to
name@domain@domain or name@subdomain.domain@domain.

It may be easier for us to add permissions to the templates, VMs, and disks, or
to use the roles service to find the user/object.


Regards,

  Paul S.


From: Martin Perina 
Sent: 08 July 2018 12:31
To: Staniforth, Paul
Cc: users; Ondra Machacek
Subject: Re: [ovirt-users] change from LDAP to AD authentication



On Thu, Jul 5, 2018 at 12:36 PM, p.stanifo...@leedsbeckett.ac.uk wrote:
Hello,
 as part of our policy I have to change from LDAP to Active Directory 
for authentication in our oVirt system.

Hmm, do I understand correctly that you were moving oVirt users from some
other LDAP server to AD? Any reason other than political to do that?
I have managed to configure a test system that allows users to log in using the
CN (sAMAccountName) as before. The users in the system using the AD namespace
are using their UPN as their user name.
Do we have to copy permissions from all the old accounts to their new accounts,
or is there a way to rename them to the UPN while retaining their old permissions?

I don't think there is any other way than to copy the permissions. But you can
automate the process using, for example, the
ovirt_permissions/ovirt_permissions_facts Ansible modules [1] or one of our
SDKs (Python, Java, Ruby).

Martin

[1] 
https://docs.ansible.com/ansible/latest/modules/list_of_cloud_modules.html#ovirt
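
As a rough sketch of what that automation could look like with the
ovirt_permissions module mentioned above (all host names, user names and object
names below are examples, and the exact parameter set should be checked against
the module documentation):

cat > copy-perms.yml <<'EOF'
---
- hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    - name: grant UserRole on one VM to the AD account (example values only)
      ovirt_permissions:
        auth:
          url: https://engine.example.com/ovirt-engine/api
          username: admin@internal
          password: "{{ lookup('env', 'ENGINE_PASSWORD') }}"
          insecure: yes
        state: present
        user_name: someuser@ad.example.com
        authz_name: ad.example.com-authz
        object_type: vm
        object_name: some-vm
        role: UserRole
EOF
ENGINE_PASSWORD=secret ansible-playbook copy-perms.yml

Looping over the objects returned by ovirt_permissions_facts for the old LDAP
accounts would then let you replay the same roles for the new AD user names.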


Thanks,
Paul S.



--
Martin Perina
Associate Manager, Software Engineering
Red Hat Czech s.r.o.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PVYRJMD3JVVBS5BQNBZZ6VGETAH2HZNZ/


[ovirt-users] Re: Cannot import a qcow2 image

2018-07-09 Thread etienne . charlier
Hello, 

A few comments from a novice...:

* Internal "stuff" (CA & certificates used to secure traffic between engine
and hosts) should stay internal; users/admins shouldn't need to be aware of it.
* Visible "stuff" (CA & certs used to protect the UI and API) should be easily
modifiable.

One way of fulfilling those "requirements":
** One set of key/cert files shared between "all" public endpoints (API, UI,
websockets, ImageIo)
** Easily replaceable (e.g. a known file location, and just a matter of
reloading services after the files have been updated)

IMHO, Let's Encrypt-specific stuff is not needed: we could write a "plugin" for
acme.sh (running on another bastion host) responsible for pushing the renewed
certs onto the engine VM when needed.
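
A very rough sketch of what such an acme.sh deploy hook could do from the
bastion host (the file locations on the engine VM and the list of services to
restart are assumptions, and would need checking against the engine's own
certificate-replacement procedure):

# run on the acme.sh host after each renewal; all paths and names are examples
scp ~/.acme.sh/engine.example.com/fullchain.cer root@engine.example.com:/etc/pki/ovirt-engine/certs/apache-chain.pem
scp ~/.acme.sh/engine.example.com/engine.example.com.key root@engine.example.com:/etc/pki/ovirt-engine/keys/apache.key.nopass
ssh root@engine.example.com 'systemctl restart httpd ovirt-websocket-proxy ovirt-imageio-proxy'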
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YXGIC56CYD6NA7LME24KABIXQIKAIMRX/


[ovirt-users] Re: OVN and MTU of vnics based on it clarification

2018-07-09 Thread Dominik Holler
On Sat, 7 Jul 2018 16:28:49 +0200
Gianluca Cecchi  wrote:

> Hello,
> I'm testing a virtual rhcs cluster based on 4 nodes that are CentOS
> 7.4 VMs. So the stack is based on Corosync/Pacemaker
> I have two oVirt hosts and so my plan is to put two VMs on first host
> and two VMs on the second host, to simulate a two sites config and
> site loss, before going to physical production config.
> Incidentally the two hypervisor hosts are indeed placed into different
> physical datacenters.
> So far so good.
> I decided to use OVN for the intracluster dedicated network
> configured for corosync (each VM has two vnics, one on production lan
> and one for intracluster).
> I noticed that the cluster formed and worked (even with only two nodes)
> only if the VMs run on the same host; it seems they are not
> able to communicate when on different hosts. Ping is OK and an
> ssh session between them on the intracluster LAN works, but the cluster
> doesn't come up. So after digging in past mailing list mails I found
> this recent one:
> https://lists.ovirt.org/archives/list/users@ovirt.org/thread/RMS7XFOZ67O3ERJB4ABX5MGXTE5FO2LT/
> 
> where the solution was to set 1400 for the MTU of the interfaces on
> OVN network.
> It seems to resolve the problem in my scenario too:
> - I live migrated two VMs to the second host and the RHCS cluster software
> didn't complain
> - I relocated a resource group composed of several LVs/FSs, a VIP and an
> application from the VM running on host1 to the VM running on host2 without
> problems.
> 

There will be a new feature [1][2] about propagating the MTU of the
logical network into the guest.
In oVirt 4.2.5, a logical network MTU <= 1500 will be propagated for
clusters with switch type OVS and Linux bridge, while MTU > 1500 will be
propagated only for clusters with switch type Linux bridge, provided the
requirements [3] are fulfilled. OVS clusters will support MTU > 1500 in
oVirt 4.3 at the latest.
This feature also introduces a new default config setting, "MTU for tunneled
networks", which will initially be set to 1442.

> So the question is: can anyone confirm what are guidelines for
> settings vnics on OVN?

In the context of oVirt, I am only aware of [1] and [4].
Starting from oVirt 4.1 you can activate OVN's internal DHCP server
by creating a subnet for the network [4]. The default configuration will
offer an MTU of 1442 to the guest, which is optimal for GENEVE-tunneled
networks over physical networks with an MTU of 1500.
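
For guests that use static addressing (so the DHCP-provided MTU does not
apply), the interim workaround discussed in this thread is simply to pin the
MTU on the OVN-attached vNIC inside the guest, e.g. on an EL7 guest where the
OVN vNIC happens to show up as eth1 (the interface name is an assumption):

ip link set dev eth1 mtu 1442
echo 'MTU=1442' >> /etc/sysconfig/network-scripts/ifcfg-eth1    # make it persistent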

> Is there already a document in place about MTU
> settings for OVN based vnics? 

There are some documents about MTU in OpenStack referenced in [1].

> Other particular settings or
> limitations if I want to configure a vnic on OVN?
> 

libvirt's network filters are not applied to OVN networks, so you
should disable network filtering in oVirt's vNIC profile. This is
tracked in [5].


[1] https://ovirt.org/develop/release-management/features/network/managed_mtu_for_vm_networks/
[2] https://github.com/oVirt/ovirt-site/pull/1667
[3] https://ovirt.org/develop/release-management/features/network/managed_mtu_for_vm_networks/#limitations
[4] https://github.com/oVirt/ovirt-provider-ovn/#section-dhcp
[5] https://bugzilla.redhat.com/show_bug.cgi?id=1502754

> Thanks,
> 
> Gianluca
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LZXNYHK554BCVHAXG2JXJ6AG3TU5DK4Y/