Re: XenServer 6.5(SP1) - HVM 're0: watchdog timeout' errors - summary...

2015-09-21 Thread Karl Pielorz


In the vain hope this'll help others having this issue...

Having looked at this for some time now, and run a lot of tests, the 
current best solution to allow a FreeBSD domU under XenServer 6.5 to act as 
a gateway, or run OpenVPN (or dhcpd etc.) and remain agile, is to switch 
to VirtIO NICs.


 e1000 (em) causes migrations to fail at the destination end.

 Realtek (re) causes 'watchdog timeout' errors - and if you remove the 
watchdog code from the driver, you don't see the errors - but the NICs 
won't pass traffic after a migrate either.


 PV (xn) has the original problem of not being able to route traffic 
from one FreeBSD domU to other domU guests, doesn't work with OpenVPN 
(apparent 'packet size issues' from packet collation), and doesn't work for 
hosting dhcpd.


 virtio (vtnet) works for routing for other domUs, works for OpenVPN, and 
works for dhcpd. The *only* disadvantage [aside from possibly performance] 
I've found is a longer network outage on migration: if you migrate a PV 
(xn) based FreeBSD domU you suffer around 2-3 seconds of network disconnect, 
whereas with a VirtIO (vtnet) interface this can be as high as 18 seconds. 
This means there's more of a chance of disrupting sessions that were active 
either on, or through, the DomU. But it does work.
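
A rough way to put a number on the migration disconnect is to probe a TCP 
port on the domU at a fixed interval from another machine while the migrate 
runs, then report the longest gap between two successful probes. The sketch 
below is only an illustration of that idea, not how the figures above were 
obtained; the function names, the probe interval and the address in the 
usage note are all made up.

```python
import socket
import time


def probe(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def longest_outage(samples):
    """samples: list of (timestamp_seconds, reachable) tuples in time order.

    Returns the longest span, in seconds, between the last good probe before
    a run of failures and the first good probe after it. Failures with no
    good probe before them, or none after them, are not counted.
    """
    longest = 0.0
    last_ok = None
    in_outage = False
    for ts, ok in samples:
        if ok:
            if in_outage and last_ok is not None:
                longest = max(longest, ts - last_ok)
            last_ok = ts
            in_outage = False
        else:
            in_outage = True
    return longest


def monitor(host, port, duration=60.0, interval=0.25):
    """Probe host:port every `interval` seconds for `duration` seconds."""
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(host, port)))
        time.sleep(interval)
    return samples
```

Usage would be something like longest_outage(monitor("192.0.2.10", 22)), 
started just before the migrate is kicked off (192.0.2.10 is a placeholder 
documentation address, not a real guest). The result is only accurate to 
roughly the probe interval plus the connect timeout.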



So until netfront is fixed (at which point we can hopefully go back to the 
undoubtedly more efficient PV 'xn' NICs), the best workaround for us is 
to use virtio (vtnet).


-Karl
___
freebsd-xen@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-xen
To unsubscribe, send any mail to "freebsd-xen-unsubscr...@freebsd.org"


Re: XenServer 6.5(SP1) - HVM 're0: watchdog timeout' errors...

2015-09-14 Thread Roger Pau Monné
On 12/09/15 at 0.13, Sydney Meyer wrote:
> I just noticed that these performance problems do not occur under 10.0 and 
> 10.1.
> 
> Starting with 10.2 IPv4 TCP performance drops from ~12 Gb/s under 10.1 to 
> ~350 Mb/s under 10.2.
> 
> Should I write a new bug report for this?

Thanks for the report! I'm sorry I didn't notice this earlier and that 10.2
shipped with this performance regression. I bisected it over the
weekend and found the culprit; a fix is being worked on :).

https://lists.freebsd.org/pipermail/svn-src-all/2014-September/091787.html

Roger.



Re: XenServer 6.5(SP1) - HVM 're0: watchdog timeout' errors...

2015-09-14 Thread Sydney Meyer
I just read that you were working on importing a newer netfront from Linux and 
thought that maybe this could help in finding the culprit, as KVM users seemed 
to have similar issues, e.g. issues with the offloading capabilities of the PV 
NIC and/or checksum errors.

Anyhow, awesome news that you're working on importing a new netfront.


Re: XenServer 6.5(SP1) - HVM 're0: watchdog timeout' errors...

2015-09-11 Thread Sydney Meyer
I just noticed that these performance problems do not occur under 10.0 and 10.1.

Starting with 10.2 IPv4 TCP performance drops from ~12 Gb/s under 10.1 to ~350 
Mb/s under 10.2.

Should I write a new bug report for this?


Re: XenServer 6.5(SP1) - HVM 're0: watchdog timeout' errors...

2015-09-09 Thread Karl Pielorz


--On 08 September 2015 17:06 +0100 Karl Pielorz wrote:

> XenServer logs:
>
> "Migrating VM 'FreeBSD (SWR)' from 'Xen1' to 'Xen2' Internal error:
> Xenops_interface.Internal_error("Unix.Unix_error(2\"open\",\"/var/lib/xen/qemu-save.36\")")"


As a follow-up to this - the patch definitely breaks migrations. If I remove 
the 'e1000' from the custom field (so you get Realtek NICs), the migrate 
completes fine.


If I leave 'e1000' set, you get Intel em devices, but migrates fail - 
XenCenter spits out the error above.



From /var/log/messages on the destination node - I can also see:


"
Sep  9 09:18:55 Xen2 xapi: [ info|Xen2|33915 INET 
:::80|network.attach_for_vm R:xxx|xapi] PIF 
yy---yyy-y is needed by a VM, but not managed by xapi. 
The bridge must be configured through other means.

"

So it looks like XenServer, at least, cannot support this configuration? So 
I'm painted back into a corner :( - PV NICs can't "route" traffic (or run 
DHCP servers, or OpenVPN etc.) - HVM NICs can do all of that but cannot be 
used agile, and may fail eventually anyway with 'watchdog timeout'.


Gah.

-Karl


Re: XenServer 6.5(SP1) - HVM 're0: watchdog timeout' errors...

2015-09-09 Thread Roger Pau Monné
Hello,

On 09/09/15 at 11.33, Karl Pielorz wrote:
> 
> 
> --On 09 September 2015 11:04 +0200 Roger Pau Monné
>  wrote:
> 
>> I'm working on importing a new netfront from Linux, which hopefully
>> should solve the problems we are having with the PV nic.
> 
> That'll be great - I did test a CentOS box domU a while ago, and that
> didn't have the problem with routing traffic, I wasn't able to test
> things like OpenVPN/DHCP on it - but the current netfront issues are
> annoying [there's at least 2 FreeBSD PR's I know of this would also
> address].

Do you have an easy way to replicate those issues? Right now I'm trying
with the following:

/etc/rc.conf:
pf_enable="YES"

/etc/pf.conf:
block in all
pass out all keep state

But I don't seem to be able to reproduce them. Throughput between
DomU<->Dom0 or DomU<->DomU seems to be fine (no degradation when
compared to pf off).

Roger.

Re: XenServer 6.5(SP1) - HVM 're0: watchdog timeout' errors...

2015-09-09 Thread Karl Pielorz



--On 09 September 2015 11:04 +0200 Roger Pau Monné wrote:

> I'm working on importing a new netfront from Linux, which hopefully
> should solve the problems we are having with the PV nic.

That'll be great - I did test a CentOS domU a while ago, and that 
didn't have the problem with routing traffic. I wasn't able to test things 
like OpenVPN/DHCP on it - but the current netfront issues are annoying 
[there are at least 2 FreeBSD PRs I know of that this would also address].

> In the meantime, I have the following crappy patch to FreeBSD if_re
> driver, which should disable the watchdog (I have not even compile
> tested this).

Thanks - I'll give that a go... I also discovered that the 'virtio' 
virtual NICs *do* migrate OK. I'm just setting up a test platform using 
those to see if they have the same issues as the netfront-based PV xn 
interfaces (i.e. can't route / OpenVPN fails).


I'll post back with the results of that + the patch when I have them,

Thanks,

-Kp



Re: XenServer 6.5(SP1) - HVM 're0: watchdog timeout' errors...

2015-09-09 Thread Karl Pielorz



--On 09 September 2015 11:04 +0200 Roger Pau Monné wrote:

> In the meantime, I have the following crappy patch to FreeBSD if_re
> driver, which should disable the watchdog (I have not even compile
> tested this).


fyi - The patch stops the 'watchdog timeout' errors - but the NIC just 
'dies' after the live migrate (i.e. same as before, but no errors logged).


VirtIO (vtnet) so far seems to work for migrates, and work for routing 
traffic to other DomU guests - so I'm continuing to test that now as a 
workaround (i.e. test with OpenVPN etc.)


The only thing I've noticed with it is that established TCP sessions 
'through the DomU' tend to die during the migrate (i.e. the DomU 'lands' at 
the destination node, but any transfers in progress stall, with about a 
50/50 chance they recover). If I instead migrate the Windows DomU that 
actually has the download in progress, the session stays up - hence more 
testing needed.


-Karl

Re: XenServer 6.5(SP1) - HVM 're0: watchdog timeout' errors...

2015-09-09 Thread Sydney Meyer
Hello,

I'm running Xen 4.4.1 on a Debian 8 Dom0 with 2 fresh FreeBSD 10.2 DomUs 
(pf disabled):

IPv4:
- Host A (FreeBSD 10.2) ---> Host B (FreeBSD 10.2) = ~46 MB/s
    A: dd if=/dev/zero bs=1M | nc -l 5001
    B: nc 10.0.30.95 5001 | dd of=/dev/zero bs=1M
- Host A (FreeBSD 10.2) ---> Host B (CentOS 7) = ~65 MB/s
    A: dd if=/dev/zero bs=1M | nc -l 5001
    B: nc 10.0.30.95 5001 | dd of=/dev/zero bs=1M
- Host A (CentOS 7) ---> Host B (FreeBSD 10.2) = ~685 MB/s
    A: dd if=/dev/zero bs=1M | nc -l 5001
    B: nc 10.0.30.111 5001 | dd of=/dev/zero bs=1M

IPv6:
- Host A (FreeBSD 10.2) ---> Host B (FreeBSD 10.2) = ~309 MB/s
    A: dd if=/dev/zero bs=1M | nc -6 -l 5001
    B: nc -6 2a02:a03f:a0f:a200:216:3eff:fee0:44bd 5001 | dd of=/dev/zero bs=1M
- Host A (FreeBSD 10.2) ---> Host B (CentOS 7) = ~246 MB/s
    A: dd if=/dev/zero bs=1M | nc -6 -l 5001
    B: nc -6 2a02:a03f:a0f:a200:216:3eff:fee0:44bd 5001 | dd of=/dev/zero bs=1M
- Host A (CentOS 7) ---> Host B (FreeBSD 10.2) = ~352 MB/s
    A: dd if=/dev/zero bs=1M | nc -6 -l 5001
    B: nc -6 2a02:a03f:a0f:a200:216:3eff:fe7c:c4bd 5001 | dd of=/dev/zero bs=1M

Also, I can confirm problems with FreeBSD 10 acting as a router for other DomUs 
on the same Dom0 (as in 
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=188261).
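
The dd | nc runs above can be approximated in a single script, which makes 
it easier to repeat the same measurement across releases. This is only a 
sketch: it pushes zeros over a loopback TCP connection and reports MB/s, so 
as written it exercises the local stack rather than the PV NIC, and the 
connecting side sends (the reverse of the nc setup above). The function 
names and default sizes are assumptions for illustration.

```python
import socket
import threading
import time


def _sink(server_sock, result):
    """Accept one connection and count bytes until the sender closes."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(1 << 20)
            if not chunk:
                break
            total += len(chunk)
    result["bytes"] = total


def measure_throughput(total_bytes=64 * 1024 * 1024, host="127.0.0.1"):
    """Send total_bytes of zeros over TCP; return (bytes_received, MB/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()

    block = bytes(1 << 20)  # 1 MiB of zeros, like dd if=/dev/zero bs=1M
    sent = 0
    start = time.monotonic()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            client.sendall(block)
            sent += len(block)
    receiver.join()  # wait until the receiver has seen end-of-stream
    elapsed = max(time.monotonic() - start, 1e-9)
    server.close()

    return result["bytes"], result["bytes"] / elapsed / 1e6
```

To test between two guests, the sender and receiver halves would need to be 
split and run on Host A and Host B respectively. When comparing figures, 
note that MB/s times 8 gives Mb/s, so ~46 MB/s is roughly 368 Mb/s.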


Re: XenServer 6.5(SP1) - HVM 're0: watchdog timeout' errors...

2015-09-08 Thread Karl Pielorz



--On 07 September 2015 13:11 -0400 Adam McDougall wrote:

> Try this:
> http://discussions.citrix.com/topic/329848-openbsd-with-message-watchdog-timeout/
>
> I've used it before for an OpenBSD guest VM.  If that patch is in place
> on any XenServer you migrate to, it should set e1000 for the NIC type if
> you set NicEmulation in the Custom Fields as described in step 2.
>
> A cleaner patch would just let you specify any driver you want, but this
> one is semi-hardcoded for e1000.  You could also try 'virtio' with:
> argv = [arg.replace('rtl8139','virtio') for arg in argv]


Hi,

Thanks for the link. I patched both our in-house XenServers 
[6.5SP1+Hotfixes] with this - it appeared to work OK (i.e. the DomUs came 
up with em0 NICs rather than Realtek).


They seem to work OK for pushing traffic - but if I try to migrate a guest 
VM using Intel NICs with XenServer, it fails miserably and kills it :(


XenServer logs:

"Migrating VM 'FreeBSD (SWR)' from 'Xen1' to 'Xen2' Internal error: 
Xenops_interface.Internal_error("Unix.Unix_error(2\"open\",\"/var/lib/xen/qemu-save.36\")")"


Other VMs not using the Intel NICs seem OK. I made sure both the 'source' 
and 'destination' XenServers are patched (there are only two in the pool).


-Karl





Re: XenServer 6.5(SP1) - HVM 're0: watchdog timeout' errors...

2015-09-07 Thread Roger Pau Monné
On 07/09/15 at 14.58, Karl Pielorz wrote:
> 
> Hi,
> 
> We have a bunch of FreeBSD 10.1 hosts we run the NIC's as HVM (this
> avoids a known networking issue with XenServer and PV networks under
> FreeBSD to do with routing packets for other Xen DomU).
> 
> This seems to work OK - but recently we've had a whole bunch of:
> 
>  Sep  7 12:41:08 host1 kernel: re0: watchdog timeout
> ...
>  Sep  7 12:42:03 host1 kernel: re1: watchdog timeout
> 
> 
> This seems to 'break' networking for the guest. I've tried doing
> 'ifconfig re0 down / ifconfig re0 up' - this doesn't appear to fix
> things - but rebooting the DomU guest does (when it comes back,
> networking is restored).
> 
> Searching around with Google turns up some similar issues with NetBSD
> along with the comment that the QEMU Realtek card doesn't have watchdog
> emulation - so it should be disabled in the driver.
> 
> How applicable to FreeBSD that is, I don't know.
> 
> This seems to happen when the VM's are migrated (e.g. Xen Storage
> Motion) from one SR to another - or migrated, full stop (i.e. from one
> XenServer to another - but using shared storage).

This sounds like a problem in the if_re driver on FreeBSD or a bug in
Qemu emulation. As a workaround, could you try to use the emulated e1000
instead of the realtek? AFAIK it should also provide better performance.

Since you are using XenServer I'm not sure of the exact rune, with OSS
Xen you basically need to add "model=e1000" to the network configuration
line.
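
For OSS Xen with the xl toolstack, that would look roughly like the line 
produced below. This is a minimal sketch that only builds the 'vif' entry 
for a guest config file; the helper name and the bridge name 'xenbr0' are 
assumptions for illustration, and only "model=e1000" comes from the text 
above.

```python
def vif_line(model="e1000", bridge="xenbr0", mac=None):
    """Build an xl-style 'vif = [...]' line selecting an emulated NIC model."""
    keys = ["model=%s" % model, "bridge=%s" % bridge]
    if mac is not None:
        keys.append("mac=%s" % mac)
    return "vif = [ '%s' ]" % ",".join(keys)


# Prints the line to paste into the guest's config file.
print(vif_line())
```

With the defaults this prints: vif = [ 'model=e1000,bridge=xenbr0' ]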

Roger.



Re: XenServer 6.5(SP1) - HVM 're0: watchdog timeout' errors...

2015-09-07 Thread Karl Pielorz


--On 07 September 2015 16:51 +0200 Roger Pau Monné wrote:

> This sounds like a problem in the if_re driver on FreeBSD or a bug in
> Qemu emulation. As a workaround, could you try to use the emulated e1000
> instead of the realtek? AFAIK it should also provide better performance.
>
> Since you are using XenServer I'm not sure of the exact rune, with OSS
> Xen you basically need to add "model=e1000" to the network configuration
> line.


Hi Roger,

I'd love to try that - but as we're running full 'XenServer' (i.e. 
installed from XenServer .ISO) I have no idea where (or even if it's 
possible) to set that :(


Last time I tried looking it up I hit a bit of a dead end, I'll have 
another look [in fact, I think it may have actually been impossible to set 
in XenServer as I seem to remember someone saying that XenServer doesn't 
have that NIC model/emulation available, only Realtek or PV!]


But I'll have another look...

-Karl

Re: XenServer 6.5(SP1) - HVM 're0: watchdog timeout' errors...

2015-09-07 Thread Adam McDougall
On 09/07/2015 11:00, Karl Pielorz wrote:
> 
> --On 07 September 2015 16:51 +0200 Roger Pau Monné
>  wrote:
> 
>> This sounds like a problem in the if_re driver on FreeBSD or a bug in
>> Qemu emulation. As a workaround, could you try to use the emulated e1000
>> instead of the realtek? AFAIK it should also provide better performance.
>>
>> Since you are using XenServer I'm not sure of the exact rune, with OSS
>> Xen you basically need to add "model=e1000" to the network configuration
>> line.
> 
> Hi Roger,
> 
> I'd love to try that - but as we're running full 'XenServer' (i.e.
> installed from XenServer .ISO) I have no idea where (or even if it's
> possible) to set that :(
> 
> Last time I tried looking it up I hit a bit of a dead end, I'll have
> another look [in fact, I think it may have actually been impossible to
> set in XenServer as I seem to remember someone saying that XenServer
> doesn't have that NIC model/emulation available, only Realtek or PV!]
> 
> But I'll have another look...
> 
> -Karl

Try this:
http://discussions.citrix.com/topic/329848-openbsd-with-message-watchdog-timeout/

I've used it before for an OpenBSD guest VM.  If that patch is in place
on any XenServer you migrate to, it should set e1000 for the NIC type if
you set NicEmulation in the Custom Fields as described in step 2.

A cleaner patch would just let you specify any driver you want, but this
one is semi-hardcoded for e1000.  You could also try 'virtio' with:
argv = [arg.replace('rtl8139','virtio') for arg in argv]
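
To show what that one-liner does in isolation, here is a small sketch. The 
function name and the sample argument list are made up for illustration; 
the real device-model command line on XenServer may well look different.

```python
def rewrite_nic_model(argv, new_model="virtio", old_model="rtl8139"):
    """Return a copy of argv with the emulated NIC model name substituted."""
    return [arg.replace(old_model, new_model) for arg in argv]


# Hypothetical device-model arguments, for illustration only.
example = ["qemu-dm", "-net", "nic,vlan=0,macaddr=00:16:3e:00:00:01,model=rtl8139"]
print(rewrite_nic_model(example))
```

Note that str.replace substitutes every occurrence of 'rtl8139' in every 
argument, so it would also rewrite the string if it showed up somewhere 
other than the NIC model option.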