Re: [gentoo-user] Networking trouble

2015-10-30 Thread hw

J. Roeleveld wrote:

Quick reply from mobile.
Will give a more detailed one later.

Noticed you are using ZFS. Where is your swap partition located?

On ZFS or?


Swap for dom0 is on an mdraid partition.  Dom0 has 4GB RAM because it's
supposed to be used for making backups once I get around to setting that up,
and it is not swapping.
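A quick way to confirm where swap lives is to read `/proc/swaps`, which lists one swap area per line below a header row, with the device path in the first column. The sketch below is a hypothetical helper, not something from this thread. It assumes ZFS volumes show up as `/dev/zd*` (or under `/dev/zvol/`) and md arrays as `/dev/md*`, which is the usual naming but may differ on a given system.

```python
import os


def classify_swaps(proc_swaps_text):
    """Map each swap device in /proc/swaps-formatted text to a rough backing type."""
    result = {}
    # The first line is the "Filename Type Size Used Priority" header.
    for line in proc_swaps_text.strip().splitlines()[1:]:
        fields = line.split()
        if not fields:
            continue
        device = fields[0]
        name = os.path.basename(device)
        if name.startswith("zd") or device.startswith("/dev/zvol/"):
            kind = "zfs-zvol"  # assumed zvol naming
        elif name.startswith("md") or device.startswith("/dev/md/"):
            kind = "mdraid"  # assumed md naming
        else:
            kind = "other"  # plain partition, swap file, etc.
        result[device] = kind
    return result


if __name__ == "__main__" and os.path.exists("/proc/swaps"):
    with open("/proc/swaps") as f:
        for dev, kind in classify_swaps(f.read()).items():
            print(f"{dev}: {kind}")
```

Run on the dom0, an answer like the one above would show the swap device classified as `mdraid`.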




Re: [gentoo-user] Networking trouble

2015-10-29 Thread J. Roeleveld
On 29 October 2015 11:29:18 CET, hw  wrote:
> J. Roeleveld wrote:
>> On Thursday, October 15, 2015 05:46:07 PM hw wrote:
>>> J. Roeleveld wrote:
>>>> On Thursday, October 15, 2015 03:30:01 PM hw wrote:
>>>>> [...]

Re: [gentoo-user] Networking trouble

2015-10-29 Thread hw

J. Roeleveld wrote:

On Thursday, October 15, 2015 05:46:07 PM hw wrote:

J. Roeleveld wrote:

On Thursday, October 15, 2015 03:30:01 PM hw wrote:

[...]


More info required:

- Which version of Xen


4.5.1

Installed versions:  4.5.1^t(02:44:35 PM 07/14/2015)(-custom-cflags -debug -efi -flask -xsm)


Ok, recent one.


- Does this only occur with HVM guests?


The host has been running only HVM guests every time it happened.
It was running a PV guest in between (which I had to shut down
because other VMs were migrated, requiring the RAM).


The PV didn't 

Re: [gentoo-user] Networking trouble

2015-10-15 Thread hw

J. Roeleveld wrote:

On Thursday, October 15, 2015 03:30:01 PM hw wrote:

[...]


More info required:

- Which version of Xen


4.5.1

Installed versions:  4.5.1^t(02:44:35 PM 07/14/2015)(-custom-cflags -debug -efi -flask -xsm)


- Does this only occur with HVM guests?


The host has been running only HVM guests every time it happened.
It was running a PV guest in between (which I had to shut down
because other VMs were migrated, requiring the RAM).


- Which network-driver are you using inside the guest


r8169, compiled as a module

Same happened with 

[gentoo-user] Networking trouble

2015-10-15 Thread hw


Hi,

I have a xen host with some HVM guests which becomes unreachable via
the network after apparently random amounts of time.  I have already
switched the network card to see if that would make a difference,
and with the card currently installed, it worked fine for over 20 days
until it became unreachable again.  Before switching the network card,
it would run a week or two before becoming unreachable.  The previous
card was the on-board BCM5764M, which uses the tg3 driver.

There are messages like this in the log file:


Oct 14 20:58:02 moonflo kernel: [ cut here ]
Oct 14 20:58:02 moonflo kernel: WARNING: CPU: 10 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x259/0x270()
Oct 14 20:58:02 moonflo kernel: NETDEV WATCHDOG: enp55s4 (r8169): transmit queue 0 timed out
Oct 14 20:58:02 moonflo kernel: Modules linked in: arc4 ecb md4 hmac nls_utf8 cifs fscache xt_physdev br_netfilter iptable_filter ip_tables xen_pciback xen_gntalloc xen_gntdev bridge stp llc zfs(PO) nouveau snd_hda_codec_realtek snd_hda_codec_generic zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate video backlight drm_kms_helper ttm snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm snd_timer snd soundcore r8169 mii xts aesni_intel glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 sha256_generic hid_generic usbhid uhci_hcd usb_storage ehci_pci ehci_hcd usbcore usb_common
Oct 14 20:58:02 moonflo kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: P   O 4.0.5-gentoo #3
Oct 14 20:58:02 moonflo kernel: Hardware name: Hewlett-Packard HP Z800 Workstation/0AECh, BIOS 786G5 v03.57 07/15/2013
Oct 14 20:58:02 moonflo kernel:  8175a77d 880124d43d98 814da8d8 0001
Oct 14 20:58:02 moonflo kernel:  880124d43de8 880124d43dd8 81088850 880124d43dd8
Oct 14 20:58:02 moonflo kernel:   8800d45f2000 0001 8800d5294880
Oct 14 20:58:02 moonflo kernel: Call Trace:
Oct 14 20:58:02 moonflo kernel:  [] dump_stack+0x45/0x57
Oct 14 20:58:02 moonflo kernel:  [] warn_slowpath_common+0x80/0xc0
Oct 14 20:58:02 moonflo kernel:  [] warn_slowpath_fmt+0x41/0x50
Oct 14 20:58:02 moonflo kernel:  [] ? add_interrupt_randomness+0x35/0x1e0
Oct 14 20:58:02 moonflo kernel:  [] dev_watchdog+0x259/0x270
Oct 14 20:58:02 moonflo kernel:  [] ? dev_graft_qdisc+0x80/0x80
Oct 14 20:58:02 moonflo kernel:  [] ? dev_graft_qdisc+0x80/0x80
Oct 14 20:58:02 moonflo kernel:  [] call_timer_fn.isra.30+0x17/0x70
Oct 14 20:58:02 moonflo kernel:  [] run_timer_softirq+0x176/0x2b0
Oct 14 20:58:02 moonflo kernel:  [] __do_softirq+0xda/0x1f0
Oct 14 20:58:02 moonflo kernel:  [] irq_exit+0x7e/0xa0
Oct 14 20:58:02 moonflo kernel:  [] xen_evtchn_do_upcall+0x35/0x50
Oct 14 20:58:02 moonflo kernel:  [] xen_do_hypervisor_callback+0x1e/0x40
Oct 14 20:58:02 moonflo kernel:  [] ? xen_hypercall_sched_op+0xa/0x20
Oct 14 20:58:02 moonflo kernel:  [] ? xen_hypercall_sched_op+0xa/0x20
Oct 14 20:58:02 moonflo kernel:  [] ? xen_safe_halt+0x10/0x20
Oct 14 20:58:02 moonflo kernel:  [] ? default_idle+0x9/0x10
Oct 14 20:58:02 moonflo kernel:  [] ? arch_cpu_idle+0xa/0x10
Oct 14 20:58:02 moonflo kernel:  [] ? cpu_startup_entry+0x190/0x2f0
Oct 14 20:58:02 moonflo kernel:  [] ? cpu_bringup_and_idle+0x25/0x40
Oct 14 20:58:02 moonflo kernel: ---[ end trace 98d961bae351244d ]---
Oct 14 20:58:02 moonflo kernel: r8169 :37:04.0 enp55s4: link up


After that, there are lots of messages about the link being up, one message
every 12 seconds.  When you unplug the network cable, you get a message that
the link is down, and no message when you plug it in again.
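To measure how regular those "link up" messages are, and to count the watchdog timeouts that precede them, a short log scan is enough. This is a minimal sketch, assuming classic syslog lines in the format quoted above (no year in the timestamp, so the year is passed in) and English month abbreviations; `summarize` is a hypothetical name.

```python
import re
from datetime import datetime

# Matches lines such as "Oct 14 20:58:02 moonflo kernel: r8169 ... enp55s4: link up"
LINE_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: (.*)$")


def summarize(lines, iface, year=2015):
    """Return (number of watchdog timeouts, seconds between consecutive 'link up' lines)."""
    timeouts = 0
    link_up_times = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a kernel message in the expected format
        stamp, msg = m.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if f"NETDEV WATCHDOG: {iface} " in msg and "timed out" in msg:
            timeouts += 1
        elif msg.endswith(f"{iface}: link up"):
            link_up_times.append(when)
    intervals = [
        (b - a).total_seconds() for a, b in zip(link_up_times, link_up_times[1:])
    ]
    return timeouts, intervals
```

For example, `summarize(open(path), "enp55s4")`, where `path` is whichever file the syslog daemon writes kernel messages to; a wedged link as described here would show intervals clustered around 12 seconds.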

I was hoping that switching the network card (to one that uses a different
driver) might solve the problem, but it did not.  Now I can only guess that
the network card goes to sleep and sometimes cannot be woken up again.

I tried reducing the connection speed to 100 Mbit/s and found that accessing
the VMs (via RDP) becomes too slow to be usable.  So I disabled the power
management of the network card (through sysfs) and will have to see if the
problem persists.
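The message does not say which sysfs attribute was changed. One likely candidate is the runtime power-management switch `power/control` of the PCI device, which accepts `auto` (the kernel may suspend the idle device) and `on` (the device stays powered). A minimal sketch, assuming that attribute is reached through the interface's `device` link under `/sys/class/net`; it must run as root, the change does not survive a reboot, and `disable_runtime_pm` is a hypothetical name.

```python
from pathlib import Path


def disable_runtime_pm(iface, sysfs_root="/sys"):
    """Keep the NIC behind `iface` powered by writing "on" to its power/control file.

    Returns the previous value so the setting can be restored later.
    """
    control = Path(sysfs_root) / "class" / "net" / iface / "device" / "power" / "control"
    previous = control.read_text().strip()
    if previous != "on":
        control.write_text("on\n")  # "auto" would re-enable runtime suspend
    return previous
```

Calling `disable_runtime_pm("enp55s4")` on the host would apply it to the interface from the log above; the `sysfs_root` parameter exists only so the function can be exercised against a fake tree.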

We'll be getting decent network cards in a couple of days, but since the problem
doesn't seem to be related to a particular card/model/manufacturer, that might
not fix it, either.

This problem seems to only occur on machines that operate as a xen server.
Other machines, identical Z800s, not running xen, run just fine.

What would you suggest?



Re: [gentoo-user] Networking trouble

2015-10-15 Thread J. Roeleveld
On Thursday, October 15, 2015 03:30:01 PM hw wrote:
> [...]

More info required:

- Which version of Xen
- Does this only occur with HVM guests?
- Which network-driver are you using inside the guest
- Can you connect to the "local" console of the guest?
- If yes, does it still have no connectivity?
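On a Linux system (the dom0 here, or a Linux guest), the driver bound to an interface can be read from the `driver` symlink under `/sys/class/net/<iface>/device`. A minimal sketch with a hypothetical helper name; virtual interfaces such as the Xen bridge have no `device` link, so the function returns `None` for them.

```python
import os


def nic_driver(iface, sysfs_root="/sys"):
    """Return the name of the kernel driver bound to `iface`, or None if there is none."""
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    if not os.path.islink(link):
        return None  # bridges and other virtual interfaces have no device/driver link
    # The symlink points at the driver's directory, e.g. .../bus/pci/drivers/r8169
    return os.path.basename(os.readlink(link))
```

On the host described in this thread, `nic_driver("enp55s4")` would be expected to return `"r8169"`.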

I saw the same on my lab machine, which was related to:
- Not using correct drivers inside 

Re: [gentoo-user] Networking trouble

2015-10-15 Thread J. Roeleveld
On Thursday, October 15, 2015 05:46:07 PM hw wrote:
> J. Roeleveld wrote:
> > On Thursday, October 15, 2015 03:30:01 PM hw wrote:
> >> [...]