Re: [gentoo-user] Networking trouble
J. Roeleveld wrote:
> Quick reply from mobile. Will give a more detailed one later.
>
> Noticed you are using ZFS.
> Where is your swap partition located? On ZFS or?

Swap for dom0 is on a mdraid partition. Dom0 has 4GB RAM because it's supposed to be used for making backups once I get to set that up, and it is not swapping.
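For anyone wanting to check the same thing on their own dom0, here is a minimal sketch that reads /proc/swaps and reports where swap lives and whether any of it is in use. The function name parse_proc_swaps and the dictionary keys are made up for this illustration; the only thing taken as given is the standard /proc/swaps layout (a header row, then Filename, Type, Size, Used, Priority, with sizes in KiB).

```python
from pathlib import Path


def parse_proc_swaps(text):
    """Parse the contents of /proc/swaps into a list of dicts.

    Assumes the usual layout: one header row, then whitespace-separated
    columns Filename, Type, Size (KiB), Used (KiB), Priority.
    """
    entries = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or malformed rows
        name, kind, size_kib, used_kib, priority = fields[:5]
        entries.append(
            {
                "device": name,
                "type": kind,
                "size_kib": int(size_kib),
                "used_kib": int(used_kib),
                "priority": int(priority),
            }
        )
    return entries


def swap_in_use(entries):
    """Return True if any swap area reports a non-zero 'Used' value."""
    return any(entry["used_kib"] > 0 for entry in entries)


if __name__ == "__main__":
    swaps = parse_proc_swaps(Path("/proc/swaps").read_text())
    for entry in swaps:
        print(entry["device"], entry["type"], entry["used_kib"], "KiB used")
    print("swapping" if swap_in_use(swaps) else "not swapping")
```

The device column shows whether swap sits on an md device, a zvol, or a plain partition, which is the question asked above.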
Re: [gentoo-user] Networking trouble
On 29 October 2015 11:29:18 CET, hw wrote:
>J. Roeleveld wrote:
>> On Thursday, October 15, 2015 05:46:07 PM hw wrote:
>>> J. Roeleveld wrote:
>>>> On Thursday, October 15, 2015 03:30:01 PM hw wrote:
>>>>> [quoted original message snipped]
Re: [gentoo-user] Networking trouble
J. Roeleveld wrote:
> On Thursday, October 15, 2015 05:46:07 PM hw wrote:
>> J. Roeleveld wrote:
>>> On Thursday, October 15, 2015 03:30:01 PM hw wrote:
>>>> [quoted original message snipped]
>>>>
>>>> What would you suggest?
>>>
>>> More info required:
>>>
>>> - Which version of Xen
>>
>> 4.5.1
>>
>> Installed versions: 4.5.1^t(02:44:35 PM 07/14/2015)(-custom-cflags -debug -efi -flask -xsm)
>
> Ok, recent one.
>
>>> - Does this only occur with HVM guests?
>>
>> The host has been running only HVM guests every time it happened. It was running a PV guest in between (which I had to shut down because other VMs were migrated, requiring the RAM).
>
> The PV didn't
Re: [gentoo-user] Networking trouble
J. Roeleveld wrote:
> On Thursday, October 15, 2015 03:30:01 PM hw wrote:
>> [quoted original message snipped]
>>
>> What would you suggest?
>
> More info required:
>
> - Which version of Xen

4.5.1

Installed versions: 4.5.1^t(02:44:35 PM 07/14/2015)(-custom-cflags -debug -efi -flask -xsm)

> - Does this only occur with HVM guests?

The host has been running only HVM guests every time it happened. It was running a PV guest in between (which I had to shut down because other VMs were migrated, requiring the RAM).

> - Which network-driver are you using inside the guest

r8169, compiled as a module

Same happened with
[gentoo-user] Networking trouble
Hi,

I have a xen host with some HVM guests which becomes unreachable via the network after apparently random amounts of time. I have already switched the network card to see if that would make a difference, and with the card currently installed, it worked fine for over 20 days until it became unreachable again. Before switching the network card, it would run a week or two before becoming unreachable. The previous card was the on-board BCM5764M which uses the tg3 driver.

There are messages like this in the log file:

Oct 14 20:58:02 moonflo kernel: [ cut here ]
Oct 14 20:58:02 moonflo kernel: WARNING: CPU: 10 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x259/0x270()
Oct 14 20:58:02 moonflo kernel: NETDEV WATCHDOG: enp55s4 (r8169): transmit queue 0 timed out
Oct 14 20:58:02 moonflo kernel: Modules linked in: arc4 ecb md4 hmac nls_utf8 cifs fscache xt_physdev br_netfilter iptable_filter ip_tables xen_pciback xen_gntalloc xen_gntdev bridge stp llc zfs(PO) nouveau snd_hda_codec_realtek snd_hda_codec_generic zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate video backlight drm_kms_helper ttm snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm snd_timer snd soundcore r8169 mii xts aesni_intel glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 sha256_generic hid_generic usbhid uhci_hcd usb_storage ehci_pci ehci_hcd usbcore usb_common
Oct 14 20:58:02 moonflo kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: P O 4.0.5-gentoo #3
Oct 14 20:58:02 moonflo kernel: Hardware name: Hewlett-Packard HP Z800 Workstation/0AECh, BIOS 786G5 v03.57 07/15/2013
Oct 14 20:58:02 moonflo kernel: 8175a77d 880124d43d98 814da8d8 0001
Oct 14 20:58:02 moonflo kernel: 880124d43de8 880124d43dd8 81088850 880124d43dd8
Oct 14 20:58:02 moonflo kernel: 8800d45f2000 0001 8800d5294880
Oct 14 20:58:02 moonflo kernel: Call Trace:
Oct 14 20:58:02 moonflo kernel: [] dump_stack+0x45/0x57
Oct 14 20:58:02 moonflo kernel: [] warn_slowpath_common+0x80/0xc0
Oct 14 20:58:02 moonflo kernel: [] warn_slowpath_fmt+0x41/0x50
Oct 14 20:58:02 moonflo kernel: [] ? add_interrupt_randomness+0x35/0x1e0
Oct 14 20:58:02 moonflo kernel: [] dev_watchdog+0x259/0x270
Oct 14 20:58:02 moonflo kernel: [] ? dev_graft_qdisc+0x80/0x80
Oct 14 20:58:02 moonflo kernel: [] ? dev_graft_qdisc+0x80/0x80
Oct 14 20:58:02 moonflo kernel: [] call_timer_fn.isra.30+0x17/0x70
Oct 14 20:58:02 moonflo kernel: [] run_timer_softirq+0x176/0x2b0
Oct 14 20:58:02 moonflo kernel: [] __do_softirq+0xda/0x1f0
Oct 14 20:58:02 moonflo kernel: [] irq_exit+0x7e/0xa0
Oct 14 20:58:02 moonflo kernel: [] xen_evtchn_do_upcall+0x35/0x50
Oct 14 20:58:02 moonflo kernel: [] xen_do_hypervisor_callback+0x1e/0x40
Oct 14 20:58:02 moonflo kernel: [] ? xen_hypercall_sched_op+0xa/0x20
Oct 14 20:58:02 moonflo kernel: [] ? xen_hypercall_sched_op+0xa/0x20
Oct 14 20:58:02 moonflo kernel: [] ? xen_safe_halt+0x10/0x20
Oct 14 20:58:02 moonflo kernel: [] ? default_idle+0x9/0x10
Oct 14 20:58:02 moonflo kernel: [] ? arch_cpu_idle+0xa/0x10
Oct 14 20:58:02 moonflo kernel: [] ? cpu_startup_entry+0x190/0x2f0
Oct 14 20:58:02 moonflo kernel: [] ? cpu_bringup_and_idle+0x25/0x40
Oct 14 20:58:02 moonflo kernel: ---[ end trace 98d961bae351244d ]---
Oct 14 20:58:02 moonflo kernel: r8169 :37:04.0 enp55s4: link up

After that, there are lots of messages about the link being up, one message every 12 seconds. When you unplug the network cable, you get a message that the link is down, and no message when you plug it in again.

I was hoping that switching the network card (to one that uses a different driver) might solve the problem, and it did not. Now I can only guess that the network card goes to sleep and sometimes cannot be woken up again.

I tried to reduce the connection speed to 100Mbit and found that accessing the VMs (via RDP) becomes too slow to use them. So I disabled the power management of the network card (through sysfs) and will have to see if the problem persists.
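The post does not say which sysfs attribute was changed. One plausible candidate is the runtime power management switch of the PCI device behind the interface, power/control, where "auto" lets the kernel suspend the device and "on" keeps it powered. The sketch below assumes that attribute and the usual /sys/class/net/<iface>/device link; the function names and the sysfs_root parameter (there so the code can be tried against a scratch directory) are invented for this example. Writing to the real file needs root.

```python
from pathlib import Path


def power_control_path(iface, sysfs_root="/sys"):
    """Return the runtime-PM control file for the device behind `iface`.

    Assumption: the NIC is a PCI device, so
    <sysfs_root>/class/net/<iface>/device/power/control exists.
    """
    return (
        Path(sysfs_root) / "class" / "net" / iface / "device" / "power" / "control"
    )


def set_runtime_pm(iface, enabled, sysfs_root="/sys"):
    """Allow ('auto') or forbid ('on') runtime power management.

    Returns the previous setting so it can be restored later.
    """
    path = power_control_path(iface, sysfs_root)
    previous = path.read_text().strip()
    path.write_text("auto" if enabled else "on")
    return previous


if __name__ == "__main__":
    # Interface name taken from the log in the post; adjust as needed.
    old = set_runtime_pm("enp55s4", enabled=False)
    print("runtime PM was:", old, "and is now: on")
```

The setting does not survive a reboot, so it would have to be reapplied from a boot script or a udev rule if it turns out to help.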
We'll be getting decent network cards in a couple days, but since the problem doesn't seem to be related to a particular card/model/manufacturer, that might not fix it, either.

This problem seems to only occur on machines that operate as a xen server. Other machines, identical Z800s, not running xen, run just fine.

What would you suggest?
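To see how often the watchdog fires and to confirm the "one message every 12 seconds" pattern from a saved log, something like the following could be used. It is only a sketch: the regular expressions are modelled on the two log lines quoted in the message, the function names are made up, and the year is an explicit parameter because classic syslog timestamps carry none. Month names are parsed with %b, so this assumes an English/C locale.

```python
import re
from collections import defaultdict
from datetime import datetime

# Patterns modelled on the log lines quoted in the post.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)
LINK_UP_RE = re.compile(r"(?P<iface>\S+): link up\s*$")
# Classic syslog prefix such as "Oct 14 20:58:02" (no year in the log).
STAMP_RE = re.compile(r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def parse_stamp(line, year):
    """Return the timestamp at the start of `line`, or None if absent."""
    m = STAMP_RE.match(line)
    if m is None:
        return None
    stamp = " ".join(m.group("stamp").split())  # collapse "Oct  4" padding
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def summarize(lines, year=2015):
    """Collect watchdog timeouts and the gaps between 'link up' messages.

    Returns (watchdog, intervals): a list of (iface, driver, queue)
    tuples and a dict mapping iface to a list of gaps in seconds.
    """
    watchdog = []
    link_up = defaultdict(list)
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            watchdog.append((m["iface"], m["driver"], int(m["queue"])))
            continue
        m = LINK_UP_RE.search(line)
        if m:
            ts = parse_stamp(line, year)
            if ts is not None:
                link_up[m["iface"]].append(ts)
    intervals = {
        iface: [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        for iface, stamps in link_up.items()
    }
    return watchdog, intervals
```

Feeding it the lines of the kernel log (for example `summarize(open(path))` with a path of your choosing) gives the list of timeouts per interface and driver, plus the spacing of the repeated link-up messages, which makes it easier to compare the tg3 and r8169 runs.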
Re: [gentoo-user] Networking trouble
On Thursday, October 15, 2015 03:30:01 PM hw wrote:
> [quoted original message snipped]
>
> What would you suggest?

More info required:

- Which version of Xen
- Does this only occur with HVM guests?
- Which network-driver are you using inside the guest
- Can you connect to the "local" console of the guest?
- If yes, does it still have no connectivity?

I saw the same on my lab machine, which was related to:

- Not using correct drivers inside
Re: [gentoo-user] Networking trouble
On Thursday, October 15, 2015 05:46:07 PM hw wrote:
> J. Roeleveld wrote:
> > On Thursday, October 15, 2015 03:30:01 PM hw wrote:
> >> [quoted original message snipped]