[Bug 1325560] Re: kvm vm loses network connectivity under "enough" load
@Brad I have already sent the apport-collect output from the host machine. I am not sure whether I can run apport-collect from the guest VM, because the guest has no network access at all until I restart it.

--
You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to qemu-kvm in Ubuntu.
https://bugs.launchpad.net/bugs/1325560

Title: kvm vm loses network connectivity under "enough" load

To manage notifications about this bug go to:
https://bugs.launchpad.net/libvirt/+bug/1325560/+subscriptions
--
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
[Bug 1325560] Re: kvm vm loses network connectivity under "enough" load
Yes, I've used the same virtual machines in both cases; everything is the same except the driver. I switched from virtio to e1000:

    glance image-update --property hw_vif_model=e1000

Then I relaunched the virtual cluster and everything works perfectly fine. I've tested with five times the load, and it doesn't crash anymore.

I have also tested the same cluster configuration with virtio on CentOS 6.5 (kernel 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64 GNU/Linux), and that crashes too. I will test the CentOS version with e1000 and see if that works.
[Bug 1325560] Re: kvm vm loses network connectivity under "enough" load
I changed the VM interface from virtio to e1000, and with that I do not get this problem; the job finishes perfectly fine. e1000 may not be the best solution, but at least it doesn't break my setup.
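For anyone managing guests directly with libvirt rather than through OpenStack image properties, the equivalent of the `hw_vif_model=e1000` workaround is changing the NIC model in the domain XML (the MAC address and bridge name below are illustrative, not from this bug):

```xml
<!-- In the guest's domain XML (edit with: virsh edit <domain-name>) -->
<interface type='bridge'>
  <mac address='52:54:00:aa:bb:cc'/>   <!-- illustrative MAC -->
  <source bridge='br0'/>               <!-- illustrative bridge name -->
  <!-- was: <model type='virtio'/> -->
  <model type='e1000'/>
</interface>
```

The guest needs to be powered off and started again (not just rebooted) for the new model to take effect.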
[Bug 1325560] Re: kvm vm loses network connectivity under "enough" load
@serge Yes, that's correct. Either

    $ virsh reboot instance-name

or a soft reboot from inside the VM (while logged in through the serial console) does the trick: the VM is back online and accessible. And we can reproduce the problem very easily.
[Bug 1325560] Re: kvm vm loses network connectivity under "enough" load
apport information

** Tags added: apport-collected precise third-party-packages

** Description changed:

Networking breaks after a while in KVM guests using virtio networking.

We run data-intensive jobs on our virtual cluster (OpenStack Grizzly installed on Ubuntu 12.04 Server). The job runs fine on a single worker VM (no data transfer involved). As soon as I add more nodes, so that the workers need to exchange some data, one of the worker VMs goes down. Ping responds with 'host unreachable'. Logging in via the serial console shows no problems: eth0 is up and can ping the local host, but there is no outside connectivity. Restarting the network (/etc/init.d/networking restart) does nothing. Rebooting the machine brings it back to life.

14/06/01 18:30:06 INFO YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done
14/06/01 18:30:06 INFO MemoryStore: ensureFreeSpace(190758) called with curMem=0, maxMem=308713881
14/06/01 18:30:06 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 186.3 KB, free 294.2 MB)
14/06/01 18:30:06 INFO FileInputFormat: Total input paths to process : 1
14/06/01 18:30:06 INFO NetworkTopology: Adding a new node: /default-rack/10.20.20.28:50010
14/06/01 18:30:06 INFO NetworkTopology: Adding a new node: /default-rack/10.20.20.23:50010
14/06/01 18:30:06 INFO SparkContext: Starting job: count at hello_spark.py:15
14/06/01 18:30:06 INFO DAGScheduler: Got job 0 (count at hello_spark.py:15) with 2 output partitions (allowLocal=false)
14/06/01 18:30:06 INFO DAGScheduler: Final stage: Stage 0 (count at hello_spark.py:15)
14/06/01 18:30:06 INFO DAGScheduler: Parents of final stage: List()
14/06/01 18:30:06 INFO DAGScheduler: Missing parents: List()
14/06/01 18:30:06 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[2] at count at hello_spark.py:15), which has no missing parents
14/06/01 18:30:07 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (PythonRDD[2] at count at hello_spark.py:15)
14/06/01 18:30:07 INFO YarnClientClusterScheduler: Adding task set 0.0 with 2 tasks
14/06/01 18:30:08 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@host-10-20-20-28.novalocal:44417/user/Executor#-1352071582] with ID 1
14/06/01 18:30:08 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 1: host-10-20-20-28.novalocal (PROCESS_LOCAL)
14/06/01 18:30:08 INFO TaskSetManager: Serialized task 0.0:0 as 3123 bytes in 14 ms
14/06/01 18:30:09 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager host-10-20-20-28.novalocal:42960 with 588.8 MB RAM
14/06/01 18:30:16 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_0 in memory on host-10-20-20-28.novalocal:42960 (size: 308.2 MB, free: 280.7 MB)
14/06/01 18:30:17 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@host-10-20-20-23.novalocal:58126/user/Executor#1079893974] with ID 2
14/06/01 18:30:17 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 2: host-10-20-20-23.novalocal (PROCESS_LOCAL)
14/06/01 18:30:17 INFO TaskSetManager: Serialized task 0.0:1 as 3123 bytes in 1 ms
14/06/01 18:30:17 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager host-10-20-20-23.novalocal:56776 with 588.8 MB RAM
14/06/01 18:31:20 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(1, host-10-20-20-28.novalocal, 42960, 0) with no recent heart beats: 55828ms exceeds 45000ms
14/06/01 18:42:23 INFO YarnClientSchedulerBackend: Executor 2 disconnected, so removing it
14/06/01 18:42:23 ERROR YarnClientClusterScheduler: Lost executor 2 on host-10-20-20-23.novalocal: remote Akka client disassociated

The same job finishes flawlessly on a single worker.

System Information:
==
Description: Ubuntu 12.04.4 LTS
Release: 12.04
Linux 3.8.0-35-generic #52~precise1-Ubuntu SMP Thu Jan 30 17:24:40 UTC 2014 x86_64

libvirt-bin:
  Installed: 1.1.1-0ubuntu8~cloud2
  Candidate: 1.1.1-0ubuntu8.7~cloud1
  Version table:
     1.1.1-0ubuntu8.7~cloud1 0
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-updates/havana/main amd64 Packages
 *** 1.1.1-0ubuntu8~cloud2 0
        100 /var/lib/dpkg/status
     0.9.8-2ubuntu17.19 0
        500 http://se.archive.ubuntu.com/ubuntu/ precise-updates/main amd64 Packages
     0.9.8-2ubuntu17.17 0
        500 http://security.ubuntu.com/ubuntu/ precise-security/main amd64 Packages
     0.9.8-2ubuntu17 0
        500 http://se.archive.ubuntu.com/ubuntu/ precise/main amd64 Packages

qemu-kvm:
  Installed: 1.5.0+dfsg-3ubuntu5~cloud0
  Candidate: 1.5.0+dfsg-3ubuntu5.4~cloud0
  Version table:
     1.5.0+dfsg-3ubuntu5.4~cloud0 0
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-updates/havana/main amd64 Packages
 *** 1.5.0+dfsg-3ubuntu5~cloud0 0
[Bug 1325560] ProcEnviron.txt
apport information

** Attachment added: "ProcEnviron.txt"
   https://bugs.launchpad.net/bugs/1325560/+attachment/4124283/+files/ProcEnviron.txt
[Bug 1325560] Re: kvm vm loses network connectivity under "enough" load
** Also affects: qemu
   Importance: Undecided
   Status: New

** Also affects: libvirt
   Importance: Undecided
   Status: New

** No longer affects: qemu

** Also affects: linux
   Importance: Undecided
   Status: New
[Bug 1325560] [NEW] kvm vm loses network connectivity under "enough" load
Public bug reported:

Networking breaks after a while in KVM guests using virtio networking.

We run data-intensive jobs on our virtual cluster (OpenStack Grizzly installed on Ubuntu 12.04 Server). The job runs fine on a single worker VM (no data transfer involved). As soon as I add more nodes, so that the workers need to exchange some data, one of the worker VMs goes down. Ping responds with 'host unreachable'. Logging in via the serial console shows no problems: eth0 is up and can ping the local host, but there is no outside connectivity. Restarting the network (/etc/init.d/networking restart) does nothing. Rebooting the machine brings it back to life.

14/06/01 18:30:06 INFO YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done
14/06/01 18:30:06 INFO MemoryStore: ensureFreeSpace(190758) called with curMem=0, maxMem=308713881
14/06/01 18:30:06 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 186.3 KB, free 294.2 MB)
14/06/01 18:30:06 INFO FileInputFormat: Total input paths to process : 1
14/06/01 18:30:06 INFO NetworkTopology: Adding a new node: /default-rack/10.20.20.28:50010
14/06/01 18:30:06 INFO NetworkTopology: Adding a new node: /default-rack/10.20.20.23:50010
14/06/01 18:30:06 INFO SparkContext: Starting job: count at hello_spark.py:15
14/06/01 18:30:06 INFO DAGScheduler: Got job 0 (count at hello_spark.py:15) with 2 output partitions (allowLocal=false)
14/06/01 18:30:06 INFO DAGScheduler: Final stage: Stage 0 (count at hello_spark.py:15)
14/06/01 18:30:06 INFO DAGScheduler: Parents of final stage: List()
14/06/01 18:30:06 INFO DAGScheduler: Missing parents: List()
14/06/01 18:30:06 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[2] at count at hello_spark.py:15), which has no missing parents
14/06/01 18:30:07 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (PythonRDD[2] at count at hello_spark.py:15)
14/06/01 18:30:07 INFO YarnClientClusterScheduler: Adding task set 0.0 with 2 tasks
14/06/01 18:30:08 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@host-10-20-20-28.novalocal:44417/user/Executor#-1352071582] with ID 1
14/06/01 18:30:08 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 1: host-10-20-20-28.novalocal (PROCESS_LOCAL)
14/06/01 18:30:08 INFO TaskSetManager: Serialized task 0.0:0 as 3123 bytes in 14 ms
14/06/01 18:30:09 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager host-10-20-20-28.novalocal:42960 with 588.8 MB RAM
14/06/01 18:30:16 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_0 in memory on host-10-20-20-28.novalocal:42960 (size: 308.2 MB, free: 280.7 MB)
14/06/01 18:30:17 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@host-10-20-20-23.novalocal:58126/user/Executor#1079893974] with ID 2
14/06/01 18:30:17 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 2: host-10-20-20-23.novalocal (PROCESS_LOCAL)
14/06/01 18:30:17 INFO TaskSetManager: Serialized task 0.0:1 as 3123 bytes in 1 ms
14/06/01 18:30:17 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager host-10-20-20-23.novalocal:56776 with 588.8 MB RAM
14/06/01 18:31:20 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(1, host-10-20-20-28.novalocal, 42960, 0) with no recent heart beats: 55828ms exceeds 45000ms
14/06/01 18:42:23 INFO YarnClientSchedulerBackend: Executor 2 disconnected, so removing it
14/06/01 18:42:23 ERROR YarnClientClusterScheduler: Lost executor 2 on host-10-20-20-23.novalocal: remote Akka client disassociated

The same job finishes flawlessly on a single worker.

System Information:
==
Description: Ubuntu 12.04.4 LTS
Release: 12.04
Linux 3.8.0-35-generic #52~precise1-Ubuntu SMP Thu Jan 30 17:24:40 UTC 2014 x86_64

libvirt-bin:
  Installed: 1.1.1-0ubuntu8~cloud2
  Candidate: 1.1.1-0ubuntu8.7~cloud1
  Version table:
     1.1.1-0ubuntu8.7~cloud1 0
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-updates/havana/main amd64 Packages
 *** 1.1.1-0ubuntu8~cloud2 0
        100 /var/lib/dpkg/status
     0.9.8-2ubuntu17.19 0
        500 http://se.archive.ubuntu.com/ubuntu/ precise-updates/main amd64 Packages
     0.9.8-2ubuntu17.17 0
        500 http://security.ubuntu.com/ubuntu/ precise-security/main amd64 Packages
     0.9.8-2ubuntu17 0
        500 http://se.archive.ubuntu.com/ubuntu/ precise/main amd64 Packages

qemu-kvm:
  Installed: 1.5.0+dfsg-3ubuntu5~cloud0
  Candidate: 1.5.0+dfsg-3ubuntu5.4~cloud0
  Version table:
     1.5.0+dfsg-3ubuntu5.4~cloud0 0
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-updates/havana/main amd64 Packages
 *** 1.5.0+dfsg-3ubuntu5~cloud0 0
        100 /var/lib/dpkg/status
     1.0+noroms-0ubuntu14.15 0
        500 http://se.archive.ubuntu.com/ubuntu/ precise-updates/main amd64 Packages
     1.0+noroms-0ubuntu14.14 0
        500 http://security.ubuntu.com/ubuntu/ pr
[Bug 997978] Re: KVM images lose connectivity with bridged network
Yes, I think it is safe to say that the bug is still around: the VM loses network connectivity under "enough" load. I, for example, can reproduce this by running a Spark job which transfers a few gigabytes of data between worker VMs; within a minute, one of the VMs loses network connectivity. If I try to reboot the VM, it goes into an error state, and trying to delete it leaves the qemu-kvm process defunct.

    $ uname -r
    3.8.0-29-generic
    $ virsh --version
    1.1.1

https://bugs.launchpad.net/bugs/997978