Hi Syahrul, could you upload /var/log/cloud.log?
-Wei

2016-12-20 3:18 GMT+01:00 Syahrul Sazli Shaharir <sa...@nocser.net>:

> On 2016-12-19 18:10, Syahrul Sazli Shaharir wrote:
>
>> On 2016-12-19 17:03, Linas Žilinskas wrote:
>>
>>> From the logs it doesn't seem that the script timeouts. "Execution is
>>> successful", so it manages to pass the data over the socket.
>>>
>>> I guess the systemvm just doesn't configure itself for some reason.
>>>
>>
>> You are right, I was able to enter the router VM console at some point
>> during the timeout loops, and able to capture syslog output during the
>> loop:-
>>
>> http://pastebin.com/n37aHeSa
>>
>
> I restarted another network, and that network's router VM was able to be
> recreated, even on the same host as the failed network (and both networks
> are exactly same configuration, only VLAN & subnet are different).
> Comparing between the two syslog outputs during boot shows the problematic
> network router VM self-configuration got stuck in vm_dhcp_entry.json .
>
> 1. Working network router VM : http://pastebin.com/Y6zpDa6M
> 2. Non-working network router VM : http://pastebin.com/jzfGMGQB
>
> Thanks.
>
>
>
>>> Also, in my personal tests, I noticed some different behaviour with
>>> different kernels. Don't remember the specifics right now, but on some
>>> combinations (qemu / kernel) the socket acted differently. For example
>>> the data was sent over the socket, but wasn't visible inside the VM.
>>> Other times the socket would be stuck from the host side.
>>>
>>> So i would suggest testing different kernels (3.x, 4.4.x, 4.8.x) or
>>> try to login to the system vm and see what's happening from inside.
>>>
>>
>> Will do this next and feedback the results here.
>>
>> Thanks for your help! :)
>>
>>
>> On 12/16/16 03:46, Syahrul Sazli Shaharir wrote:
>>>
>>> On 2016-12-16 11:27, Syahrul Sazli Shaharir wrote:
>>>> On Wed, 26 Oct 2016, Linas Žilinskas wrote:
>>>>
>>>> So after some investigation I've found out that qemu 2.3.0 is indeed
>>>> broken, at least the way CS uses the qemu chardev/socket.
>>>>
>>>> Not sure in which specific version it happened, but it was fixed in
>>>> 2.4.0-rc3, specifically noting that CloudStack 4.2 was not working.
>>>>
>>>> qemu git commit: 4bf1cb03fbc43b0055af60d4ff093d6894aa4338
>>>>
>>>> Also attaching the patch from that commit.
>>>>
>>>> For our own purposes i've included the patch to the qemu-kvm-ev
>>>> package (2.3.0) and all is well.
>>>>
>>>> Hi,
>>>>
>>>> I am facing the exact same issue on latest Cloudstack 4.9.0.1, on
>>>> latest CentOS 7.3.1611, with latest qemu-kvm-ev-2.6.0-27.1.el7
>>>> package.
>>>>
>>>> The issue initially surfaced following a heartbeat-induced reset of
>>>> all hosts, when it was on CS 4.8 @ CentOS 7.0 and stock
>>>> qemu-kvm-1.5.3. Since then, the patchviasocket.pl/py timeouts
>>>> persisted for 1 out of 4 router VM/networks, even after upgrading to
>>>> latest code. (I have checked the qemu-kvm-ev-2.6.0-27.1.el7 source,
>>>> and the patched code are pretty much still intact, as per the
>>>> 2.4.0-rc3 commit).
>>>>
>>>> Any help would be greatly appreciated.
>>>>
>>>> Thanks.
>>>>
>>>> (Attached are some debug logs from the host's agent.log)
>>>>
>>>
>>> Here are the debug logs as mentioned: http://pastebin.com/yHdsMNzZ
>>>
>>> Thanks.
>>>
>>> --sazli
>>>>
>>>> On 2016-10-20 09:59, Linas Žilinskas wrote:
>>>>
>>>> Hi.
>>>>
>>>> We have made an upgrade to 4.9.
>>>>
>>>> Custom build packages with our own patches, which in my mind (i'm the
>>>> only one patching those) should not affect the issue i'll describe.
>>>>
>>>> I'm not sure whether we didn't notice it before, or it's actually
>>>> related to something in 4.9
>>>>
>>>> Basically our system vm's were unable to be patched via the qemu
>>>> socket. The script simply error'ed out with a timeout while trying to
>>>> push the data to the socket.
>>>>
>>>> Executing it manually (with cmd line from the logs) resulted the
>>>> same. I even tried the old perl variant, which also had same result.
>>>>
>>>> So finally we found out that this issue happens only on our HVs which
>>>> run qemu 2.3.0, from the centos 7 special interest virtualization
>>>> repo. Other ones that run qemu 1.5, from official repos, can patch
>>>> the system vms fine.
>>>>
>>>> So i'm wondering if anyone tested 4.9 with kvm with qemu >= 2.x?
>>>> Maybe it something else special in our setup. e.g. we're running the
>>>> HVs from a preconfigured netboot image (pxe), but all of them,
>>>> including those with qemu 1.5, so i have no idea.
>>>>
>>>> Linas Žilinskas
>>>> Head of Development
>>>> website <http://www.host1plus.com/> [1]
>>>> facebook <https://www.facebook.com/Host1Plus> [2]
>>>> twitter <https://twitter.com/Host1Plus> [3]
>>>> linkedin <https://www.linkedin.com/company/digital-energy-technologies-ltd.> [4]
>>>>
>>>> Host1Plus is a division of Digital Energy Technologies Ltd.
>>>>
>>>> 26 York Street, London W1U 6PZ, United Kingdom
>>>>
>>>>
> --
> --sazli
>