Hi Synhrul,

Could you upload the /var/log/cloud.log ?

-Wei

2016-12-20 3:18 GMT+01:00 Syahrul Sazli Shaharir <sa...@nocser.net>:

> On 2016-12-19 18:10, Syahrul Sazli Shaharir wrote:
>
>> On 2016-12-19 17:03, Linas Žilinskas wrote:
>>
>>> From the logs it doesn't seem that the script timeouts. "Execution is
>>> successful", so it manages to pass the data over the socket.
>>>
>>> I guess the systemvm just doesn't configure itself for some reason.
>>>
>>
>> You are right, I was able to enter the router VM console at some point
>> during the timeout loops, and able to capture syslog output during the
>> loop:-
>>
>> http://pastebin.com/n37aHeSa
>>
>
> I restarted another network, and that network's router VM was able to be
> recreated, even on the same host as the failed network (and both networks
> are exactly same configuration, only VLAN & subnet are different).
> Comparing between the two syslog outputs during boot shows the problematic
> network router VM self-configuration got stuck in vm_dhcp_entry.json .
>
> 1. Working network router VM : http://pastebin.com/Y6zpDa6M
> 2. Non-working network router VM : http://pastebin.com/jzfGMGQB
>
> Thanks.
>
>
>
>> Also, in my personal tests, I noticed some different behaviour with
>>> different kernels. Don't remember the specifics right now, but on some
>>> combinations (qemu / kernel) the socket acted differently. For example
>>> the data was sent over the socket, but wasn't visible inside the VM.
>>> Other times the socket would be stuck from the host side.
>>>
>>> So i would suggest testing different kernels (3.x, 4.4.x, 4.8.x) or
>>> try to login to the system vm and see what's happening from inside.
>>>
>>
>> Will do this next and feedback the results here.
>>
>> Thanks for your help! :)
>>
>>
>> On 12/16/16 03:46, Syahrul Sazli Shaharir wrote:
>>>
>>> On 2016-12-16 11:27, Syahrul Sazli Shaharir wrote:
>>>> On Wed, 26 Oct 2016, Linas ?ilinskas wrote:
>>>>
>>>> So after some investigation I've found out that qemu 2.3.0 is indeed
>>>> broken, at least the way CS uses the qemu chardev/socket.
>>>>
>>>> Not sure in which specific version it happened, but it was fixed in
>>>> 2.4.0-rc3, specifically noting that CloudStack 4.2 was not working.
>>>>
>>>> qemu git commit: 4bf1cb03fbc43b0055af60d4ff093d6894aa4338
>>>>
>>>> Also attaching the patch from that commit.
>>>>
>>>> For our own purposes i've included the patch to the qemu-kvm-ev
>>>> package (2.3.0) and all is well.
>>>>
>>>> Hi,
>>>>
>>>> I am facing the exact same issue on latest Cloudstack 4.9.0.1, on
>>>> latest CentOS 7.3.1611, with latest qemu-kvm-ev-2.6.0-27.1.el7
>>>> package.
>>>>
>>>> The issue initially surfaced following a heartbeat-induced reset of
>>>> all hosts, when it was on CS 4.8 @ CentOS 7.0 and stock
>>>> qemu-kvm-1.5.3. Since then, the patchviasocket.pl/py timeouts
>>>> persisted for 1 out of 4 router VM/networks, even after upgrading to
>>>>
>>>> latest code. (I have checked the qemu-kvm-ev-2.6.0-27.1.el7 source,
>>>> and the patched code are pretty much still intact, as per the
>>>> 2.4.0-rc3 commit).
>>>>
>>>> Any help would be greatly appreciated.
>>>>
>>>> Thanks.
>>>>
>>>> (Attached are some debug logs from the host's agent.log)
>>>>
>>>
>>> Here are the debug logs as mentioned: http://pastebin.com/yHdsMNzZ
>>>
>>> Thanks.
>>>
>>> --sazli
>>>>
>>>> On 2016-10-20 09:59, Linas ?ilinskas wrote:
>>>>
>>>> Hi.
>>>>
>>>> We have made an upgrade to 4.9.
>>>>
>>>> Custom build packages with our own patches, which in my mind (i'm
>>>> the only
>>>> one patching those) should not affect the issue i'll describe.
>>>>
>>>> I'm not sure whether we didn't notice it before, or it's actually
>>>> related
>>>> to something in 4.9
>>>>
>>>> Basically our system vm's were unable to be patched via the qemu
>>>> socket.
>>>> The script simply error'ed out with a timeout while trying to push
>>>> the
>>>> data to the socket.
>>>>
>>>> Executing it manually (with cmd line from the logs) resulted the
>>>> same. I
>>>> even tried the old perl variant, which also had same result.
>>>>
>>>> So finally we found out that this issue happens only on our HVs
>>>> which run
>>>> qemu 2.3.0, from the centos 7 special interest virtualization repo.
>>>> Other
>>>> ones that run qemu 1.5, from official repos, can patch the system
>>>> vms
>>>> fine.
>>>>
>>>> So i'm wondering if anyone tested 4.9 with kvm with qemu >= 2.x?
>>>> Maybe it
>>>> something else special in our setup. e.g. we're running the HVs
>>>> from a
>>>> preconfigured netboot image (pxe), but all of them, including those
>>>> with
>>>> qemu 1.5, so i have no idea.
>>>>
>>>> Linas ?ilinskas
>>>> Head of Development
>>>> website <http://www.host1plus.com/> [1] facebook
>>>> <https://www.facebook.com/Host1Plus> [2] twitter
>>>> <https://twitter.com/Host1Plus> [3] linkedin
>>>> <https://www.linkedin.com/company/digital-energy-technologies-ltd.>
>>>> [4]
>>>>
>>>> Host1Plus is a division of Digital Energy Technologies Ltd.
>>>>
>>>> 26 York Street, London W1U 6PZ, United Kingdom
>>>>
>>>>
> --
> --sazli
>

Reply via email to