[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
It should be noted that after switching to using Ubuntu Cloud Archive which includes newer libvirt this issue went away in the gate. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1643911 Title: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:" To manage notifications about this bug go to: https://bugs.launchpad.net/libvirt/+bug/1643911/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
** Changed in: nova
       Status: Confirmed => In Progress
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
@ChristianEhrhardt - After a few days of stress testing, the issue cannot be reproduced with the script I provided. So I fell back to a simpler script, the one I used at the beginning to reproduce the issue in my environment: it just starts/stops the instances and sleeps (without any VM status check). With this script the issue is reproducible (though it took a few days in my test), with libvirt-bin (1.3.1-1ubuntu10.8), qemu (1:2.5+dfsg-5ubuntu10.11) and 20 cirros guests.

The actual instance behavior is more complex with the simple script, because sometimes the OpenStack API returns a conflict error when the VM state does not match the request:

Cannot 'stop' instance 971fe132-55de-4ff7-b1e8-c556390964c1 while it is in vm_state stopped (HTTP 409)
Cannot 'start' instance 01eae3e9-6330-4594-a1dd-dcdbf3a4d392 while it is in vm_state active (HTTP 409)

So some instances may be started and stopped at the same time, because a previous command may not have taken effect yet when the next one arrives, and some such pattern seems to trigger the libvirt crash.

However, because the behavior is so complex, I'm not sure whether the issue is reproducible in other environments. For example, if the sleep time is long enough for all VMs to finish their tasks, there will be no conflict errors and the script will behave like the script I provided earlier, which cannot reproduce the issue. On the other hand, if the sleep time is too short or there are too many guests, the OpenStack environment may not be able to handle the request rate, which may lead to other client or server errors.

** Attachment added: "crash-loop.sh"
   https://bugs.launchpad.net/nova/+bug/1643911/+attachment/4877202/+files/crash-loop.sh
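For readers without access to the attachment, the fire-and-forget loop described in this message could look roughly like the following. This is a minimal sketch and not the attached crash-loop.sh: the guest names cirros1..cirrosN, the OPENSTACK, GUESTS and SLEEP_SECS variables, the function names and the 30-second default are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical sketch of a start/stop loop without any VM status check.
# Not the attached crash-loop.sh; names and defaults below are assumptions.

OPENSTACK=${OPENSTACK:-openstack}   # CLI to invoke; set to "echo" for a dry run
GUESTS=${GUESTS:-20}                # guests are assumed to be named cirros1..cirrosN
SLEEP_SECS=${SLEEP_SECS:-30}        # assumed pause between the stop and start phases

# Send one action ("start" or "stop") to every guest in parallel.
# Failures such as HTTP 409 conflicts are deliberately ignored.
fire_all() {
    local action="$1"
    local i
    for i in $(seq 1 "$GUESTS"); do
        "$OPENSTACK" server "$action" "cirros$i" || true &
    done
    wait    # waits for the CLI calls to return, not for the guests to change state
}

# Alternate stop/start for the given number of rounds, sleeping instead of polling.
run_rounds() {
    local rounds="$1"
    local round
    for round in $(seq 1 "$rounds"); do
        echo "test round $round ..."
        fire_all stop
        sleep "$SLEEP_SECS"
        fire_all start
        sleep "$SLEEP_SECS"
    done
}
```

After sourcing the file, `run_rounds 3000` would run 3000 rounds. Setting OPENSTACK=echo beforehand prints the commands instead of executing them, which is a cheap way to check the request pattern before pointing it at a real cloud.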
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
At least on my end it has now run fine for ~7 days with 20 guests. Looking forward to hearing what you find to be the hidden trigger.
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
@ChristianEhrhardt - In the newly deployed environment (with libvirt 1.3.1-1ubuntu10.8), libvirt has been working fine for more than 24 hours. The stress test will keep running for two more days to check whether the issue is reproducible. However, so far I think I may be missing some key point, which is why the new environment can't reproduce the issue. I will do more tests and update here if I have any new findings :)
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
@ChristianEhrhardt, I stopped the stress script after stressing PPA-2619 for 7 days without issue. libvirt still works fine, but there is currently no stress load, so I will start the stress script against PPA-2619 again. On the other hand, since I can reproduce this issue in my environment, I will prepare another similar environment with libvirt 1.3.1-1ubuntu10.8 to see whether the issue is still reproducible there. If it is reproducible in the similar environment, I will try to reproduce it using virsh commands.
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
After about a day my keystone and percona died because they exceeded the limited size of the system I had, but there has been no libvirt/qemu crash or failure so far. I need to look into it in more detail to see whether I can get it to hold out longer, until the libvirt issue occurs.

@Davidchen, is the -19 PPA still running fine for you (that would be a week more by now, I guess), while the other PPA and the original libvirt failed reproducibly?
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
@ChristianEhrhardt - In my test environment I have encountered this issue with different flavors and images, so I think using a different image and flavor to reproduce it is OK :)
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
After slightly more than three days without a crash, of which almost exactly a day of pure CPU cycles was spent in libvirt, I am starting to think that this won't trigger the bug as I had hoped. I deployed a new OpenStack and now have a loop running based on openstack start/stop (using 10x m1.small, as I have slightly larger cloud images, not cirros, for now). So this is more like your case, just slightly enhanced by my time measurements; I hope to trigger it on my end with this. In the worst case, after the weekend I'll have to fetch cirros and create nano instances to be even more similar.
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
Thanks Davidchen. FYI: it has now been running for ~48 hours, is at round 3511 at the moment and continuing. No crash so far: the same libvirt PID is still running and there are no logs/crashdumps/... I contacted a few people to consider an OpenStack setup on my test node, just in case starting/stopping via OpenStack is, contrary to what I assumed, still involved. If this continues to run just fine, I'll at some point abort and set up OpenStack there.
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
@ChristianEhrhardt - I chose m1.nano because nano is just the smallest and easiest flavor to use. My test system has about 20GB of memory.
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
OK, thank you. My test is still running fine, but you reported 2-12 hours, so I'll give it at least a few days. Something else came to mind: you use 5 x m1.nano, which defaults to 5 x 64MB = 320MB. The crash might be related to fragmentation or an overall memory shortage. Is your system low on memory, or was nano just the lowest and easiest flavor to use?
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
Hi ChristianEhrhardt, You're welcome :) My test environment does not have any other workload or guest VMs, only 5 cirros instances, so I guess starting/stopping instances in parallel is what triggers the libvirt crash.
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
@Davidchen - just to clarify the details: is there any other work going on on your system, such as other guests or virtualization activity, that might influence this? I had almost assumed that we would need the bigger version that is in PPA-2619 (which is why I created it right away). Big thanks for verifying both PPAs for your case!
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
Hi Davidchen - Thank you for providing the info on your repro. m1.nano is really small; I'd hope that with a slightly bigger size of 512M and 10 instead of 5 systems I might crash it earlier. If that doesn't trigger it, I will switch to the smaller sizes you used.

I wanted to take OpenStack out of the equation here, since in these cases it doesn't do much more than forward commands, which hopefully works. I also wonder why this would trigger the issue when the fixes are around "asynchronizing" blockjobs, but I'll take every chance at a test; maybe I'm overlooking such a job at startup/shutdown.

I have started a slightly modified version and will let it run for a few days, hopefully hitting the issue as well.
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
Note: the test script is based on Davidchen's suggestion, but without OpenStack and converted into an infinite run; it is now running on Horsea.

** Attachment added: "crash-loop.sh"
   https://bugs.launchpad.net/nova/+bug/1643911/+attachment/4870727/+files/crash-loop.sh
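A loop of this kind, driving libvirt directly through virsh rather than through OpenStack, might be structured as below. This is only a sketch and not the attached script: the VIRSH variable, the helper names, the guest names passed as arguments and the 60-second timeout are assumptions. It also assumes the guests react to `virsh shutdown` (a graceful, ACPI-based request).

```shell
#!/bin/bash
# Hypothetical sketch: cycle guests with virsh only, no OpenStack involved.
# Not the attached crash-loop.sh; helper names and the timeout are assumptions.

VIRSH=${VIRSH:-virsh}   # command used to talk to libvirtd

# Poll "virsh domstate" until the guest reports the wanted state
# (e.g. "running" or "shut off"). Returns 1 after roughly <timeout> seconds.
wait_for_state() {
    local guest="$1" wanted="$2" timeout="$3"
    local waited=0
    while [ "$("$VIRSH" domstate "$guest")" != "$wanted" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Shut down and restart all named guests in parallel, once.
cycle_guests() {
    local g
    for g in "$@"; do "$VIRSH" shutdown "$g" & done
    wait
    for g in "$@"; do wait_for_state "$g" "shut off" 60 || return 1; done
    for g in "$@"; do "$VIRSH" start "$g" & done
    wait
    for g in "$@"; do wait_for_state "$g" "running" 60 || return 1; done
}
```

An infinite run would then be something like `while cycle_guests guest1 guest2; do :; done`, which stops as soon as a guest fails to reach the expected state, for example because libvirtd has gone away.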
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
@ChristianEhrhardt: I can successfully reproduce this issue in my environment. It can be reproduced by stopping and starting multiple instances in parallel (in my case, 5 cirros instances).

I have tried both PPAs: PPA-2619 fixes this issue (7 days of stress and libvirt works fine), but PPA-2620 still hits the libvirt crash.

Below is my test script. The time to hit the issue is about 2 to 12 hours; libvirt then crashes due to memory corruption. The VM instances are created with the command "openstack server create --image cirros --flavor m1.nano cirrosX", X=1~5.

#!/bin/bash
TEST_ROUND=3000
for round in $(seq 1 1 $TEST_ROUND); do
    echo "test round $round ..."

    # Stop all 5 instances in parallel, then wait until each is shut down.
    for i in $(seq 1 1 5); do openstack server stop cirros$i & done
    sleep 1
    for i in $(seq 1 1 5); do
        while true; do
            STATUS=$(openstack server show cirros$i -f value -c OS-EXT-STS:power_state)
            if [ "$STATUS" = "Shutdown" ]; then break; fi
            sleep 1
        done
    done

    # Start all 5 instances in parallel, then wait until each is running.
    for i in $(seq 1 1 5); do openstack server start cirros$i & done
    sleep 1
    for i in $(seq 1 1 5); do
        while true; do
            STATUS=$(openstack server show cirros$i -f value -c OS-EXT-STS:power_state)
            if [ "$STATUS" = "Running" ]; then break; fi
            sleep 1
        done
    done
done
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
Hi (cross post, I know),

there is a bit of a "somewhat dup" context around the following list of bugs:
- bug 1646779
- bug 1643911
- bug 1638982
- bug 1673483

Unfortunately, I’ve never hit these bugs in any of my libvirt/qemu runs, and these are literally thousands every day due to all the automated tests. I also checked with our Ubuntu OpenStack Team and they haven't seen them so far either. That makes it hard to debug them in depth, as you will all understand.

But while the “signature” on each bug is different, they still share a lot of things (let's call it the bug "meta signature"):
- there is no way to recreate them yet other than watching gate test crash statistics
- they seem to haunt the openstack gate testing more than anything else
- most are directly or indirectly related to memory corruption

As I’m unable to reproduce any of these bugs myself, I’d like to get some help from anyone who can help to recreate them. Therefore I ask all people affected (mostly the same on all these bugs anyway) to test the PPAs I created for bug 1673483. That is the one bug where, thanks to the great help of Kashyap, Matt and Dan (and more), at least a potential fix was identified.

That means there are two PPAs for you to try:
- Backport of the minimal recommended fix: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2620
- Backport of the full series of related fixes: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2619

Especially since the potential errors addressed by these fixes range from memleak to deadlock, there is a chance that all of these bugs might have the same root cause.
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
@Matt Booth: This is not the same as bug 1673483, which DanB debugged the other day and for which he identified fixes, as the Nova stack traces are different for the two. For bug 1673483, the Nova crash directly relates to the libvirt commits mentioned in its comment #5 (of bug 1673483).

In this memory corruption bug, we don't quite know what the root cause is yet; that's why we didn't close it as a duplicate.
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
If it is, the other one is bug 1673483, which has test builds of a wide and a narrower backport of the fixes mentioned in its comment #5.
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
Is this the same bug we saw the other week? I thought Dan B had found a couple of patches missing from the libvirt shipped in Ubuntu which are likely candidates for fixing this. Kashyap, do you have those to hand?

I don't think we're likely to get much traction on this from upstream libvirt because:
* This is not the latest stable release of libvirt.
* They looked anyway, and they think it's fixed in the latest stable release of libvirt.

IOW it's a downstream Ubuntu issue. We need to report this as a bug to Canonical in their package, providing the 2 patches recommended by upstream.
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
(I just sent this to the list, but am putting it here too)

While I agree that a coredump is not that likely to help, I would also like to come to that conclusion after inspecting a coredump :) I've found things in the heap before that give clues as to what the real problems are.

To this end, I've proposed [2] to keep coredumps. It's a little hackish but I think it gets the job done. [3] enables this and saves any dumps to the logs in d-g.

As suggested, running under valgrind would be great but is probably impractical until we narrow it down a little. Another thing I've had some success with is electric fence [4], which puts boundaries around allocations so that an out-of-bounds access faults at the time of access. I've proposed [5] to try this out, but unfortunately it's not looking particularly promising. I'm open to suggestions; for example, maybe something like tcmalloc might give us a different failure and could be another clue. If we get something vaguely reliable here, our best bet might be to run a parallel non-voting job on all changes to see what we can pick up.

[1] https://bugs.launchpad.net/nova/+bug/1643911
[2] https://review.openstack.org/451128
[3] https://review.openstack.org/451219
[4] http://elinux.org/Electric_Fence
[5] https://review.openstack.org/451136
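As a rough illustration of the two lighter-weight approaches mentioned in this message, the sketch below runs a command with core dumps enabled and with Electric Fence preloaded. It does not reproduce what the proposed changes [2], [3] or [5] do; the function name and the EFENCE_LIB variable are hypothetical, and the library name libefence.so.0.0 is an assumption about what the installed electric-fence package provides.

```shell
#!/bin/bash
# Hypothetical sketch: run a command with core dumps enabled and
# Electric Fence preloaded. The library name is an assumption about the
# installed electric-fence package; adjust EFENCE_LIB if it differs.

EFENCE_LIB=${EFENCE_LIB-libefence.so.0.0}

run_with_efence() {
    (
        # Lift the core file size limit for this subshell only; a hard
        # limit set by the administrator may prevent this.
        ulimit -c unlimited 2>/dev/null ||
            echo "warning: could not raise core size limit" >&2
        # With the library preloaded, an out-of-bounds heap access should
        # fault at the time of access instead of silently corrupting memory.
        export LD_PRELOAD="$EFENCE_LIB"
        exec "$@"
    )
}
```

A manual experiment could then be `run_with_efence /usr/sbin/libvirtd` after stopping the packaged service. Note that limits set in a shell do not apply to a daemon started by the init system, and where a core file ends up depends on the kernel's core_pattern setting.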
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
valgrind would be great, but it is the 100-pound gorilla approach. I'll play with some lighter-weight things like electric fence, which could give us some insight. Something like that is going to segfault, so cores seem a top priority.

I'm probably more optimistic about the general usefulness of cores anyway ... in my experience, when dumping bits of the heap you can sometimes see strings or other things that give you a clue as to what's going on.
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
We've also somewhat recently gotten the OOMkiller problems to go away. And yet these problems remain. I doubt it is related to OOMKiller.
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
I faced the same call trace on http://logs.openstack.org/59/426459/4/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/0222b58/logs

The backtrace of libvirtd is:

*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption: 0x5560f082e8d0 ***
=== Backtrace: =
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fd02d8337e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8181e)[0x7fd02d83d81e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7fd02d83f5d4]
/lib/x86_64-linux-gnu/libc.so.6(realloc+0x358)[0x7fd02d83fe68]
/lib/x86_64-linux-gnu/libc.so.6(+0xe53a2)[0x7fd02d8a13a2]
/lib/x86_64-linux-gnu/libc.so.6(regexec+0xb3)[0x7fd02d8a39c3]
/usr/lib/x86_64-linux-gnu/libvirt.so.0(virLogProbablyLogMessage+0x1f)[0x7fd02e241a5f]

If this were due to the oom-killer, syslog should clearly show that the oom-killer was invoked. However, the syslog doesn't contain any such entry.
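The syslog check described in this message can be scripted. The following is a minimal sketch, assuming the kernel messages end up in a file such as /var/log/syslog and that the OOM killer leaves its usual "invoked oom-killer" or "Out of memory" lines there; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given log file shows OOM killer activity.
# Assumes the kernel's usual messages ("... invoked oom-killer ..." and
# "Out of memory: ...") were written to that file.
oom_killer_seen() {
    local logfile="$1"
    grep -Eiq 'invoked oom-killer|out of memory' "$logfile"
}
```

For example, `if oom_killer_seen /var/log/syslog; then echo "OOM killer was active"; fi`. A missing match only shows that no such message reached that particular file, so it is worth pointing the helper at the syslog saved with the failing job.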
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
FYI Armando suspects that this failure is a result of generally high memory consumption in the gate, something that lingers across all projects: http://lists.openstack.org/pipermail/openstack-dev/2017-February/111413.html
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
Now, we seem to be stuck in limbo here, unable to diagnose this and get to the root cause. So I asked the upstream libvirt maintainers on IRC, and Dan Berrange responded [text formatted a little bit for readability here]:

"Running libvirt under Valgrind will likely point to a root cause. However, it's impossible to run libvirtd under Valgrind in the openstack CI system, unless you're happy to have many hours longer running time and massively more RAM used.

"The only way to debug it is to deploy custom libvirtd builds. Meaning: whatever extra debugging info is needed in the area of code that is suspected to be broken; there's no right answer here - you just have to experiment repeatedly until you find what you need. And deploy this custom build either by providing new packages in the repos, or by using a hack [via 'rootwrap' facility] to install custom libvirtd in the Nova startup code.

"Also, a core dump in this scenario will not be helpful. With memory corruption, a core dump is rarely useful, because the actual problem you care about will have occurred some time before the crash happens. This is especially true for multithreaded applications like libvirtd. Because the thread showing the abrt/segv is quite often not the thread which caused the corruption."
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
There seem to be 6 people+ hitting it -> marking confirmed

** Changed in: libvirt (Ubuntu)
       Status: New => Confirmed
[Bug 1643911] Re: libvirt randomly crashes on xenial nodes with "*** Error in `/usr/sbin/libvirtd': malloc(): memory corruption:"
** Also affects: libvirt (Ubuntu)
   Importance: Undecided
       Status: New