Hi Fernando,

On Wed, Jun 21, 2017 at 2:19 PM, Fernando Casas Schössow <casasferna...@hotmail.com> wrote:
> Hi Ladi,
>
> Sorry for the delay in my reply.
> I will leave the host kernel alone for now then.
> For the last 15 hours or so I'm running memtest86+ on the host. So far so
> good. Two passes no errors so far. I will try to leave it running for at
> least another 24hr and report back the results. Hopefully we can discard the
> memory issue at hardware level.
>
> Regarding KSM, that will be the next thing I will disable if after removing
> the balloon device guests still crash.
>
> About leaving a guest in a failed state for you to debug it remotely, that's
> absolutely an option. We just need to coordinate so I can give you remote
> access to the host and so on. Let me know if any preparation is needed in
> advance and which tools you need installed on the host.

I think that gdbserver attached to the QEMU process should be enough.
When the VM gets into the broken state please do something like:

gdbserver --attach host:12345 <QEMU pid>

and let me know the host name and port (12345 in the above example).

> Once again I would like to thank you for all your help and your great
> disposition!

You're absolutely welcome, I don't think I've done anything helpful so far :)

> Cheers,
>
> Fer
>
> On Tue, Jun 20, 2017 at 9:52, Ladi Prosek <lpro...@redhat.com> wrote:
>
> The host kernel is less likely to be responsible for this, in my opinion.
> I'd hold off on that for now.
>
> And last but not least KSM is enabled on the host. Should I disable it?
>
> Could be worth the try.
>
> Following your advice I will run memtest on the host and report back. Just
> as a side comment, the host is running on ECC memory.
>
> I see. Would it be possible for you, once a guest is in the broken state, to
> make it available for debugging? By attaching gdb to the QEMU process for
> example and letting me poke around it remotely? Thanks!
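
For the gdbserver step above, here is a minimal sketch of how the command line could be put together on the host. The helper name gdbserver_attach_cmd and the example PID 4242 are made up for illustration, and the pgrep pattern is an assumption about how the QEMU binary is named on that host. The helper only prints the command, it does not run it.

```shell
#!/bin/sh
# Sketch only: builds the gdbserver command line quoted in the message above.
# The helper name and the example PID are illustrative assumptions.

# Print the command that attaches gdbserver to an already running process.
#   $1 = PID of the QEMU process
#   $2 = TCP port gdbserver should listen on (defaults to 12345)
gdbserver_attach_cmd() {
    pid=$1
    port=${2:-12345}
    # Refuse anything that is not a plain decimal PID.
    case $pid in
        ''|*[!0-9]*)
            echo "usage: gdbserver_attach_cmd <QEMU pid> [port]" >&2
            return 1
            ;;
    esac
    # gdbserver currently ignores the host part of host:port,
    # so ":port" is equivalent to the "host:12345" form in the mail.
    printf 'gdbserver --attach :%s %s\n' "$port" "$pid"
}

# Example, on the host once a guest is in the broken state:
#   pgrep -f qemu-system           # assumed process name; pick the right PID
#   gdbserver_attach_cmd 4242      # prints: gdbserver --attach :12345 4242
```

On the remote side, gdb would then connect with "target remote <host name>:12345", which is why the host name and port need to be shared.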