[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-25 Thread Launchpad Bug Tracker
This bug was fixed in the package libvirt - 6.0.0-0ubuntu8.20

---
libvirt (6.0.0-0ubuntu8.20) focal; urgency=medium

  * d/p/u/lp2059272-2-qemu-Wait-qemuProcessReconnect-threads-in-cleanup.patch:
    Remove patch. It is not possible to wait for qemuProcessReconnect()
    in cleanup: it talks to QEMU monitor, which blocks on replies from
    event loop, but it's already stopped at cleanup, delaying shutdown.

  * d/p/u/lp2059272-2-qemu-Do-not-save-XML-in-shutdown-on-init.patch:
    Instead of waiting at cleanup for threads which might be blocked
    thus would _not even reach_ the function that causes the problem,
    just skip that function if it is _actually reached_ while daemon
    shutdown is in progress. That is in the init path and would just
    run again anyway the next time libvirtd is started (LP: #2059272)

  * NOTE: This package contains the changes from 6.0.0-0ubuntu8.18 and
    6.0.0-0ubuntu8.17 in focal-proposed (with symbolic changelog entry)
    superseded by 6.0.0-0ubuntu8.19 in focal-security.

libvirt (6.0.0-0ubuntu8.20~ubuntu8.18) focal; urgency=medium

  * d/p/u/lp2059272-1-qemu-Fix-potential-crash-during-driver-cleanup.patch:
    On QEMU driver cleanup, release (stop) the worker thread pool _first_,
    before other data used by possibly running worker threads (LP: #2059272)

  * d/p/u/lp2059272-2-qemu-Wait-qemuProcessReconnect-threads-in-cleanup.patch:
    On QEMU driver cleanup, also wait for qemuProcessReconnect() threads,
    as they are independent of the worker thread pool. (LP: #2059272)
    Focal needs this as it has no .stateShutdownWait() callback yet.
    (The wait timeout is set in LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT:
     -1 = wait indefinitely; 0 = do not wait; N = wait up to N seconds.)

libvirt (6.0.0-0ubuntu8.20~ubuntu8.17) focal; urgency=medium

  * d/p/u/lp-1989078-*.patch: allow arm64 to lock its OVMF/AAVMF resources
    (LP: #1989078)

 -- Mauricio Faria de Oliveira   Tue, 16 Apr 2024 14:20:13 -0300

** Changed in: libvirt (Ubuntu Focal)
   Status: Fix Committed => Fix Released
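
For illustration only, the LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT values
listed in the 8.18 changelog entry above (-1, 0, N) could be read as in the
sketch below. This is not code from the (since removed) patch:
describe_cleanup_wait is a hypothetical helper name, and the 30-second
fallback is an assumption based on the default mentioned in the bug
description.

```shell
# Hypothetical helper: describe how a timeout value would be interpreted.
#   -1 = wait indefinitely; 0 = do not wait; N = wait up to N seconds.
describe_cleanup_wait() {
    case "$1" in
        -1) echo "wait indefinitely" ;;
        0)  echo "do not wait" ;;
        ''|*[!0-9]*)
            # Empty, negative (other than -1) or non-numeric input.
            echo "invalid value: $1"
            return 1 ;;
        *)  echo "wait up to $1 seconds" ;;
    esac
}

# Fall back to 30 seconds when the variable is unset (assumed default).
describe_cleanup_wait "${LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT:-30}"
```

The pattern order matters: the two special values are matched first, so
only plain non-negative integers reach the last branch.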

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2059272

Title:
  libvirt domain is not listed/managed after libvirt restart with
  messages "internal error: no monitor path" and "Failed to load config
  for domain"

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/2059272/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-24 Thread Mauricio Faria de Oliveira
The packages in focal-proposed have also been verified successfully in
real-world/non-synthetic tests by one of our Ubuntu Pro support
customers.


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-24 Thread Mauricio Faria de Oliveira
Verification done on focal-proposed, following comments 23, 24, 25, 26.

Including in this comment a few key snippets from each test/comment.

---
Environment
---

LXD virtual machine

 lxc launch --vm ubuntu:focal lp2059272-focal
 lxc exec lp2059272-focal -- su - ubuntu

Enable proposed & debug symbols

cat 
for SLEEP in $(seq 0.1 0.1 2.0); do
<...>

All VMs are still managed by libvirt:

$ virsh list
 Id   Name         State
----------------------------
 1    test-vm-1    running
 2    test-vm-2    running
 3    test-vm-3    running
 4    test-vm-4    running
 5    test-vm-5    running
 6    test-vm-6    running
 7    test-vm-7    running
 8    test-vm-8    running
 9    test-vm-9    running
 10   test-vm-10   running


---
Steps with test packages on Focal (shutdown-on-init)
---

Scenario 1) Shutdown wins race against XML update (ie, shutdown happens
first)

<...>

Now let the qemuProcessReconnect thread continue. It will not update the XML
file, because 'quit' is set (i.e., shutdown is in progress):

(gdb) t 20
(gdb) p ((virNetDaemonPtr)anyobj)->quit
$2 = true

$ ls -l /run/libvirt/qemu/test-vm.xml
-rw------- 1 root root 10189 Apr 24 12:02 /run/libvirt/qemu/test-vm.xml

(gdb) c &

$ ls -l /run/libvirt/qemu/test-vm.xml
-rw------- 1 root root 10189 Apr 24 12:02 /run/libvirt/qemu/test-vm.xml
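
The before/after comparison of the 'ls -l' timestamps above can also be
scripted. A minimal sketch, assuming GNU coreutils 'stat' (as on Ubuntu);
mtime_of and unchanged_since are hypothetical helper names, not part of
the test procedure in this bug:

```shell
# Print the file's modification time as seconds since the epoch.
mtime_of() {
    stat -c %Y -- "$1"
}

# Succeed if the file's mtime still equals the previously recorded value.
unchanged_since() {
    [ "$(mtime_of "$1")" = "$2" ]
}

# Usage around the '(gdb) c &' step:
#   before=$(mtime_of /run/libvirt/qemu/test-vm.xml)
#   ... continue the thread in GDB ...
#   unchanged_since /run/libvirt/qemu/test-vm.xml "$before" && echo "XML not rewritten"
```

Note that %Y has one-second granularity, so a rewrite within the same
second as the recorded value would go unnoticed.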

<...>

$ sudo grep 'Leaving the update of .* domain status XML' /var/log/libvirt/libvirtd-debug.log
2024-04-24 12:08:40.054+0000: 3770: info : qemuProcessReconnect:8157 : Leaving the update of 'test-vm' domain status XML for the next initialization (shutdown detected on this initialization).

<...>

$ sudo grep -e '
  
  

Scenario 2) Shutdown loses race against XML update (ie, update happens
first)

<...>

Instead, let the qemuProcessReconnect thread take the lock, and update
the XML file, but not unlock yet

<...>

$ ls -l /run/libvirt/qemu/test-vm.xml
-rw------- 1 root root 10189 Apr 24 12:02 /run/libvirt/qemu/test-vm.xml

(gdb) b virObjectUnlock thread 20 if anyobj == $ptr
(gdb) c

$ ls -l /run/libvirt/qemu/test-vm.xml
-rw------- 1 root root 10189 Apr 24 12:14 /run/libvirt/qemu/test-vm.xml

<...>

$ sudo grep -e '
  
  

Scenario 3) Shutdown happens along QEMU monitor calls (ie, calls don't
finish)

<...>

The XML was not updated, as expected:

$ ls -l /run/libvirt/qemu/test-vm.xml
-rw------- 1 root root 10189 Apr 24 12:14 /run/libvirt/qemu/test-vm.xml

$ sudo grep -e '
  
  
<...>

Now, the next time libvirtd starts, it correctly parses that XML:

 $ sudo systemctl start libvirtd.service

 $ journalctl -b -u libvirtd.service | grep -A1 error
 $
 
And libvirt is aware of the domain, and can manage it:

$ virsh list
 Id   Name      State
-------------------------
 1    test-vm   running

$ virsh destroy test-vm
Domain test-vm destroyed

$ virsh undefine test-vm
Domain test-vm has been undefined

---
Steps with test packages on Focal (shutdown-on-runtime)
---

<...>
Check the formatter/options again; it is *STILL* referenced, and no longer 0x0:

(gdb) t 20
(gdb) p xmlopt.privateData.format
$3 = (virDomainXMLPrivateDataFormatFunc) 0x7fd08c3437c0 

(gdb) p/x xmlopt.parent
$4 = {u = {dummy_align1 = 0x1cafe0026, dummy_align2 = 0x1cafe0026, s = 
{magic = 0xcafe0026, refs = 0x1}}, klass = 0x7fd080043170}

Let the save function continue, and libvirt finishes shutting down:
<...>
Check the VM status XML *after*:

$ ls -l /run/libvirt/qemu/test-vm.xml
-rw------- 1 root root 10251 Apr 24 12:28 /run/libvirt/qemu/test-vm.xml

$ sudo grep -e '
   

[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-18 Thread Robie Basak
Hello Mauricio, or anyone else affected,

Accepted libvirt into focal-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/libvirt/6.0.0-0ubuntu8.20 in a few
hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from
verification-needed-focal to verification-done-focal. If it does not fix
the bug for you, please add a comment stating that, and change the tag to
verification-failed-focal. In either case, without details of your
testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Tags removed: verification-failed-focal
** Tags added: verification-needed-focal


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-18 Thread Robie Basak
Mauricio identified that many symbols are added to
libvirt-daemon-driver-qemu.so, due to the inclusion of the RPC archives
combined with gcc/ld export-dynamic in the build (which exports all
symbols, not just those actually used). On consultation with both
Mauricio and Sergio, we concluded that this is OK, and not worth the
effort and risk of further changes to suppress them.


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-17 Thread Mauricio Faria de Oliveira
SRU team: this is on hold to double check the library symbols (thanks,
Robie!), please do not accept yet.


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-16 Thread Mauricio Faria de Oliveira
Hi Sergio,

Thanks for taking the time to read and review the patch and test cases!

> this new code path could be reached even after the initialization,
> couldn't it?

No, this code path should only be reached during initialization, AFAICT.

That is because qemuProcessReconnect() can only be reached from function
calls in the initialization path when the daemon initializes the drivers.

For reference, this is the only call path reaching qemuProcessReconnect().
(I've started with it, and gone back into callers.)

@ src/remote/remote_daemon.c
- main()
- daemonStateInit()
- daemonRunStateInit()
@ src/libvirt.c
- virStateInitialize()
- virStateDriverTab[i]->stateInitialize()
@ src/qemu/qemu_driver.c
- virStateDriver qemuStateDriver.stateInitialize = qemuStateInitialize
- qemuStateInitialize()
@ src/qemu/qemu_process.c
- qemuProcessReconnectAll()
- qemuProcessReconnectHelper()
- qemuProcessReconnect()

> Worst case scenario (i.e., if we fail to consider a code path), we will
> have a "memory leak" during shutdown, which is not the end of the world.

Right, that's comforting. :)

And just to clarify on the code path consideration (so as to provide more
reassurance for the patch, regarding a code path possibly not considered):

There should be only one code path leading to qemuProcessReconnect() (above),
and fortunately the points to inc/dec references are straightforward there:

Thread creation (1) either succeeds or fails (2), and the thread function
has a single return point (3).

So, if the reference count is incremented right before thread creation (1),
there are only 2 code sites to decrement it: on thread creation failure (2)
(since the function doesn't run, its return point "dec" doesn't run either),
and on thread creation success, where the function runs, so the "dec"
happens at its return point (3).
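
The inc/dec pairing described above can be modelled with a toy counter.
This is only an illustration in shell, not the libvirt C code: 'refs',
reconnect_body and spawn_reconnect are hypothetical names, and the
"thread" is run synchronously to keep the sketch simple.

```shell
# 'refs' stands in for the reference count taken before thread creation.
refs=0

# Stand-in for the thread function, with a single return point (3).
reconnect_body() {
    # ... the reconnect work would happen here ...
    refs=$((refs - 1))      # (3) dec at the single return point
}

# Stand-in for thread creation; $1 is "ok" or "fail".
spawn_reconnect() {
    refs=$((refs + 1))      # (1) inc right before thread creation
    if [ "$1" = "ok" ]; then
        reconnect_body      # creation succeeded: the body runs and decs
    else
        refs=$((refs - 1))  # (2) creation failed: body never runs, dec here
    fi
}

spawn_reconnect ok
spawn_reconnect fail
echo "refs after both paths: $refs"
```

Both paths leave the counter balanced, which is the property the patch
relies on: exactly one "dec" for each "inc", whichever path is taken.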

> Otherwise, the patch LGTM and I'm satisfied with the testing you did.
> Feel free to go ahead and upload it.

Ok, cool. I think the clarifications above should address the two points
you brought up, so I'll continue and rebase and upload it.

Thanks again!


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-15 Thread Sergio Durigan Junior
Hi Mauricio,

Thanks for the detailed explanations, as usual.

I took some time here to read your patch.  It took me down the rabbit
hole and I did some archaeology to find out what these "inhibit*"
pointers are about.  It was fun; I learned that they were introduced
back in 2012 as a means to allow drivers to signal libvirt when to
inhibit shutdown because there are still VMs around.  But I digress.

I liked your approach here and indeed, it seems simpler than the other
one (although the initial approach was also simple to grasp and elegant;
too bad it didn't work for all cases).

My only comment/question here would be this: the changes to
qemuProcessReconnect mention that they're being done in order to prevent
the XML "corruption" when there's a shutdown detected during
initialization, but (and I may be wrong here) it seems to me that this
new code path could be reached even after the initialization, couldn't
it?  Either way, this is not really important and doesn't affect the
patch (aside from the possible amends to the comments being added), and
it's OK if you don't know the answer, too.

My other source of "concern" here was the reference handling for the new
pointer, but I couldn't find any potential problems with what you did.
Worst case scenario (i.e., if we fail to consider a code path), we will
have a "memory leak" during shutdown, which is not the end of the world.
The lock/unlock dance is also simple and trivially verifiable.

Otherwise, the patch LGTM and I'm satisfied with the testing you did.
Feel free to go ahead and upload it.

Thanks again.


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-12 Thread Mauricio Faria de Oliveira
Steps with test packages on Focal (shutdown-on-runtime)
---

Stop libvirtd systemd units

 sudo systemctl stop 'libvirtd*'

Start libvirt in GDB

  sudo gdb \
    -iex 'set confirm off' \
    -iex 'set pagination off' \
    -ex 'set non-stop on' \
    -ex 'handle SIGTERM nostop noprint pass' \
    -ex 'add-symbol-file /usr/sbin/libvirtd' \
    -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt.so.0' \
    -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt-qemu.so.0' \
    -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt/connection-driver/libvirt_driver_qemu.so' \
    /usr/sbin/libvirtd

Add breakpoints for qemu driver cleanup and device deleted event

 b qemuStateCleanup
 b processDeviceDeletedEvent
 run

Start test VM with a USB mouse device

  cat <<-EOF >test-vm.xml
  
test-vm

  hvm

32
1

  

  
EOF

 virsh define test-vm.xml
 virsh start test-vm

 $ virsh list
  Id   Name      State
 -------------------------
  1    test-vm   running

Delete the USB mouse device

 DEVICE_ID=$(virsh qemu-monitor-command test-vm --hmp 'info qtree' | 
grep 'dev: usb-mouse' | cut -d'"' -f2)
 virsh qemu-monitor-command test-vm --hmp "device_del $DEVICE_ID"

Back to GDB

 Thread 20 "libvirtd" hit Breakpoint 2, 0x7ffba902204e in
processDeviceDeletedEvent (devAlias=, vm=0x7ffbac00de90,
driver=0x7ffbac021380) at ../../../src/qemu/qemu_driver.c:4888

Add breakpoint to domain status XML save, and continue the thread above

 b virDomainObjSave
 t 20
 c

Thread 20 "libvirtd" hit Breakpoint 3, virDomainObjSave
(obj=0x7ffbac00de90, xmlopt=0x7ffbac044130, statusDir=0x7ffbac01f530
"/run/libvirt/qemu") at ../../../src/conf/domain_conf.c:29157

Check the backtrace of the domain status XML save function, coming from
device deleted event

 (gdb) bt
#0  virDomainObjSave (obj=0x7ffbac00de90, xmlopt=0x7ffbac044130, 
statusDir=0x7ffbac01f530 "/run/libvirt/qemu") at 
../../../src/conf/domain_conf.c:29157
#1  0x7ffba9022127 in processDeviceDeletedEvent 
(devAlias=0x556074b5e3f0 "input0", vm=0x7ffbac00de90, driver=0x7ffbac021380) at 
../../../src/qemu/qemu_driver.c:4312
#2  qemuProcessEventHandler (data=0x556074b63a10, 
opaque=0x7ffbac021380) at ../../../src/qemu/qemu_driver.c:4888
#3  0x7ffbbee8f1af in virThreadPoolWorker 
(opaque=opaque@entry=0x556074c047a0) at ../../../src/util/virthreadpool.c:163
#4  0x7ffbbee8e51c in virThreadHelper (data=) at 
../../../src/util/virthread.c:196
#5  0x7ffbbeb4f609 in start_thread (arg=) at 
pthread_create.c:477
#6  0x7ffbbea74353 in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Leave the thread at this point

Let's trigger the shutdown path

 $ sudo kill $(pidof libvirtd)

Thread 1 "libvirtd" hit Breakpoint 1, qemuStateCleanup () at
../../../src/qemu/qemu_driver.c:1127

Check the function pointer is non-NULL _before_ cleanup

(gdb) p xmlopt.privateData.format
$1 = (virDomainXMLPrivateDataFormatFunc) 0x7ffba8f7c7c0 


(gdb) p/x xmlopt.parent
$2 = {u = {dummy_align1 = 0x1cafe0027, dummy_align2 = 0x1cafe0027, s = 
{magic = 0xcafe0027, refs = 0x1}}, klass = 0x7ffbac044100}

Let cleanup run:

t 1
c &

Check the formatter/options again; it is *STILL* referenced, and no longer
0x0:

(gdb) p xmlopt.privateData.format
$3 = (virDomainXMLPrivateDataFormatFunc) 0x7ffba8f7c7c0 


(gdb) p/x xmlopt.parent
$4 = {u = {dummy_align1 = 0x1cafe0027, dummy_align2 = 0x1cafe0027, s = 
{magic = 0xcafe0027, refs = 0x1}}, klass = 0x7ffbac044100}

Check the shutdown/cleanup thread is waiting for it,
in the path to free the worker thread pool:

(gdb) i th 1
  Id   Target Id   Frame
  1Thread 0x7ffbbb035b40 (LWP 5887) "libvirtd" (running)
(gdb) t 1
(gdb) interrupt
(gdb) bt
#0  futex_wait_cancelable (private=, expected=0, 
futex_word=0x7ffbac05fd60) at ../sysdeps/nptl/futex-internal.h:183
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, 
mutex=0x7ffbac05fce0, cond=0x7ffbac05fd38) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x7ffbac05fd38, mutex=0x7ffbac05fce0) at 
pthread_cond_wait.c:647
#3  0x7ffbbee8e79b in virCondWait (c=, m=) at ../../../src/util/virthread.c:144
#4  0x7ffbbee8f438 in virThreadPoolFree (pool=) at 
../../../src/util/virthreadpool.c:286
#5  0x7ffba8fed5d1 in qemuStateCleanup () at 
../../../src/qemu/qemu_driver.c:1131
#6  0x7ffbbf02c47f in virStateCleanup () at 
../../../src/libvirt.c:669
#7  0x556072acebc8 in main (argc=, argv=) at 

[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-12 Thread Mauricio Faria de Oliveira
Steps with test packages on Focal (shutdown-on-init)
---

Start test VM

 cat <<-EOF >test-vm.xml
 
   test-vm
   
 hvm
   
   32
   1
 
EOF

 virsh define test-vm.xml
 virsh start test-vm

 $ virsh list
  Id   Name      State
 -------------------------
  1    test-vm   running

Stop libvirtd systemd units

 sudo systemctl stop 'libvirtd*'


Scenario 1) Shutdown wins race against XML update (ie, shutdown happens first)

Start libvirtd in GDB

sudo gdb \
   -iex 'set confirm off' \
   -iex 'set pagination off' \
   -ex 'set non-stop on' \
   -ex 'handle SIGTERM nostop noprint pass' \
   -ex 'add-symbol-file /usr/sbin/libvirtd' \
   -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt.so.0' \
   -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt-qemu.so.0' \
   -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt/connection-driver/libvirt_driver_qemu.so' \
   /usr/sbin/libvirtd

Stop on initialization

(gdb) b qemuStateInitialize
(gdb) run

Thread 17 "libvirtd" hit Breakpoint 1, qemuStateInitialize
(privileged=true, callback=0x5558939f10c0 ,
opaque=0x555893b905d0) at ../../../src/qemu/qemu_driver.c:644

Save the daemon 'opaque' pointer in $ptr (global variable
qemu_driver_dmn is not accessible):

(gdb) p qemu_driver_dmn
Cannot access memory at address 0x1e39a8

(gdb) p 'src/qemu/qemu_driver.c'::qemu_driver_dmn
Cannot access memory at address 0x1e39a8

(gdb) t 17
(gdb) set $ptr = opaque

Run until qemuProcessReconnect

(gdb) b qemuProcessReconnect
(gdb) c

Thread 20 "libvirtd" hit Breakpoint 2, qemuProcessReconnect
(opaque=0x7fd82c054900) at ../../../src/qemu/qemu_process.c:7922

Run this thread until the lock on qemu_driver_dmn:

(gdb) b virObjectLock thread 20 if anyobj == $ptr
(gdb) t 20
(gdb) c

Thread 20 "libvirtd" hit Breakpoint 3, virObjectLock
(anyobj=0x555893b905d0) at ../../../src/util/virobject.c:427

See the daemon is not yet shutting down

(gdb) t 20
(gdb) p ((virNetDaemonPtr)anyobj)->quit
$1 = false

Stop the shutdown path in the main thread on the lock on qemu_driver_dmn

(gdb) b virObjectLock thread 1 if anyobj == $ptr

$ sudo kill $(pidof libvirtd)

Thread 1 "libvirtd" hit Breakpoint 4, virObjectLock
(anyobj=0x555893b905d0) at ../../../src/util/virobject.c:427

(gdb) t 1
#0  virObjectLock (anyobj=0x555893b905d0) at 
../../../src/util/virobject.c:427
#1  0x7fd83eabc2d5 in virNetDaemonSignalEvent (watch=watch@entry=2, 
fd=, events=events@entry=1, opaque=opaque@entry=0x555893b905d0) 
at ../../../src/rpc/virnetdaemon.c:630
#2  0x7fd83e97da0d in virEventPollDispatchHandles 
(fds=0x555893bc21c0, nfds=) at 
../../../src/util/vireventpoll.c:503
#3  virEventPollRunOnce () at ../../../src/util/vireventpoll.c:658
#4  0x7fd83e97c095 in virEventRunDefaultImpl () at 
../../../src/util/virevent.c:353
#5  0x7fd83eabd495 in virNetDaemonRun (dmn=0x555893b905d0) at 
../../../src/rpc/virnetdaemon.c:836
#6  0x5558939ef7d1 in main (argc=, argv=) at ../../../src/remote/remote_daemon.c:1430

Let it deliver the signal

(gdb) c
Thread 1 "libvirtd" hit Breakpoint 4, virObjectLock 
(anyobj=0x555893b905d0) at ../../../src/util/virobject.c:427

(gdb) bt
#0  virObjectLock (anyobj=0x555893b905d0) at 
../../../src/util/virobject.c:427
#1  0x7fd83eabd2ed in virNetDaemonQuit (dmn=0x555893b905d0) at 
../../../src/rpc/virnetdaemon.c:854
#2  0x7fd83eabc33e in virNetDaemonSignalEvent (watch=watch@entry=2, 
fd=, events=events@entry=1, opaque=opaque@entry=0x555893b905d0) 
at ../../../src/rpc/virnetdaemon.c:645
#3  0x7fd83e97da0d in virEventPollDispatchHandles 
(fds=0x555893bc21c0, nfds=) at 
../../../src/util/vireventpoll.c:503
#4  virEventPollRunOnce () at ../../../src/util/vireventpoll.c:658
#5  0x7fd83e97c095 in virEventRunDefaultImpl () at 
../../../src/util/virevent.c:353
#6  0x7fd83eabd495 in virNetDaemonRun (dmn=0x555893b905d0) at 
../../../src/rpc/virnetdaemon.c:836
#7  0x5558939ef7d1 in main (argc=, argv=) at ../../../src/remote/remote_daemon.c:1430

Let it set 'quit'

(gdb) c
Thread 1 "libvirtd" hit Breakpoint 4, virObjectLock 
(anyobj=0x555893b905d0) at ../../../src/util/virobject.c:427

(gdb) bt
#0  virObjectLock (anyobj=0x555893b905d0) at 
../../../src/util/virobject.c:427
#1  0x7fd83eabd4a5 in virNetDaemonRun (dmn=0x555893b905d0) at 
../../../src/rpc/virnetdaemon.c:841
#2  0x5558939ef7d1 in main (argc=, argv=) at ../../../src/remote/remote_daemon.c:1430

Let it take the lock in the event loop

(gdb) 

[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-12 Thread Mauricio Faria de Oliveira
Steps with test packages on Focal (normal restarts)
---

Restart libvirt 100 times with 10 QEMU domains.

All domains continued to be managed by libvirt.

Create 10 test VMs (test-vm-1, test-vm-2, ..., test-vm-10):

for NAME in test-vm-{1..10}; do cat <<-EOF >test-vms.xml && virsh define test-vms.xml && virsh start $NAME; done

  ${NAME}
  
hvm
  
  32
  1

EOF

Disable the systemd unit rate limiting for (re)starts:

sudo mkdir -p /etc/systemd/system/libvirtd.service.d/
cat <&1 | tee /tmp/libvirtd-restart.log

Reset libvirtd debug log
Sleep 0.1, Restart 1
...
Sleep 0.1, Restart 100
Check libvirtd debug log

Reset libvirtd debug log
Sleep 0.2, Restart 1
...
Sleep 0.2, Restart 100
Check libvirtd debug log

...

Reset libvirtd debug log
Sleep 2.0, Restart 1
...
Sleep 2.0, Restart 100
Check libvirtd debug log


Checking that libvirtd is started 1+100 times for each restart interval:

$ sudo grep -c 'libvirt version' /tmp/libvirtd-debug.log.SLEEP-*
/tmp/libvirtd-debug.log.SLEEP-0.1:101
/tmp/libvirtd-debug.log.SLEEP-0.2:101
/tmp/libvirtd-debug.log.SLEEP-0.3:101
/tmp/libvirtd-debug.log.SLEEP-0.4:101
/tmp/libvirtd-debug.log.SLEEP-0.5:101
/tmp/libvirtd-debug.log.SLEEP-0.6:101
/tmp/libvirtd-debug.log.SLEEP-0.7:101
/tmp/libvirtd-debug.log.SLEEP-0.8:101
/tmp/libvirtd-debug.log.SLEEP-0.9:101
/tmp/libvirtd-debug.log.SLEEP-1.0:101
/tmp/libvirtd-debug.log.SLEEP-1.1:101
/tmp/libvirtd-debug.log.SLEEP-1.2:101
/tmp/libvirtd-debug.log.SLEEP-1.3:101
/tmp/libvirtd-debug.log.SLEEP-1.4:101
/tmp/libvirtd-debug.log.SLEEP-1.5:101
/tmp/libvirtd-debug.log.SLEEP-1.6:101
/tmp/libvirtd-debug.log.SLEEP-1.7:101
/tmp/libvirtd-debug.log.SLEEP-1.8:101
/tmp/libvirtd-debug.log.SLEEP-1.9:101
/tmp/libvirtd-debug.log.SLEEP-2.0:101
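
Instead of checking the 20 counts by eye, the 'grep -c' output could be
filtered for anything other than the expected 101. A minimal sketch;
check_start_counts is a hypothetical helper, and the sample lines below
are inline data rather than real log files:

```shell
# Reads 'grep -c' style lines ("file:count") on stdin, prints every line
# whose count differs from the expected one ($1), and returns non-zero
# if any such line was seen.
check_start_counts() {
    awk -F: -v want="$1" '
        BEGIN { bad = 0 }
        $NF != want { print; bad = 1 }
        END { exit bad }
    '
}

# Example with inline sample data (no log files needed):
printf '%s\n' '/tmp/libvirtd-debug.log.SLEEP-0.1:101' \
              '/tmp/libvirtd-debug.log.SLEEP-0.2:101' |
    check_start_counts 101 && echo "all intervals OK"
```

On the test system this would be fed from the command shown above, e.g.
sudo grep -c 'libvirt version' /tmp/libvirtd-debug.log.SLEEP-* | check_start_counts 101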

All VMs are still managed by libvirt:

$ virsh list
 Id   Name         State
----------------------------
 2    test-vm-1    running
 3    test-vm-2    running
 4    test-vm-3    running
 5    test-vm-4    running
 6    test-vm-5    running
 7    test-vm-6    running
 8    test-vm-7    running
 9    test-vm-8    running
 10   test-vm-9    running
 11   test-vm-10   running

Remove test VMs:

for NAME in test-vm-{1..10}; do virsh destroy $NAME && virsh undefine $NAME; done

Note that the race window for the shutdown-on-init condition
is so tight that it did not happen once in 2020 restarts
(the new fix logs it). It really needs a synthetic reproducer.

$ sudo grep 'Leaving' /tmp/libvirtd-debug.log.SLEEP-*
$


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-12 Thread Mauricio Faria de Oliveira
Test packages with the new fix in ppa:mfo/lp2059272
built correctly in all supported architectures.

The next 3 comments show steps to reproduce/verify
the issue/fix with the test packages -- all passed.

Environment
---

LXD virtual machine

lxc launch --vm ubuntu:focal lp2059272-focal
lxc exec lp2059272-focal -- su - ubuntu

Enable PPA & debug symbols

sudo add-apt-repository -yn ppa:mfo/lp2059272
sudo sed '/^deb / s,$, main/debug,' -i /etc/apt/sources.list.d/mfo-ubuntu-lp2059272-focal.list
sudo apt update

Install packages

sudo apt install --yes \
  libvirt{0,-daemon{,-driver-qemu,-system}}{,-dbgsym} \
  libvirt-daemon-system-systemd libvirt-clients gdb qemu-system-x86
newgrp libvirt # or logout/login

$ apt-cache policy libvirt-daemon
libvirt-daemon:
  Installed: 6.0.0-0ubuntu8.19
  Candidate: 6.0.0-0ubuntu8.19
  Version table:
 *** 6.0.0-0ubuntu8.19 500
500 http://ppa.launchpad.net/mfo/lp2059272/ubuntu focal/main 
amd64 Packages
100 /var/lib/dpkg/status
 6.0.0-0ubuntu8.16 500
500 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 
Packages
500 http://security.ubuntu.com/ubuntu focal-security/main amd64 
Packages
 6.0.0-0ubuntu8 500
500 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages

Libvirtd debug logging

cat <<-EOF | sudo tee -a /etc/libvirt/libvirtd.conf
log_filters="1:qemu 1:libvirt"
log_outputs="3:syslog:libvirtd 1:file:/var/log/libvirt/libvirtd-debug.log"
EOF


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-12 Thread Mauricio Faria de Oliveira
The new fix for shutdown-on-init on Focal is simpler:
Skip the XML update on init if libvirt is shutting down.

This is based on 2 points:

1) The XML update on initialization will not be used at all
   in this run of libvirtd, since libvirtd is shutting down.

2) The XML update in _this_ initialization will be overwritten
   by the XML update in the _next_ initialization anyway!

Hence, it is OK to skip this XML update if shutting down.
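
The idea can be modelled in a few lines. This is a toy shell model only,
not the libvirt C change: save_status_on_init is a hypothetical name, its
first argument stands in for the daemon's 'quit' flag, and the file
contents are placeholders.

```shell
#   $1: "true" if shutdown was requested (models the 'quit' flag)
#   $2: path of the status file to (re)write
save_status_on_init() {
    if [ "$1" = "true" ]; then
        # Shutdown in progress: leave the existing file untouched.
        echo "skipping status XML update (shutdown in progress)"
        return 0
    fi
    # Normal initialization: rewrite the file (placeholder content).
    printf '%s\n' "status written during init" > "$2"
}

# Demo: with shutdown requested, the file from the previous run survives.
demo=$(mktemp)
printf '%s\n' "status from previous run" > "$demo"
save_status_on_init true "$demo"
cat "$demo"
rm -f "$demo"
```

Because the next initialization rewrites the file anyway, skipping the
write loses nothing, and it avoids writing the file while shutdown is
under way.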


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-12 Thread Mauricio Faria de Oliveira
** Description changed:

  [ Impact ]
  
   * If a race condition occurs on libvirtd shutdown,
     a QEMU domain status XML (/run/libvirt/qemu/*.xml)
     might lose the QEMU-driver specific information,
     such as 'monitor path'.
     (The race condition details are in [Other Info].)
  
   * On the next libvirtd startup, the parsing of that
     QEMU domain's status XML fails as 'https://launchpad.net/~mfo/+archive/ubuntu/lp2059272
  
  [ Regression Potential ]
  
   * One patch changes *where* in the libvirt qemu driver's
     shutdown path the worker thread pool is stopped/freed:
     from _after_ releasing other data to _before_ doing so.
  
-  * The other patch (Focal-only) introduces a bounded wait
-    (with configurable timeout via an environment variable)
-    in the (same) libvirt qemu driver's shutdown path.
- 
-    By default, this waits for qemuProcessReconnect threads
-    for up to 30 seconds (expected to finish in less than
-    1 second, in practice), and gives up / continues with
-    shutdown anyway so not to introduce a behavior change
-    on this path (prevents impact in case of regressions).
+  * The other patch (Focal-only) skips the update of the
+    QEMU domain status XML file during initialization if
+    libvirt is shutting down. (This is OK since the file
+    is not going to be used anyway in the current run as
+    it is shutting down, and it will be updated again in
+    the next run anyway.)
  
   * Therefore, the potential for regression is limited to
     the libvirt qemu driver's shutdown path, and would be
     observed when stopping/restarting libvirtd.service.
  
   * The behavior during normal operation is not affected.
  
  [Other Info]
  
   * In Focal, race windows exist if libvirtd shuts down
     _after_ initialization and _during_ initialization
     (which is unlikely in practice, but it's possible.)
  
     Say, 'shutdown-on-runtime' and 'shutdown-on-init'.
  
   * In Jammy, only 'shutdown-on-runtime' might happen,
     due to the introduction of the '.stateShutdownWait'
     driver callback (not available in Focal), which
     indirectly prevents the 'shutdown-on-init' race
     due to additional synchronization with locking.
  
   * For 'shutdown-on-runtime': use upstream commit [1].
     It's needed in Focal and Jammy (included in Mantic).
  
   * For 'shutdown-on-init' (Focal-only), we should use a
-    downstream-only patch (with configurable behavior),
+    downstream-only patch (with conservative behavior),
     since upstream addressed this issue indirectly with
     the '.stateShutdownWait' callbacks and other changes
     (which are not SRU material, ~10 patches, redesign [2])
-in 6.8.0.
+    in 6.8.0.
  
  [1]
  
https://gitlab.com/libvirt/libvirt/-/commit/152770333449cd3b78b4f5a9f1148fc1f482d842
  
   $ git describe --contains 152770333449cd3b78b4f5a9f1148fc1f482d842
   v9.3.0-rc1~90
  
   $ rmadison -a source libvirt | sed -n '/focal/,$p'
    libvirt | 6.0.0-0ubuntu8    | focal          | source
    libvirt | 6.0.0-0ubuntu8.16 | focal-security | source
    libvirt | 6.0.0-0ubuntu8.16 | focal-updates  | source
    libvirt | 6.0.0-0ubuntu8.17 | focal-proposed | source
    libvirt | 8.0.0-1ubuntu7    | jammy          | source
    libvirt | 8.0.0-1ubuntu7.5  | jammy-security | source
    libvirt | 8.0.0-1ubuntu7.8  | jammy-updates  | source
    libvirt | 9.6.0-1ubuntu1    | mantic         | source
    libvirt | 10.0.0-2ubuntu1   | noble          | source
    libvirt | 10.0.0-2ubuntu5   | noble-proposed | source
  
  [2] https://listman.redhat.com/archives/libvir-list/2020-July/205291.html
  [PATCH 00/10] resolve hangs/crashes on libvirtd shutdown
  
  commit 94e45d1042e21e03a15ce993f90fbef626f1ae41
  Author: Nikolay Shirokovskiy 
  Date: Thu Jul 23 09:53:04 2020 +0300
  
  rpc: finish all threads before exiting main loop
  
  $ git describe --contains 94e45d1042e21e03a15ce993f90fbef626f1ae41
  v6.8.0-rc1~279
  
  [Original Description]
  
  There's a race condition on libvirtd shutdown
  that might cause the domain status XML file(s)
  to lose the 'monitor path' tag/field.
  
  This causes an error on libvirtd startup, and
  the domain is not listed/managed, even though
  it is still running.
  
   $ virsh list
    Id   Name      State
   -------------------------
    1    test-vm   running
  
   $ sudo systemctl restart libvirtd.service
  
   $ journalctl -b -u libvirtd.service | tail
   ...
   ... libvirtd[2789]: internal error: no monitor path
   ... libvirtd[2789]: Failed to load config for domain 'test-vm'
  
   $ virsh list
    Id   Name   State
   --------------------
  
   $ virsh list --all
    Id   Name      State
   --------------------------
    -    test-vm   shut off
  
   $ pgrep -af qemu-system-x86_64 | cut -d, -f1
   2638 /usr/bin/qemu-system-x86_64 -name guest=test-vm,
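
To check whether a host is affected, the status XML files can be
scanned for the missing field. The following is a minimal sketch, not
part of the fix: the function name is hypothetical, and it assumes
that a healthy status file under /run/libvirt/qemu/ contains the
literal text "<monitor path".

```shell
#!/usr/bin/env bash
# List domain status XML files in a directory that lack '<monitor path'.
# Assumption: healthy status files contain the literal text "<monitor path".
find_status_xml_without_monitor() {
    local dir="${1:-/run/libvirt/qemu}"
    local file
    for file in "$dir"/*.xml; do
        # Skip the unexpanded glob when the directory has no XML files.
        [ -e "$file" ] || continue
        if ! grep -q '<monitor path' "$file"; then
            printf '%s\n' "$file"
        fi
    done
}
```

Reading /run/libvirt/qemu typically requires root. Any file printed
by the function would be a candidate for the "no monitor path" error
on the next libvirtd startup.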

[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-09 Thread Launchpad Bug Tracker
This bug was fixed in the package libvirt - 8.0.0-1ubuntu7.9

---
libvirt (8.0.0-1ubuntu7.9) jammy; urgency=medium

  * d/p/u/lp2059272-qemu-Fix-potential-crash-during-driver-cleanup.patch:
    On QEMU driver cleanup, release (stop) the worker thread pool _first_,
    before other data used by possibly running worker threads (LP: #2059272)

 -- Mauricio Faria de Oliveira  Wed, 27 Mar 2024 12:47:46 -0300

** Changed in: libvirt (Ubuntu Jammy)
   Status: Fix Committed => Fix Released


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-09 Thread Brian Murray
** Tags removed: verification-needed-focal
** Tags added: verification-failed-focal


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-03 Thread Mauricio Faria de Oliveira
(Un)fortunately, during focal-proposed testing today, a corner case was
identified (late), which will require changes.

Some of the qemuProcessReconnect() threads finished within the timeout
but others didn't.

The ones which didn't were waiting on a reply back from the QEMU monitor,
but that never fires because it is processed/delivered by the main event
loop, which is already stopped at libvirtd level by the time the
driver-level qemuStateCleanup() runs (where we wait).

The libvirt shutdown does happen after the timeout, but this is slow and
not as planned.

I've been considering design options to address it and will submit an
incremental upload.

Apologies for the inconvenience.


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-02 Thread Chris Halse Rogers
Hello Mauricio, or anyone else affected,

Accepted libvirt into focal-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/libvirt/6.0.0-0ubuntu8.18 in a few
hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from
verification-needed-focal to verification-done-focal. If it does not fix
the bug for you, please add a comment stating that, and change the tag
to verification-failed-focal. In either case, without details of your
testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: libvirt (Ubuntu Focal)
   Status: In Progress => Fix Committed

** Tags added: verification-needed-focal


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-01 Thread Mauricio Faria de Oliveira
Thanks for the reviews, Sergio!

That certainly helps building additional confidence in the patch.

With that and tests covering all options (comments #13 and #15)
showing good/expected results, I'm happy to upload it to Focal.

** Changed in: libvirt (Ubuntu Focal)
   Status: Confirmed => In Progress


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-04-01 Thread Sergio Durigan Junior
Mauricio asked me to review the debdiff for Focal as well.

This debdiff is different because he had to implement some code to
synchronize the destruction of the worker threads with the free'ing of
the thread pool.

I looked at the new code, analyzed it as best as I could, asked Mauricio
a few questions regarding the new environment variable being created,
and was finally convinced that everything seems OK.  The implementation
is sound and the concept is simple: qemuStateCleanupWait acts similarly
to a thread barrier and makes sure that all threads previously created
by qemuProcessReconnect are able to finish before qemuStateCleanup can
proceed.

In a nutshell: LGTM, +1.
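
As an illustration of the barrier-like bounded wait described above,
here is a minimal shell sketch. It is not the patch's code: the helper
`bounded_wait` is a hypothetical name, and it only mirrors the timeout
semantics documented for LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT
(-1 = wait indefinitely; 0 = do not wait; N = wait up to N seconds).

```shell
#!/usr/bin/env bash
# Poll a check command once per second until it succeeds or times out.
#   timeout -1 = wait indefinitely; 0 = do not wait; N = up to N seconds.
# Hypothetical helper; names and arguments are illustrative only.
bounded_wait() {
    local timeout="$1"; shift
    local seconds=0
    # "$@" is a command that exits 0 once nothing is left to wait for.
    until "$@"; do
        if [ "$timeout" -ge 0 ] && [ "$seconds" -ge "$timeout" ]; then
            return 1    # timed out; the caller continues shutdown anyway
        fi
        sleep 1
        seconds=$((seconds + 1))
    done
    return 0
}
```

A caller could pass the timeout as
"${LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT:-30}", with 30 seconds being
the default mentioned in this thread, followed by any command that
reports whether the reconnect threads have finished.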


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-30 Thread Mauricio Faria de Oliveira
Steps with test packages on Focal (shutdown-on-init)
---

Environment:
---

On top of LXD VM in comments #12/#13.

Enable PPA & debug symbols

sudo add-apt-repository -yn ppa:mfo/lp2059272
sudo sed '/^deb / s,$, main/debug,' -i /etc/apt/sources.list.d/mfo-ubuntu-lp2059272-focal.list
sudo apt update

Install packages

sudo apt install --yes libvirt{0,-daemon{,-driver-qemu}}{,-dbgsym} libvirt-clients gdb qemu-system-x86

$ dpkg -s libvirt-daemon | grep ^Version:
Version: 6.0.0-0ubuntu8.18~ppa1

Libvirtd debug logging

cat <) 
at ../../../src/qemu/qemu_process.c:8123
#2  0x7fe64aebd54a in virThreadHelper (data=) at ../../../src/util/virthread.c:196
#3  0x7fe64ab7e609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#4  0x7fe64aaa3353 in clone () from /lib/x86_64-linux-gnu/libc.so.6

$ sudo kill $(pidof libvirtd)

Thread 1 "libvirtd" hit Breakpoint 1, qemuStateCleanup () at
../../../src/qemu/qemu_driver.c:1180

t 20

(gdb) p xmlopt.privateData.format
$1 = (virDomainXMLPrivateDataFormatFunc) 0x7fe644152890 


Let the cleanup function finish

t 1
finish

Notice it took a while (30 seconds).

(gdb) t 20
(gdb) p xmlopt.privateData.format
$3 = (virDomainXMLPrivateDataFormatFunc) 0x0

Let the save function continue, and libvirt finish shutdown:

(gdb) c &
(gdb) t 1
(gdb) c
(gdb) q

Check the VM status XML *after*:

ubuntu@lp2059272-focal:~$ sudo grep -e '
  

And everything happened as in the reproducer.
i.e., the SAME behavior happened BY DEFAULT.
Just with a 30 seconds delay.

Checking the libvirtd debug logs to confirm the patch behavior:

$ sudo tail -n50 /var/log/libvirt/libvirtd-debug.log | sed -n '/qemuStateCleanupWait/,$p'
2024-03-30 22:49:24.737+: 6875: debug : qemuStateCleanupWait:1144 : timeout 30, timeout_env '(null)'
2024-03-30 22:49:24.737+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 0
2024-03-30 22:49:24.737+: 6875: warning : qemuStateCleanupWait:1153 : Waiting for qemuProcessReconnect() threads (1) to end. Configure with LIBVIRT_QEMU_STATE_CLEANUP_WAIT_TIMEOUT (-1 = wait; 0 = do not wait; N = wait up to N seconds; current = 30)
2024-03-30 22:49:25.740+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 1
2024-03-30 22:49:26.740+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 2
2024-03-30 22:49:27.740+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 3
2024-03-30 22:49:28.741+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 4
2024-03-30 22:49:29.741+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 5
2024-03-30 22:49:30.741+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 6
2024-03-30 22:49:31.742+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 7
2024-03-30 22:49:32.742+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 8
2024-03-30 22:49:33.742+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 9
2024-03-30 22:49:34.742+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 10
2024-03-30 22:49:35.743+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 11
2024-03-30 22:49:36.743+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 12
2024-03-30 22:49:37.744+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 13
2024-03-30 22:49:38.744+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 14
2024-03-30 22:49:39.744+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 15
2024-03-30 22:49:40.744+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 16
2024-03-30 22:49:41.745+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 17
2024-03-30 22:49:42.745+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 18
2024-03-30 22:49:43.746+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 19
2024-03-30 22:49:44.746+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 20
2024-03-30 22:49:45.747+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 21
2024-03-30 22:49:46.747+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 22
2024-03-30 22:49:47.748+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 23
2024-03-30 22:49:48.748+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 24
2024-03-30 22:49:49.749+: 6875: debug : qemuStateCleanupWait:1150 : threads 1, seconds 25
2024-03-30 22:49:50.749+: 6875: debug 

[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-30 Thread Mauricio Faria de Oliveira
Steps with test packages on Focal (shutdown-on-runtime)
---

Environment:
---

On top of LXD VM in comments #12/#13.

Enable PPA & debug symbols

sudo add-apt-repository -yn ppa:mfo/lp2059272
sudo sed '/^deb / s,$, main/debug,' -i /etc/apt/sources.list.d/mfo-ubuntu-lp2059272-focal.list
sudo apt update

Install packages

sudo apt install --yes libvirt{0,-daemon{,-driver-qemu}}{,-dbgsym} libvirt-clients gdb qemu-system-x86

$ dpkg -s libvirt-daemon | grep ^Version:
Version: 6.0.0-0ubuntu8.18~ppa1

Libvirtd debug logging

cat <) at 
../../../src/util/virthread.c:196
#5  0x7fb333b95609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#6  0x7fb333aba353 in clone () from /lib/x86_64-linux-gnu/libc.so.6

$ sudo kill $(pidof libvirtd)

Thread 1 "libvirtd" hit Breakpoint 1, qemuStateCleanup () at
../../../src/qemu/qemu_driver.c:1180

t 20

(gdb) p xmlopt.privateData.format
$1 = (virDomainXMLPrivateDataFormatFunc) 0x7fb32c167890 


t 1
c &

Check the formatter/options again; it is *STILL* referenced, and no
longer 0x0:

t 20

(gdb) p xmlopt.privateData.format
$2 = (virDomainXMLPrivateDataFormatFunc) 0x7fb32c167890 


Check the shutdown/cleanup thread is waiting for it,
in the path to free the worker thread pool:

(gdb) i th 1
  Id   Target Id                                   Frame
  1    Thread 0x7fb33007bb40 (LWP 6585) "libvirtd" (running)

t 1
interrupt

(gdb) bt
#0  0x7fb333b9c376 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x7fb333ed479b in virCondWait (c=, m=) at ../../../src/util/virthread.c:144
#2  0x7fb333ed5438 in virThreadPoolFree (pool=) at ../../../src/util/virthreadpool.c:286
#3  0x7fb32c1d89e3 in qemuStateCleanup () at ../../../src/qemu/qemu_driver.c:1186
#4  0x7fb33407246f in virStateCleanup () at ../../../src/libvirt.c:669
#5  0x564ae98babc8 in main (argc=, argv=) at ../../../src/remote/remote_daemon.c:1447

Let the save function continue, and libvirt finishes shutting down:

(gdb) c &
Continuing.
(gdb) t 20
(gdb) c
[Inferior 1 (process 6585) exited normally]
(gdb) q

Check the VM status XML *after*:

$ sudo grep -e '
  
  

It *still* has the 'monitor path' tag/field.

Now, the next time libvirtd starts, it correctly parses that XML:

$ sudo systemctl start libvirtd.service

$ journalctl -b -u libvirtd.service | grep -A1 error
Mar 30 22:27:20 lp2059272-focal libvirtd[6670]: 6686: error : dnsmasqCapsRefreshInternal:714 : Cannot check dnsmasq binary /usr/sbin/dnsmasq: No such file or directory

And libvirt is aware of the domain, and can manage it:

$ virsh list
 Id   Name      State
-------------------------
 1    test-vm   running

$ virsh destroy test-vm
Domain test-vm destroyed


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-30 Thread Mauricio Faria de Oliveira
Steps to reproduce on Focal (shutdown-on-init)
---

LXD virtual machine

lxc exec lp2059272-focal -- su - ubuntu
lxc exec lp2059272-focal -- su - ubuntu

Latest Packages and Debug Symbols:

cat 

[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-30 Thread Mauricio Faria de Oliveira
Steps to reproduce on Focal (shutdown-on-runtime)
---

LXD virtual machine

lxc exec lp2059272-focal -- su - ubuntu
lxc exec lp2059272-focal -- su - ubuntu

Latest Packages and Debug Symbols:

cat 

[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-30 Thread Mauricio Faria de Oliveira
Verification done on jammy-proposed.
---

Part 1: comment #2, libvirt starts without errors, and can list and manage the domain.
Part 2: comment #5, libvirt restarts without errors 100 times with 10 domains.

Environment:
---

LXD virtual machine

lxc launch --vm ubuntu:jammy lp2059272-jammy
lxc exec lp2059272-jammy -- su - ubuntu

Enable -proposed

sudo add-apt-repository -yp proposed

cat 

[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-30 Thread Mauricio Faria de Oliveira
** Description changed:

  [ Impact ]
  
   * If a race condition occurs on libvirtd shutdown,
     a QEMU domain status XML (/run/libvirt/qemu/*.xml)
     might lose the QEMU-driver specific information,
     such as 'monitor path'.
  
   * On the next libvirtd startup, the parsing of that
     QEMU domain's status XML fails as 'https://gitlab.com/libvirt/libvirt/-/commit/152770333449cd3b78b4f5a9f1148fc1f482d842
  
+  * Test packages built successfully in all architectures
+    with -proposed enabled in Launchpad PPA mfo/lp2059272 [2]
+ 
+ [2] https://launchpad.net/~mfo/+archive/ubuntu/lp2059272
+ 
  [Original Description]
  
  There's a race condition on libvirtd shutdown
  that might cause the domain status XML file(s)
  to lose the 'monitor path' tag/field.
  
  This causes an error on libvirtd startup, and
  the domain is not listed/managed, even though
  it is still running.
  
   $ virsh list
    Id   Name      State
   -------------------------
    1    test-vm   running
  
   $ sudo systemctl restart libvirtd.service
  
   $ journalctl -b -u libvirtd.service | tail
   ...
   ... libvirtd[2789]: internal error: no monitor path
   ... libvirtd[2789]: Failed to load config for domain 'test-vm'
  
   $ virsh list
    Id   Name   State
   --------------------
  
   $ virsh list --all
    Id   Name      State
   --------------------------
    -    test-vm   shut off
  
   $ pgrep -af qemu-system-x86_64 | cut -d, -f1
   2638 /usr/bin/qemu-system-x86_64 -name guest=test-vm,


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-28 Thread Mauricio Faria de Oliveira
> Please include the "normal" execution from comment #5 in the test
> plan, besides the synthetic one.

Done; thanks!

** Description changed:

  [ Impact ]
  
   * If a race condition occurs on libvirtd shutdown,
     a QEMU domain status XML (/run/libvirt/qemu/*.xml)
     might lose the QEMU-driver specific information,
     such as 'monitor path'.
  
   * On the next libvirtd startup, the parsing of that
     QEMU domain's status XML fails as 'https://gitlab.com/libvirt/libvirt/-/commit/152770333449cd3b78b4f5a9f1148fc1f482d842
- 
  
  [Original Description]
  
  There's a race condition on libvirtd shutdown
  that might cause the domain status XML file(s)
  to lose the 'monitor path' tag/field.
  
  This causes an error on libvirtd startup, and
  the domain is not listed/managed, even though
  it is still running.
  
   $ virsh list
    Id   Name      State
   -------------------------
    1    test-vm   running
  
   $ sudo systemctl restart libvirtd.service
  
   $ journalctl -b -u libvirtd.service | tail
   ...
   ... libvirtd[2789]: internal error: no monitor path
   ... libvirtd[2789]: Failed to load config for domain 'test-vm'
  
   $ virsh list
    Id   Name   State
   --------------------
  
   $ virsh list --all
    Id   Name      State
   --------------------------
    -    test-vm   shut off
  
   $ pgrep -af qemu-system-x86_64 | cut -d, -f1
   2638 /usr/bin/qemu-system-x86_64 -name guest=test-vm,


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-28 Thread Andreas Hasenack
Please include the "normal" execution from comment #5 in the test plan,
besides the synthetic one.

** Changed in: libvirt (Ubuntu Jammy)
   Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-jammy


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-27 Thread Mauricio Faria de Oliveira
Uploaded to Jammy.

** Description changed:

  [ Impact ]
  
-  * If a race condition occurs on libvirtd shutdown,
-a QEMU domain status XML (/run/libvirt/qemu/*.xml)
-might lose the QEMU-driver specific information,
-such as ''.
-
-  * On the next libvirtd startup, the parsing of that
-QEMU domain's status XML fails as ''.
  
-$ virsh list
- Id Name State
-
-
-  * The domain is still running, but libvirt considers
-it as shutdown, which might cause conflicts/issues
-with higher-level tools (e.g., openstack nova).
-
-$ virsh list --all
- Id Name State
---
- - test-vm shut off
+  * On the next libvirtd startup, the parsing of that
+    QEMU domain's status XML fails as 'https://gitlab.com/libvirt/libvirt/-/commit/152770333449cd3b78b4f5a9f1148fc1f482d842
+ 
  
  [Original Description]
  
  There's a race condition on libvirtd shutdown
  that might cause the domain status XML file(s)
  to lose the 'monitor path' tag/field.
  
  This causes an error on libvirtd startup, and
  the domain is not listed/managed, even though
  it is still running.
  
   $ virsh list
    Id   Name      State
   -------------------------
    1    test-vm   running
  
   $ sudo systemctl restart libvirtd.service
  
   $ journalctl -b -u libvirtd.service | tail
   ...
   ... libvirtd[2789]: internal error: no monitor path
   ... libvirtd[2789]: Failed to load config for domain 'test-vm'
  
   $ virsh list
    Id   Name   State
   --------------------
  
   $ virsh list --all
    Id   Name      State
   --------------------------
    -    test-vm   shut off
  
   $ pgrep -af qemu-system-x86_64 | cut -d, -f1
   2638 /usr/bin/qemu-system-x86_64 -name guest=test-vm,

** Changed in: libvirt (Ubuntu Jammy)
   Status: Confirmed => In Progress


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-27 Thread Sergio Durigan Junior
OK, I tested the fixed package from your PPA and verified that it indeed
solves the issue.

Just to reiterate, then: LGTM, +1!


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-27 Thread Mauricio Faria de Oliveira
Test case for normal operations in this code path:

Restart libvirt 100 times with 10 QEMU domains.

All domains continued to be managed by libvirt.
No error messages observed.

$ for NAME in test-vm-{1..10}; do cat <<-EOF >test-vms.xml && virsh define test-vms.xml && virsh start $NAME; done

  ${NAME}  
  
hvm
  
  32
  1

EOF

Domain 'test-vm-1' defined from test-vms.xml
Domain 'test-vm-1' started
...
Domain 'test-vm-10' defined from test-vms.xml
Domain 'test-vm-10' started

$ virsh list
 Id   Name         State
----------------------------
 6    test-vm-1    running
 7    test-vm-2    running
 8    test-vm-3    running
 9    test-vm-4    running
 10   test-vm-5    running
 11   test-vm-6    running
 12   test-vm-7    running
 13   test-vm-8    running
 14   test-vm-9    running
 15   test-vm-10   running

$ for i in {1..100}; do echo restart $i; sudo systemctl restart libvirtd.service; sleep 10; done
restart 1
...
restart 100

$ virsh list
 Id   Name         State
----------------------------
 6    test-vm-1    running
 7    test-vm-2    running
 8    test-vm-3    running
 9    test-vm-4    running
 10   test-vm-5    running
 11   test-vm-6    running
 12   test-vm-7    running
 13   test-vm-8    running
 14   test-vm-9    running
 15   test-vm-10   running
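
The before/after comparison of `virsh list` could also be checked by a small helper instead of by eye. This is only a sketch: `check_domains_running` is a hypothetical function name (not part of libvirt), and it assumes the three-column `Id Name State` layout shown above.

```shell
# Sketch: verify that every expected domain is listed as running.
# Reads `virsh list` output on stdin; expected domain names are arguments.
# check_domains_running is a hypothetical helper, not part of libvirt.
check_domains_running() {
    listing=$(cat)
    missing=0
    for name in "$@"; do
        # Look for a row of the form: <id> <name> running
        if ! printf '%s\n' "$listing" |
            awk -v n="$name" '$2 == n && $3 == "running" { found = 1 } END { exit !found }'
        then
            echo "not running: $name"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if no expected domain was missing.
    [ "$missing" -eq 0 ]
}
```

Usage, after the restart loop: virsh list | check_domains_running test-vm-{1..10}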

$ journalctl -b -u libvirtd.service | grep -v -e 'systemd' -e 'hostname:' | cut -d' ' -f7- | sort | uniq -c | sort -rn
108 check dnsmasq binary /usr/sbin/dnsmasq: No such file or directory
106 version: 8.0.0, package: 1ubuntu7.9 (Mauricio Faria de Oliveira 
 Wed, 27 Mar 2024 12:47:46 -0300)
  1 version: 8.0.0, package: 1ubuntu7.8 (Lena Voytek 
 Wed, 29 Nov 2023 14:52:52 -0700)
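
To look only for the two error messages from this bug's title, rather than summarizing the whole journal, something like the following might do. It is a sketch; `count_bug_errors` is a hypothetical helper name, and the two patterns are the message strings quoted in the bug title.

```shell
# Sketch: count log lines carrying either error message from this bug.
# Reads log text on stdin and prints the number of matching lines.
# count_bug_errors is a hypothetical helper name.
count_bug_errors() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function usable under "set -e" while still printing the count.
    grep -c \
        -e 'internal error: no monitor path' \
        -e 'Failed to load config for domain' || true
}
```

Usage: journalctl -b -u libvirtd.service | count_bug_errors
A count of 0 would correspond to "No error messages observed".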


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-27 Thread Sergio Durigan Junior
Hah, I didn't see your comment before I posted mine.  Mid air conflict!

Anyway, thanks for providing the PPA.  I'll take it for a spin :-).


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-27 Thread Sergio Durigan Junior
Mauricio,

Wow!  Such an awesome reproducer.  Very detailed and easy to follow.
Thanks for providing it.

I was able to verify the problem here, and looked at the upstream
patch[1] that fixes it.  The rationale makes sense to me, although that
cleanup function is pretty involved and it's hard to say if there can be
any fallout from moving the worker thread pool freeing action earlier.

I looked at the upstream repository and could not find any amends/fixes
to the commit in question.  It's present in Mantic and Noble, which is a
good sign.

Do you have a PPA build that I can try with the reproducer to check the
fix, please?

Otherwise, this LGTM and I'm +1 on proceeding with the SRU.  Thanks!

[1]: For reference:
https://gitlab.com/libvirt/libvirt/-/commit/152770333449cd3b78b4f5a9f1148fc1f482d842


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-27 Thread Mauricio Faria de Oliveira
Steps with test packages on Jammy
---

Test packages built successfully in all architectures
with -proposed enabled in Launchpad PPA mfo/lp2059272.

https://launchpad.net/~mfo/+archive/ubuntu/lp2059272

Upgrade the libvirt packages and install debug symbols

$ sudo add-apt-repository -y -c 'main' -c 'main/debug' ppa:mfo/lp2059272
$ sudo apt install --yes libvirt{0,-daemon{,-driver-qemu}}{,-dbgsym}

$ dpkg -l | grep libvirt
ii  libvirt-clients                     8.0.0-1ubuntu7.9 ...
ii  libvirt-daemon                      8.0.0-1ubuntu7.9 ...
ii  libvirt-daemon-config-network       8.0.0-1ubuntu7.9 ...
ii  libvirt-daemon-config-nwfilter      8.0.0-1ubuntu7.9 ...
ii  libvirt-daemon-dbgsym               8.0.0-1ubuntu7.9 ...
ii  libvirt-daemon-driver-qemu          8.0.0-1ubuntu7.9 ...
ii  libvirt-daemon-driver-qemu-dbgsym   8.0.0-1ubuntu7.9 ...
ii  libvirt-daemon-system               8.0.0-1ubuntu7.9 ...
ii  libvirt-daemon-system-systemd       8.0.0-1ubuntu7.9 ...
ii  libvirt0:amd64                      8.0.0-1ubuntu7.9 ...
ii  libvirt0-dbgsym:amd64               8.0.0-1ubuntu7.9 ...

...

Repeat the 'Steps to reproduce' in comment #1 up to this point,
and note the differences from here on.

...

Check there are 2 threads: cleanup and domain status XML save

(gdb) i th
  Id   Target Id                                       Frame
  1    Thread 0x7f1e79642ac0 (LWP 4404) "libvirtd"     qemuStateCleanup () at ../../src/qemu/qemu_driver.c:1070
  18   Thread 0x7f1e507f8640 (LWP 4424) "gmain"        (running)
  19   Thread 0x7f1e4fff7640 (LWP 4425) "gdbus"        (running)
  20   Thread 0x7f1e4f7f6640 (LWP 4426) "udev-event"   (running)
  26   Thread 0x7f1e50ff9640 (LWP 4496) "vm-test-vm"   (running)
  27   Thread 0x7f1e4e7f4640 (LWP 4506) "qemu-event"   virDomainObjSave (obj=0x7f1e6c074040, xmlopt=0x7f1e6c028010, statusDir=0x7f1e6c03b3d0 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879

Confirm the qemu driver's domain xml formatter/options is
set/referenced:

t 27

(gdb) p xmlopt.privateData.format
$1 = (virDomainXMLPrivateDataFormatFunc) 0x7f1e7054ada0 


(gdb) p xmlopt.parent.parent_instance
$2 = {g_type_instance = {g_class = 0x7f1e6c052000}, ref_count = 1, 
qdata = 0x0}

Let the cleanup function and shutdown path finish

t 1
c &

Check the formatter/options again; it is *STILL* referenced:

(gdb) p xmlopt.privateData.format
$3 = (virDomainXMLPrivateDataFormatFunc) 0x7f1e7054ada0 


(gdb) p xmlopt.parent.parent_instance
$4 = {g_type_instance = {g_class = 0x7f1e6c052000}, ref_count = 1, 
qdata = 0x0}

So, we keep `xmlopt.privateData.format` as it is
(and NOT set it to `0` as in Steps to Reproduce).

Check the VM status XML *before* the save function finishes:

$ sudo grep -e '
  
  

Let the save function continue, and libvirt finishes shutting down:

(gdb) c
Continuing.
...
[Inferior 1 (process 4404) exited normally]

Check the VM status XML *after*:

$ sudo grep -e '
  
  

It *CONTINUES* to have the 'monitor path' tag/field.
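
The same before/after check can be scripted for all status files. This is a sketch under assumptions: `has_monitor_path` is a hypothetical helper name, and the `<monitor path=` spelling of the element is inferred from the "no monitor path" error text rather than copied from the status XML above.

```shell
# Sketch: succeed if a domain status XML still has a <monitor path=...> element.
# The element spelling is an assumption inferred from the error message
# "no monitor path"; has_monitor_path is a hypothetical helper name.
has_monitor_path() {
    grep -q "<monitor path=" "$1"
}
```

Usage (as root, in a shell where the function is defined):
for f in /run/libvirt/qemu/*.xml; do has_monitor_path "$f" || echo "missing monitor path: $f"; done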

Now, the next time libvirtd starts, it *CORRECTLY* parses that XML:

$ sudo systemctl start libvirtd.service

$ journalctl -b -u libvirtd.service | tail
< no errors >

And libvirt is now aware of the domain, and can manage it:

$ virsh list
 Id   Name      State
-------------------------
 4    test-vm   running

$ virsh destroy test-vm
Domain 'test-vm' destroyed


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-27 Thread Mauricio Faria de Oliveira
** Description changed:

+ [ Impact ]
+ 
+  * If a race condition occurs on libvirtd shutdown,
+a QEMU domain status XML (/run/libvirt/qemu/*.xml)
+might lose the QEMU-driver specific information,
+such as ''.
+
+  * On the next libvirtd startup, the parsing of that
+QEMU domain's status XML fails as ' tag/field.
  
  This causes an error on libvirtd startup, and
  the domain is not listed/managed, even though
  it is still running.
  
   $ virsh list
    Id   Name  State
   -
    1test-vm   running
  
-  $ sudo systemctl restart libvirtd.service
+  $ sudo systemctl restart libvirtd.service
  
   $ journalctl -b -u libvirtd.service | tail
   ...
   ... libvirtd[2789]: internal error: no monitor path
   ... libvirtd[2789]: Failed to load config for domain 'test-vm'
  
   $ virsh list
    Id   Name   State
   
  
   $ virsh list --all
    Id   Name  State
   --
    -test-vm   shut off
  
   $ pgrep -af qemu-system-x86_64 | cut -d, -f1
   2638 /usr/bin/qemu-system-x86_64 -name guest=test-vm,
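
The symptom above (a QEMU process still running for a guest that libvirt no longer lists as active) could be detected with a helper along these lines. It is a sketch only: `unmanaged_guests` is a hypothetical name, and it assumes the `-name guest=NAME,` form of the QEMU command line shown in the pgrep output.

```shell
# Sketch: print guest names found in QEMU command lines (stdin) that are
# not among the active domain names given as arguments.
# unmanaged_guests is a hypothetical helper name.
unmanaged_guests() {
    # Extract NAME from "-name guest=NAME," on each input line.
    sed -n 's/.*-name guest=\([^, ]*\).*/\1/p' |
    while read -r guest; do
        managed=no
        for name in "$@"; do
            if [ "$name" = "$guest" ]; then
                managed=yes
            fi
        done
        if [ "$managed" = no ]; then
            echo "$guest"
        fi
    done
}
```

Usage: pgrep -af qemu-system-x86_64 | unmanaged_guests $(virsh list --name)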


[Bug 2059272] Re: libvirt domain is not listed/managed after libvirt restart with messages "internal error: no monitor path" and "Failed to load config for domain"

2024-03-27 Thread Mauricio Faria de Oliveira
Steps to reproduce on Jammy
---

Stop libvirt systemd units

sudo systemctl stop 'libvirtd*'

Start libvirt in GDB

sudo gdb \
  -iex 'set confirm off' \
  -iex 'set pagination off' \
  -iex 'set debuginfod enabled on' \
  -iex 'set debuginfod urls https://debuginfod.ubuntu.com' \
  -ex 'set non-stop on' \
  -ex 'handle SIGTERM nostop noprint pass' \
  -ex 'add-symbol-file /usr/sbin/libvirtd' \
  -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt.so.0' \
  -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt-qemu.so.0' \
  -ex 'add-symbol-file /usr/lib/x86_64-linux-gnu/libvirt/connection-driver/libvirt_driver_qemu.so' \
  /usr/sbin/libvirtd

Add breakpoints for qemu driver cleanup and device deleted event

b qemuStateCleanup
b processDeviceDeletedEvent
run

Start test VM with a USB mouse device

cat <<-EOF >test-vm.xml

  test-vm
  
hvm
  
  32
  1
  

  

EOF

virsh define test-vm.xml
virsh start test-vm

$ virsh list
 Id   Name      State
-------------------------
 1    test-vm   running

Delete the USB mouse device

DEVICE_ID=$(virsh qemu-monitor-command test-vm --hmp 'info qtree' | 
grep 'dev: usb-mouse' | cut -d'"' -f2)
virsh qemu-monitor-command test-vm --hmp "device_del $DEVICE_ID"

Back to GDB

Thread 25 "qemu-event" hit Breakpoint 2, 0x7f6179ed20a7 in
processDeviceDeletedEvent (devAlias=, vm=0x7f61842f1020,
driver=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:3536

Add breakpoint to domain status XML save, and continue the thread above

b virDomainObjSave
t 25
c

Thread 25 "qemu-event" hit Breakpoint 3, virDomainObjSave
(obj=0x7f61842f1020, xmlopt=0x7f6184028010, statusDir=0x7f6184035460
"/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879

Check the backtrace of the domain status XML save function, coming from
device deleted event

(gdb) bt
#0  virDomainObjSave (obj=0x7f61842f1020, xmlopt=0x7f6184028010, statusDir=0x7f6184035460 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879
#1  0x7f6179eb68c3 in qemuDomainObjSaveStatus (driver=0x7f6184035e20, obj=0x7f61842f1020) at ../../src/qemu/qemu_domain.c:5801
#2  0x7f6179ed2159 in processDeviceDeletedEvent (devAlias=0x7f617c0073e0 "input0", vm=0x7f61842f1020, driver=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:3557
#3  qemuProcessEventHandler (data=0x7f617c0072b0, opaque=0x7f6184035e20) at ../../src/qemu/qemu_driver.c:4184
#4  0x7f61974fc983 in virThreadPoolWorker (opaque=) at ../../src/util/virthreadpool.c:164
#5  0x7f61974fb4d9 in virThreadHelper (data=) at ../../src/util/virthread.c:241
#6  0x7f6196e64ac3 in start_thread (arg=) at ./nptl/pthread_create.c:442
#7  0x7f6196ef6850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Leave the thread at this point

Let's trigger the shutdown path

First, increase the shutdown timer (30 seconds is too fast for me; use
30 minutes)

(gdb) b virEventAddTimeout

$ sudo kill $(pidof libvirtd)

Thread 1 "libvirtd" hit Breakpoint 4, virEventAddTimeout
(timeout=3, cb=0x7f61975bbbc0 ,
opaque=0x55aec684a020, ff=0x0) at ../../src/util/virevent.c:148

t 1
set $rdi = 30 * 60 * 1000

(gdb) i r $rdi
rdi            0x1b7740            1800000
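
For reference, the value written to $rdi is 30 minutes expressed in milliseconds (assuming, as the 30-second default suggests, that the timeout argument is in milliseconds). The arithmetic and the hex form printed by gdb can be double-checked in the shell:

```shell
# 30 minutes in milliseconds, the unit assumed for the timeout argument.
timeout_ms=$((30 * 60 * 1000))
# Print the decimal and hexadecimal forms for comparison with "i r $rdi".
printf '%d 0x%x\n' "$timeout_ms" "$timeout_ms"
# prints: 1800000 0x1b7740
```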

Now, skip the qemu driver shutdown wait path, to force the (unexpected)
scenario that allows the race condition:

b qemuStateShutdownWait
c

Thread 26 "daemon-shutdown" hit Breakpoint 5,
qemuStateShutdownWait () at ../../src/qemu/qemu_driver.c:1055

t 26
ret
c

Thread 1 "libvirtd" hit Breakpoint 1, qemuStateCleanup () at 
../../src/qemu/qemu_driver.c:1070

Check there are 2 threads: cleanup and domain status XML save

(gdb) i th
  Id   Target Id                                       Frame
  1    Thread 0x7f6193934ac0 (LWP 2544) "libvirtd"     qemuStateCleanup () at ../../src/qemu/qemu_driver.c:1070
  18   Thread 0x7f616a7fc640 (LWP 2563) "gmain"        (running)
  19   Thread 0x7f6169ffb640 (LWP 2564) "gdbus"        (running)
  20   Thread 0x7f61697fa640 (LWP 2565) "udev-event"   (running)
  24   Thread 0x7f616affd640 (LWP 2641) "vm-test-vm"   (running)
  25   Thread 0x7f61687f8640 (LWP 2660) "qemu-event"   virDomainObjSave (obj=0x7f61842f1020, xmlopt=0x7f6184028010, statusDir=0x7f6184035460 "/run/libvirt/qemu") at ../../src/conf/domain_conf.c:28879

Confirm the qemu driver's domain xml formatter/options is
set/referenced:

t 25

(gdb) p xmlopt.privateData.format
$1 =