On 01/07/12 22:43, Michael S. Tsirkin wrote: > On Sun, Jul 01, 2012 at 09:06:20PM +1000, Alexey Kardashevskiy wrote: >> QEMU from qemu.org/master (not qemu-kvm), linux host 3.4, ppc64. >> >> Shortly the problem is in the host kernel: closing a file in one thread does >> not interrupt select() waiting on the same file description in another >> thread. >> >> Longer story is: >> I'll use VFIO as an example as I hit this when I was debugging it but VFIO >> itself has nothing to do with the issue. >> >> The bug looks like: I start the guest with MSI-capable PCI card passed via >> VFIO. The guest does dhclient from .bashrc and this dhclient does not >> receive anything till I press any key. >> I did not see it for a while as I always start the guest and then typed >> "dhclient" manually in the guest console. >> >> What happens: >> VFIO initializes INTx eventfd via event_notifier_init() (returns fd=XX) and >> qemu_set_fd_handler() before the guest starts. Then QEMU starts the guest >> and enters a loop is os_host_main_loop_wait() which stays in select() until >> something happens. >> >> >From the host kernel prospective, the XX fd was allocated, struct file* >> >(P1) with eventfd specific private_data allocated and initialized. select() >> >added a file refcounter (called get_file() in __pollwait()) and started >> >waiting on file* P1 but not on fd's number XX (what is mmm weird but ok). >> >> The guest starts and tries to initialize MSI for the PCI card passed >> through. The guest does PCI config write and this write creates a second >> thread in QEMU. > > Why does this create a thread btw?
It is the thread where the guest is running I guess (?). This is the backtrace of this second thread: (gdb) bt #0 vfio_enable_msi (vdev=0x110200f0) at /home/aik/qemu-impreza/hw/ppc/../vfio_pci.c:699 #1 0x000000001036c948 in vfio_pci_write_config (pdev=0x110200f0, addr=0xd2, val=0x81, len=0x2) at /home/aik/qemu-impreza/hw/ppc/../vfio_pci.c:979 #2 0x00000000101b1388 in pci_host_config_write_common (pci_dev=0x110200f0, addr=0xd2, limit=0x100, val=0x81, len=0x2) at /home/aik/qemu-impreza/hw/pci_host.c:54 #3 0x000000001035d70c in finish_write_pci_config (spapr=0x10efa860, buid=0x2, addr=0xd2, size=0x2, val=0x81, rets=0xd64758) at /home/aik/qemu-impreza/hw/ppc/../spapr_pci.c:170 #4 0x000000001035d874 in rtas_ibm_write_pci_config (spapr=0x10efa860, token=0x2010, nargs=0x5, args=0xd64744, nret=0x1, rets=0xd64758) at /home/aik/qemu-impreza/hw/ppc/../spapr_pci.c:194 #5 0x0000000010360bcc in spapr_rtas_call (spapr=0x10efa860, token=0x2010, nargs=0x5, args=0xd64744, nret=0x1, rets=0xd64758) at /home/aik/qemu-impreza/hw/ppc/../spapr_rtas.c:218 #6 0x0000000010358150 in h_rtas (env=0xfffb733f040, spapr=0x10efa860, opcode=0xf000, args=0xfffb7fea030) at /home/aik/qemu-impreza/hw/ppc/../spapr_hcall.c:560 #7 0x0000000010358c4c in spapr_hypercall (env=0xfffb733f040, opcode=0xf000, args=0xfffb7fea030) at /home/aik/qemu-impreza/hw/ppc/../spapr_hcall.c:734 #8 0x00000000103dab2c in kvm_arch_handle_exit (env=0xfffb733f040, run=0xfffb7fea000) at /home/aik/qemu-impreza/target-ppc/kvm.c:769 #9 0x00000000103a8a94 in kvm_cpu_exec (env=0xfffb733f040) at /home/aik/qemu-impreza/kvm-all.c:1536 #10 0x00000000102ede24 in qemu_kvm_cpu_thread_fn (arg=0xfffb733f040) at /home/aik/qemu-impreza/cpus.c:752 #11 0x00000fffb7ab19f0 in .start_thread () from /lib64/libpthread.so.0 #12 0x00000fffb7706774 in .__clone () from /lib64/libc.so.6 > >> In this thread QEMU-VFIO closes fd XX which makes the host kernel release >> fd=XX, release the corresponding file* P1 (fput() in filp_close()) but this >> does not free this P1 as there is select() waiting on it. > > Doesn't qemu remove an fd handler before closing the fd? > If not that's the bug right there. QEMU does not literally remove it but it marks the event as deleted and actually removes it much later from qemu_iohandler_poll() which is called after os_host_main_loop_wait(). Removal is done by calling qemu_set_fd_handler(fd, NULL, NULL, vdev); But even if it did remove it, what would it change? > >> eventfd_release() could wake it up but it is called when its refcounter >> becomes 0 and this won't happen till select() interrupts. >> >> Doing MSI init stuff, QEMU-VFIO calls the same event_notifier_init() >> (returns recycled fd=XX what is correct but confuses) and >> qemu_set_fd_handler() which adds a handler but select() does not pick it up. >> The eventfd() (called by event_notifier_init()) allocates new struct file >> *P2 in the kernel, QEMU-VFIO passes this fd=XX to the kernel's VFIO. When >> MSI interrupt comes to the host kernel, the VFIO interrupt handler calls >> eventfd_signal() on the correct file* P2. However do_select() in the kernel >> does not interrupt to deliver eventfd event as it is still waiting on P1 - >> nobody interrupted select(). It can only interrupt on stdin events (like >> typing keys) and INTx interrupt (which does not happen as MSI is enabled). >> >> So we need to sync eventfd()/close() calls in one thread with select() in >> another. Below is the patch which simply sends SIGUSR2 to the main thread >> (the sample patch is below). Another solution could be adding new IO handler >> to wake select() up. Or to do something with the kernel but I am not sure >> what - implementing file_operations::flush for eventfd to do wakeup did not >> help and I did not dig any further. >> >> >> Does it make sense? What do I miss? What would be the right solution? >> >> >> --- >> iohandler.c | 1 + >> main-loop.c | 17 +++++++++++++++++ >> 2 files changed, 18 insertions(+) >> >> diff --git a/iohandler.c b/iohandler.c >> index 3c74de6..54f4c48 100644 >> --- a/iohandler.c >> +++ b/iohandler.c >> @@ -77,6 +77,7 @@ int qemu_set_fd_handler2(int fd, >> ioh->fd_write = fd_write; >> ioh->opaque = opaque; >> ioh->deleted = 0; >> + kill(getpid(), SIGUSR2); >> } >> return 0; >> } >> diff --git a/main-loop.c b/main-loop.c >> index eb3b6e6..f65e048 100644 >> --- a/main-loop.c >> +++ b/main-loop.c >> @@ -199,10 +199,27 @@ static int qemu_signal_init(void) >> } >> #endif >> >> +static void sigusr2_print(int signal) >> +{ >> +} >> + >> +static void sigusr2_init(void) >> +{ >> + struct sigaction action; >> + >> + memset(&action, 0, sizeof(action)); >> + sigfillset(&action.sa_mask); >> + action.sa_handler = sigusr2_print; >> + action.sa_flags = 0; >> + sigaction(SIGUSR2, &action, NULL); >> +} >> + >> int main_loop_init(void) >> { >> int ret; >> >> + sigusr2_init(); >> + >> qemu_mutex_lock_iothread(); >> ret = qemu_signal_init(); >> if (ret) { >> -- >> 1.7.10 -- Alexey