I looked through the socket SO_BUSY_POLL and blk_mq poll support in recent Linux kernels with an eye towards integrating the ongoing QEMU polling work. The main missing feature is eventfd polling support, which I describe below.

Background
----------

We're experimenting with polling in QEMU, so I wondered if there are advantages to having the kernel do polling instead of userspace.

One such advantage has been pointed out by Christian Borntraeger and Paolo Bonzini: a userspace thread spins blindly without knowing when it is hogging a CPU that other tasks need. The kernel knows when other tasks need to run and can skip polling in that case.

Power management might also benefit if the kernel were aware of polling activity on the system. That way polling can be controlled by the system administrator in a single place. Perhaps smarter power-saving choices can also be made by the kernel.

Another advantage is that the kernel can poll hardware rings (e.g. NIC rx rings), whereas QEMU can only poll its own virtual memory (including guest RAM). That means the kernel can bypass interrupts for devices that are using kernel drivers.

State of polling in Linux
-------------------------

SO_BUSY_POLL causes recvmsg(2), select(2), and poll(2) family system calls to spin while awaiting newly received packets. From what I can tell epoll is not supported, so that system call sleeps without polling.

blk_mq poll is mainly supported by NVMe. It is only available with synchronous direct I/O, so select(2), poll(2), epoll, and Linux AIO are not integrated. It would be nice to extend the code so that a process waiting on Linux AIO using io_getevents(2), select(2), poll(2), or epoll will poll.

QEMU and KVM-specific polling
-----------------------------

There are a few QEMU/KVM-specific items that require polling support:

QEMU's event loop aio_notify() mechanism wakes up the event loop from a blocking poll(2) or epoll call. It is used when another thread adds or changes an event loop resource (such as scheduling a BH). There is a userspace memory location (ctx->notified) that is written by aio_notify(), as well as an eventfd that can be signalled.

kvm.ko's ioeventfd is signalled upon guest MMIO/PIO accesses.
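For context, this is roughly how a VMM registers an ioeventfd with kvm.ko. It is a minimal sketch using the KVM_IOEVENTFD ioctl and struct kvm_ioeventfd from <linux/kvm.h>; the helper names ioeventfd_describe() and doorbell_create(), and the choice to match on a virtqueue index, are illustrative assumptions rather than QEMU's actual code.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/* Hypothetical helper: describe one doorbell so that a guest write of
 * `queue_index` to `addr` signals `efd` from inside kvm.ko instead of
 * causing an MMIO/PIO exit to userspace. */
static void ioeventfd_describe(struct kvm_ioeventfd *ioe, uint64_t addr,
                               uint32_t len, uint16_t queue_index,
                               int efd, int is_pio)
{
    memset(ioe, 0, sizeof(*ioe));   /* zero padding and unused fields */
    ioe->addr = addr;               /* guest-physical MMIO address or I/O port */
    ioe->len = len;                 /* access width in bytes */
    ioe->fd = efd;
    ioe->datamatch = queue_index;   /* only fire when this value is written */
    ioe->flags = KVM_IOEVENTFD_FLAG_DATAMATCH;
    if (is_pio)
        ioe->flags |= KVM_IOEVENTFD_FLAG_PIO;
}

/* Hypothetical helper: create an eventfd and register it on an existing
 * VM file descriptor. Returns the eventfd (>= 0) or -errno. */
static int doorbell_create(int vm_fd, uint64_t addr, uint32_t len,
                           uint16_t queue_index, int is_pio)
{
    struct kvm_ioeventfd ioe;
    int efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);

    if (efd < 0)
        return -errno;
    ioeventfd_describe(&ioe, addr, len, queue_index, efd, is_pio);
    if (ioctl(vm_fd, KVM_IOEVENTFD, &ioe) < 0) {
        int err = -errno;           /* save before close() can clobber errno */
        close(efd);
        return err;
    }
    return efd;
}
```

The device model then waits for the returned eventfd to become readable, which is exactly the blocking step that polling would try to avoid.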
Virtio devices use ioeventfd as a doorbell after new requests have been placed in a virtqueue, which is a descriptor ring in userspace memory.

Eventfd polling support could look like this:

  struct eventfd_poll_info poll_info = {
      .addr = ...memory location...,
      .size = sizeof(uint32_t),
      .op   = EVENTFD_POLL_OP_NOT_EQUAL, /* check *addr != val */
      .val  = ...last value...,
  };
  ioctl(eventfd, EVENTFD_SET_POLL, &poll_info);

In the kernel, eventfd stashes this information and eventfd_poll() evaluates the operation (e.g. not equal, bitwise and, etc.) to detect progress.

Note that this eventfd polling mechanism doesn't actually poll the eventfd counter value. It's useful for situations where the eventfd is a doorbell/notification that some object in userspace memory has been updated, so it polls that userspace memory location directly.

This new eventfd feature also provides poor man's Linux AIO polling support: set the Linux AIO shared ring index as the eventfd polling memory location. This is not as good as true Linux AIO polling support, where the kernel polls the NVMe, virtio_blk, etc. rings, since we'd still rely on an interrupt to complete I/O requests.

Thoughts?

Stefan
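To make the proposed check concrete, here is a userspace model of what eventfd_poll() might evaluate on each polling iteration. struct eventfd_poll_info and EVENTFD_POLL_OP_NOT_EQUAL come from the proposal above and do not exist in Linux; EVENTFD_POLL_OP_AND, the size handling, and the helper names are hypothetical. Note that the model only reads the watched memory location and never touches the eventfd counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: mirrors the proposed interface; not a real Linux API. */
enum eventfd_poll_op {
    EVENTFD_POLL_OP_NOT_EQUAL,  /* progress when *addr != val */
    EVENTFD_POLL_OP_AND,        /* progress when (*addr & val) != 0 */
};

struct eventfd_poll_info {
    const volatile void *addr;  /* userspace location to watch */
    uint32_t size;              /* 1, 2, 4 or 8 bytes */
    enum eventfd_poll_op op;
    uint64_t val;
};

/* Load *addr with the requested width; false for unsupported sizes. */
static bool load_value(const struct eventfd_poll_info *pi, uint64_t *out)
{
    switch (pi->size) {
    case 1: *out = *(const volatile uint8_t *)pi->addr;  return true;
    case 2: *out = *(const volatile uint16_t *)pi->addr; return true;
    case 4: *out = *(const volatile uint32_t *)pi->addr; return true;
    case 8: *out = *(const volatile uint64_t *)pi->addr; return true;
    default: return false;
    }
}

/* One evaluation of the condition: has the watched memory made progress? */
static bool eventfd_poll_check(const struct eventfd_poll_info *pi)
{
    uint64_t cur;

    if (!load_value(pi, &cur))
        return false;
    switch (pi->op) {
    case EVENTFD_POLL_OP_NOT_EQUAL: return cur != pi->val;
    case EVENTFD_POLL_OP_AND:       return (cur & pi->val) != 0;
    }
    return false;
}

/* Spin for at most max_iters checks. Returns true if progress was seen;
 * otherwise the caller would fall back to sleeping on the eventfd. */
static bool eventfd_poll_spin(const struct eventfd_poll_info *pi,
                              unsigned long max_iters)
{
    for (unsigned long i = 0; i < max_iters; i++) {
        if (eventfd_poll_check(pi))
            return true;
    }
    return false;
}
```

For the poor man's Linux AIO case, .addr would point at the tail index of the completion ring that io_setup(2) maps at the aio_context_t address, with .val set to the last tail value consumed; that ring layout is internal to fs/aio.c, so treat it as an assumption. A real kernel implementation would also have to cope with faulting user addresses and memory ordering, which this model ignores.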