Comments in-line. On 02/04/13 13:12, Stefan Hajnoczi wrote: > Use g_poll(3) instead of select(2). Well, this is kind of a cheat. > It's true that we're now using g_poll(3) on POSIX hosts but the *_fill() > and *_poll() functions are still using rfds/wfds/xfds. > > We've set the scene to start converting *_fill() and *_poll() functions > step-by-step until no more rfds/wfds/xfds users remain. Then we'll drop > the temporary gpollfds_from_select() and gpollfds_to_select() functions > and be left with native g_poll(2). > > On Windows things are a little crazy: convert from rfds/wfds/xfds to > GPollFDs, back to rfds/wfds/xfds, call select(2), rfds/wfds/xfds back to > GPollFDs, and finally back to rfds/wfds/xfds again. This is only > temporary and keeps the Windows build working through the following > patches. We'll drop this excessive conversion later and be left with a > single GPollFDs -> select(2) -> GPollFDs sequence that allows Windows to > use select(2) while the rest of QEMU only knows about GPollFD. > > Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com> > --- > main-loop.c | 135 > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 127 insertions(+), 8 deletions(-)
I assume this is complex because the pre-patch situation is overly complex as well: slirp -> fd_set iohandler -> fd_set unix windows --------------------------------- -------------------------------------------- before after before after -------------- ----------------- --------------------- --------------------- glib -> fd_set glib -> fd_set glib -> GPollFD glib -> GPollFD fd_set -> GPollFD WaitObj -> GPollFD WaitObj -> GPollFD SELECT G_POLL G_POLL (glib/WaitObj) G_POLL (glib/WaitObj) fd_set -> GPollFD GPollFD -> fd_set SELECT (slirp/ioh.) SELECT (slirp/ioh.) (I'm ignoring the after-select / after-poll operations; they (should) mirror the pre-select / pre-poll ones.) No idea why the windows version has been mixing g_poll() and select(). I'd hope that this series kills select() for uniformity's sake, but the 00/10 subject and the commit msg for this patch indicate otherwise. "main-loop.c" is full of global variables which makes it a *pain* to read. > > diff --git a/main-loop.c b/main-loop.c > index d0d8fe4..f1dcd14 100644 > --- a/main-loop.c > +++ b/main-loop.c > @@ -117,6 +117,8 @@ void qemu_notify_event(void) > aio_notify(qemu_aio_context); > } > > +static GArray *gpollfds; > + > int qemu_init_main_loop(void) > { > int ret; > @@ -133,6 +135,7 @@ int qemu_init_main_loop(void) > return ret; > } > > + gpollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD)); > qemu_aio_context = aio_context_new(); > src = aio_get_g_source(qemu_aio_context); > g_source_attach(src, NULL); > @@ -146,6 +149,62 @@ static GPollFD poll_fds[1024 * 2]; /* this is probably > overkill */ > static int n_poll_fds; > static int max_priority; > > +/* Load rfds/wfds/xfds into gpollfds. Will be removed a few commits later. > */ > +static void gpollfds_from_select(void) > +{ > + int fd; > + for (fd = 0; fd <= nfds; fd++) { "nfds" is an example of a global variable being global ("file scope") for no good reason. Anyway it does seem to have inclusive meaning ("highest file descriptor in all sets"). > + int events = 0; > + if (FD_ISSET(fd, &rfds)) { > + events |= G_IO_IN | G_IO_HUP | G_IO_ERR; > + } ("=" would have sufficed) > + if (FD_ISSET(fd, &wfds)) { > + events |= G_IO_OUT | G_IO_ERR; > + } > + if (FD_ISSET(fd, &xfds)) { > + events |= G_IO_PRI; > + } > + if (events) { > + GPollFD pfd = { > + .fd = fd, > + .events = events, > + }; > + g_array_append_val(gpollfds, pfd); > + } > + } > +} (I don't like "gpollfds" being global, but) this seems OK. It matches the glib docs and also How the Mapping Should Be Done (TM). This function corresponds to the "unix / after / fd_set -> GPollFD" entry of the diagram, and as such it is composed with "glib -> fd_set" (glib_select_fill()). I'll note that glib_select_fill() sets "xfds" for G_IO_ERR but not G_IO_PRI. (According to SUSv4, in our case xfds is the logical-or of "error pending" and "OOB/urgent pending".) We can assume that each entry stored by g_main_context_query() will have at least one of IN and OUT set in "events". Further assuming that glib follows its own documentation, that implies that G_IO_ERR will be set unconditionally (because it always accompanies each of IN and OUT). glib_select_fill() will map that to xfds, which we then map here to G_IO_PRI. The total effect is that G_IO_PRI is set on all file descriptors, always, even if we only try to write. I think ultimately support for OOB data should be killed throughout, as a policy, including xfds & G_IO_PRI. Pending errors are returned by read()/write()/etc just fine; see eg. libvirt commit d19149dd. > + > +/* Store gpollfds revents into rfds/wfds/xfds. Will be removed a few commits > + * later. > + */ > +static void gpollfds_to_select(int ret) > +{ > + int i; > + > + FD_ZERO(&rfds); > + FD_ZERO(&wfds); > + FD_ZERO(&xfds); > + > + if (ret <= 0) { > + return; > + } ie. for g_poll() timeouts, errors & signal interrupts, we clear the sets and that's it. OK. > + > + for (i = 0; i < gpollfds->len; i++) { > + int fd = g_array_index(gpollfds, GPollFD, i).fd; > + int revents = g_array_index(gpollfds, GPollFD, i).revents; > + > + if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) { > + FD_SET(fd, &rfds); > + } > + if (revents & (G_IO_OUT | G_IO_ERR)) { > + FD_SET(fd, &wfds); > + } > + if (revents & G_IO_PRI) { > + FD_SET(fd, &xfds); > + } > + } > +} Looks good. > + > #ifndef _WIN32 > static void glib_select_fill(int *max_fd, fd_set *rfds, fd_set *wfds, > fd_set *xfds, uint32_t *cur_timeout) > @@ -212,22 +271,22 @@ static void glib_select_poll(fd_set *rfds, fd_set > *wfds, fd_set *xfds, > > static int os_host_main_loop_wait(uint32_t timeout) > { > - struct timeval tv, *tvarg = NULL; > int ret; > > glib_select_fill(&nfds, &rfds, &wfds, &xfds, &timeout); > > - if (timeout < UINT32_MAX) { > - tvarg = &tv; > - tv.tv_sec = timeout / 1000; > - tv.tv_usec = (timeout % 1000) * 1000; > - } > - > if (timeout > 0) { > qemu_mutex_unlock_iothread(); > } > > - ret = select(nfds + 1, &rfds, &wfds, &xfds, tvarg); > + /* We'll eventually drop fd_set completely. But for now we still have > + * *_fill() and *_poll() functions that use rfds/wfds/xfds. > + */ > + gpollfds_from_select(); > + > + ret = g_poll((GPollFD *)gpollfds->data, gpollfds->len, timeout); This exploits for the timeout parameter that (int)UINT32_MAX equals -1 on all supported platforms. Timeout values in [ INT_MAX+1, UINT_MAX32-1 ] lose their meaning, but I guess that's no problem in practice. > + > + gpollfds_to_select(ret); > > if (timeout > 0) { > qemu_mutex_lock_iothread(); > @@ -327,6 +386,55 @@ void qemu_fd_register(int fd) > FD_CONNECT | FD_WRITE | FD_OOB); > } > > +static int pollfds_fill(GArray *pollfds, fd_set *rfds, fd_set *wfds, > + fd_set *xfds) > +{ > + int nfds = -1; "nfds" shadows the global "nfds". The return value of pollfds_fill() is then assigned to the global "nfds". Not very elegant (but it's all rooted in the existense of the global nfds). > + int i; > + > + for (i = 0; i < pollfds->len; i++) { > + GPollFD *pfd = &g_array_index(pollfds, GPollFD, i); > + int fd = pfd->fd; > + int events = pfd->events; > + if (events & (G_IO_IN | G_IO_HUP | G_IO_ERR)) { > + FD_SET(fd, rfds); > + nfds = MAX(nfds, fd); > + } > + if (events & (G_IO_OUT | G_IO_ERR)) { > + FD_SET(fd, wfds); > + nfds = MAX(nfds, fd); > + } > + if (events & G_IO_PRI) { > + FD_SET(fd, xfds); > + nfds = MAX(nfds, fd); > + } > + } > + return nfds; > +} > + > +static void pollfds_poll(GArray *pollfds, int nfds, fd_set *rfds, > + fd_set *wfds, fd_set *xfds) > +{ The "nfds" parameter both shadows the global variable and is unused in the function. > + int i; > + > + for (i = 0; i < pollfds->len; i++) { > + GPollFD *pfd = &g_array_index(pollfds, GPollFD, i); > + int fd = pfd->fd; > + int revents = 0; > + > + if (FD_ISSET(fd, rfds)) { > + revents |= G_IO_IN | G_IO_HUP | G_IO_ERR; > + } ("=" would have sufficed) > + if (FD_ISSET(fd, wfds)) { > + revents |= G_IO_OUT | G_IO_ERR; > + } > + if (FD_ISSET(fd, xfds)) { > + revents |= G_IO_PRI; > + } > + pfd->revents |= revents & pfd->events; I find this |= vs. = more confusing than the other two. "pfd->revents" is zero here. > + } > +} > + > static int os_host_main_loop_wait(uint32_t timeout) > { > GMainContext *context = g_main_context_default(); > @@ -382,12 +490,22 @@ static int os_host_main_loop_wait(uint32_t timeout) > * improve socket latency. > */ > > + /* This back-and-forth between GPollFDs and select(2) is temporary. > We'll > + * drop it in a couple of patches, I promise :). > + */ > + gpollfds_from_select(); > + FD_ZERO(&rfds); > + FD_ZERO(&wfds); > + FD_ZERO(&xfds); > + nfds = pollfds_fill(gpollfds, &rfds, &wfds, &xfds); > if (nfds >= 0) { > select_ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv0); > if (select_ret != 0) { > timeout = 0; > + pollfds_poll(gpollfds, nfds, &rfds, &wfds, &xfds); This function shouldn't be invoked for "select_ret == -1" (in case that's possible here); "revents" will end up a copy of "events": "On failure, the objects pointed to by the readfds, writefds, and errorfds arguments shall not be modified." > } > } > + gpollfds_to_select(select_ret || g_poll_ret); I can see this logical-or mimics the return value below; however I think it's not robust. If one of the operands is -1, and the other is non-positive, then gpollfds_to_select() should not traverse the GPollFDs, but it will. ... Actually, "g_poll_ret" should have absolutely no effect on gpollfds_to_select(): "g_poll_ret" corresponds to the "poll_fds" array, which covers the glib-internal file descriptors and the WaitObjects. Here (= in the windows case), "gpollfds" covers only slirp & iohandler, thus only "select_ret" should be passed in. > > return select_ret || g_poll_ret; I find this return value unfathomable (how is (-1||0) equivalent to (1||1)?), but it's out of scope for now. > } > @@ -403,6 +521,7 @@ int main_loop_wait(int nonblocking) > } > > /* poll any events */ > + g_array_set_size(gpollfds, 0); /* reset for new iteration */ > /* XXX: separate device handlers from system ones */ > nfds = -1; > FD_ZERO(&rfds); I can't decide (yet) if any of the above is important; probably not. >From a quick look at the tree at the series' end, most of it seems to be gone by then. I'll let you decide. Reviewed-by: Laszlo Ersek <ler...@redhat.com>