Something I overlooked replying to on this thread:

> BTW, I remember you said that you fixed the busy loop by disabling the
> FD in the speculative event cache, but do you remember how you re-enable
> it ? Eg, if all other processes have accepted some connections, your
> first process will have to accept new connections again, so that means
> that its state depends on others'.
We initially just returned from listener_accept(). This caused us to go
into a busy spin, because there were always pending speculative reads, so
fd_nbspec was non-zero in ev_epoll.c, which triggered setting
wait_time = 0.

Looking at the flow in listener_accept(), what we observed happening
before any of our patches was that several processes would wake up on a
new socket event. The fastest would win the accept(), and the slower ones
would hit the error check in listener.c at line 353:

    353:    if (unlikely(cfd == -1)) {
                switch (errno) {
                case EAGAIN:
                case EINTR:
                case ECONNABORTED:
                    fd_poll_recv(fd);
                    return;   /* nothing more to accept */

In this case, chasing fd_poll_recv(fd) through the files showed that it
clears the speculative events off the queue, so fd_nbspec would not be
set and wait_time would not get set to 0. So we added the same
fd_poll_recv() call to the refusal path in the shm patch, which solved
our problem.

I'm not sure how that relates to your point about one process's state
depending on the others'; that does not seem to be the case here.

Hopefully that's not too late.

Andy

--
Andrew Phillips
Director Technical Operations
Direct: +44 (0)203 192 2509
Mobile: +44 (0)7595 242 900
LMAX, Yellow Building, 1A Nicholas Road, London, W11 4AN