Something I overlooked replying to on this thread:

> BTW, I remember you said that you fixed the busy loop by disabling the
> FD in the speculative event cache, but do you remember how you re-enable
> it ? Eg, if all other processes have accepted some connections, your
> first process will have to accept new connections again, so that means
> that its state depends on others'.

  We initially just returned from listener_accept(). This caused a
busy spin: there were always pending speculative reads, so fd_nbspec
stayed non-zero in ev_epoll.c, which triggered setting wait_time=0.
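
  To make that mechanism concrete, here is a minimal sketch of the
wait-time computation as we understood it; this is paraphrased, not
the verbatim ev_epoll.c source, and compute_wait_time() is an
illustrative name:

    /* Sketch only: any pending speculative event forces a zero
     * timeout, so a spec event that is never consumed turns the
     * poll loop into a busy spin. */
    static int compute_wait_time(int fd_nbspec, int next_timer_ms)
    {
        if (fd_nbspec > 0)
            return 0;             /* spec events pending: poll, don't sleep */
        return next_timer_ms;     /* otherwise sleep until the next timer */
    }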

  Looking at the flow in listener_accept(), what we observed before
applying any of our patches was that several processes would wake up
on a new socket event. The fastest would win the accept(), and the
slower ones would hit the error check in listener.c at line 353:
353:    if (unlikely(cfd == -1)) {
                switch (errno) {
                case EAGAIN:
                case EINTR:
                case ECONNABORTED:
                        fd_poll_recv(fd);
                        return;   /* nothing more to accept */
                ...
  In this case, chasing fd_poll_recv(fd) through the files indicated
that it clears the speculative event off the queue, meaning fd_nbspec
would no longer be non-zero and wait_time would not get forced to 0.
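
  For what it's worth, here is the rough model we formed of what
fd_poll_recv() does; the struct and field names below are invented
for illustration and do not match the real fd.c:

    #define MAXFD 1024                 /* illustrative bound */

    struct fd_state {
        int spec_recv;                 /* queued for speculative reads? */
        int poll_recv;                 /* registered with epoll for reads? */
    };

    static struct fd_state fdtab[MAXFD];
    static int fd_nbspec;              /* count of pending spec events */

    /* Sketch: move the fd from speculative to polled reads, dropping
     * its pending spec event so the poll loop can sleep again. */
    static void fd_poll_recv_sketch(int fd)
    {
        if (fdtab[fd].spec_recv) {
            fdtab[fd].spec_recv = 0;   /* drop the queued spec event */
            fd_nbspec--;               /* no longer counted as pending */
        }
        fdtab[fd].poll_recv = 1;       /* wait for a real epoll event */
    }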

  So we just added the same fd_poll_recv(fd) call to the refusal
path in the shm patch, which solved our problem.
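
  Concretely, the change amounts to mirroring the EAGAIN path above
in the refusal branch; shm_accept_allowed() is a stand-in name for
whatever check the shm patch performs, not the actual function:

    /* Hypothetical sketch of the fix, inside listener_accept(): */
    if (!shm_accept_allowed(l)) {   /* stand-in for the shm patch's check */
        fd_poll_recv(fd);           /* clear the spec event; no busy spin */
        return;                     /* let another process take it */
    }

  As we understand it, the process then simply waits for the next
epoll notification on the listening socket and runs the same check
again, so no explicit re-enable step is needed.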

  I'm not sure how that relates to your point about our process's
state depending on the others', which does not seem to be the case
here.

  Hopefully that's not too late. 

  Andy 
 
-- 
Andrew Phillips
Director Technical Operations

Direct: +44 (0)203 192 2509
Mobile: +44 (0)7595 242 900

LMAX, Yellow Building, 1A Nicholas Road, London, W11 4AN


