bug in epoll affecting libev

2008-10-27 Thread Marc Lehmann
I just found another bug in epoll that libev (currently) does not work
around: epoll_wait sometimes returns readiness notifications for an
"older" fd - when an fd gets closed and a new one gets allocated, a
subsequent epoll_wait might receive a spurious readiness notification for
that fd.

This has already been noted in the documentation. Unfortunately, I found
a case where this is hard to work around: when one does a non-blocking
connect on an fd, one cannot easily find out whether the "writable"
event indicates a connect error or is just a spurious notification.

This is just a heads-up, I will probably implement a simple generation
counter using the unused 8 bits in the ANFD structure to work around this
bug.

-- 
The choice of a   Deliantra, the free code+content MORPG
  -==- _GNU_  http://www.deliantra.net
  ==-- _   generation
  ---==---(_)__  __   __  Marc Lehmann
  --==---/ / _ \/ // /\ \/ /  [EMAIL PROTECTED]
  -=/_/_//_/\_,_/ /_/\_\

___
libev mailing list
libev@lists.schmorp.de
http://lists.schmorp.de/cgi-bin/mailman/listinfo/libev


Re: bug in epoll affecting libev

2008-10-27 Thread Marc Lehmann
On Mon, Oct 27, 2008 at 10:31:19AM +0100, Marc Lehmann <[EMAIL PROTECTED]> wrote:
> I just found another bug in epoll that libev (currently) does not work

It's verified, and the extent is much larger than previously assumed: on
SMP systems, busy servers can get hundreds of spurious event notifications
for different/already closed file descriptors. It doesn't happen on
uniprocessor systems (or when I disable all my extra cores).

The current CVS version of libev contains a workaround - it uses an 8-bit
generation counter to filter out those bogus events, which is likely
enough (even under bad conditions, the event generation counter was always
off by one at max).

Fortunately, the ANFD structure did have 8 bits left for this purpose.
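
The generation-counter idea described above can be sketched roughly as
follows. This is a minimal illustration and not libev's actual code: the
names fd_slot, pack_token, register_fd and token_is_current are
hypothetical, and it assumes the fd and its generation are stored together
in the 64-bit user data that epoll hands back with each event.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-fd bookkeeping; the real ANFD structure has more fields. */
typedef struct {
    uint8_t egen;   /* 8-bit generation, bumped on every (re-)registration */
} fd_slot;

/* Pack the fd (low 32 bits) and its generation (high 32 bits) into the
   64-bit value that would be stored in epoll_event.data.u64. */
static uint64_t pack_token(int fd, uint8_t gen)
{
    assert(fd >= 0);
    return (uint64_t)(uint32_t)fd | ((uint64_t)gen << 32);
}

/* Called whenever an fd is (re-)added to the epoll set.
   Returns the token to register with the kernel. */
static uint64_t register_fd(fd_slot *slots, int fd)
{
    slots[fd].egen++;   /* wraps modulo 256 */
    return pack_token(fd, slots[fd].egen);
}

/* Returns 1 if the token matches the current registration of its fd,
   0 if it is left over from an older registration. */
static int token_is_current(const fd_slot *slots, uint64_t token)
{
    int fd = (int)(uint32_t)token;
    uint8_t gen = (uint8_t)(token >> 32);
    return slots[fd].egen == gen;
}
```

An event whose token fails token_is_current would simply be dropped
instead of being passed on to the watcher. With 8 bits the counter wraps
after 256 re-registrations of the same fd.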

If you have a busy daemon doing lots of connects and you get spurious
ready notifications during connect only with epoll, current CVS might fix
your problem.



Re: bug in epoll affecting libev

2008-10-28 Thread Kandalintsev Alexandre

> I just found another bug in epoll

Did you try to contact kernel developers about this problem?



Re: bug in epoll affecting libev

2008-10-28 Thread Marc Lehmann
On Tue, Oct 28, 2008 at 12:45:07PM +0300, Kandalintsev Alexandre
<[EMAIL PROTECTED]> wrote:
> >I just found another bug in epoll
> Did you try to contact kernel developers about this problem?

The kernel developers are not interested in fixing epoll.

(I tried a year ago for similar issues and fork races, and the samba and
postfix authors were similarly ignored. So on linux, we have the
choice between the totally misdesigned epoll API or select/poll; I wish
for something comparatively sane such as kqueue.)

I realised we need some kind of token protection anyways in case of fork
- basically, with epoll, you have to recreate the epoll handle in both
parent and child after a fork (or do even slower things before the fork).

Now, libev did explain that you can get spurious notifications (solaris
event ports cause those too, for example). The crucial difference is that
I thought it would be possible to cope with them easily in application
code, but recently realised you can't do that portably for nonblocking
connects (maybe not even nonportably - I don't know how to check
whether a connection is currently in progress or not except by trying
another connect or so).
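
To illustrate why the "writable" event is ambiguous, here is a hedged
sketch of one non-portable way an application might classify it. It is
an assumption, not something this thread or libev prescribes:
SO_ERROR is read first, and getpeername is then used as a hint for
whether the socket is connected. The names conn_state and
classify_writable are hypothetical, and the behaviour relied on is what
Linux is expected to do, not a portable guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum conn_state {
    CONN_FAILED,               /* a pending socket error was reported  */
    CONN_ESTABLISHED,          /* the socket has a peer                */
    CONN_PENDING_OR_SPURIOUS   /* no error, no peer: cannot tell more  */
};

/* Classify a "writable" notification on a socket that was put into a
   non-blocking connect. *err_out receives the error code, or 0. */
static enum conn_state classify_writable(int fd, int *err_out)
{
    int err = 0;
    socklen_t len = sizeof err;

    /* SO_ERROR yields 0 both for "connected" and for "still connecting",
       which is exactly the ambiguity discussed above. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) {
        *err_out = errno;
        return CONN_FAILED;
    }
    if (err != 0) {
        *err_out = err;
        return CONN_FAILED;
    }

    *err_out = 0;

    /* Assumption: getpeername fails (typically with ENOTCONN) while
       the socket is not connected. */
    struct sockaddr_storage peer;
    socklen_t plen = sizeof peer;
    if (getpeername(fd, (struct sockaddr *)&peer, &plen) == 0)
        return CONN_ESTABLISHED;

    return CONN_PENDING_OR_SPURIOUS;
}
```

Note that reading SO_ERROR clears the pending error, so an application
using something like this has to act on the result immediately.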

To me, the remaining question is just whether I should extend the ANFD
structure with 32 bits of generation count, or whether I can go with the 8
bits currently implemented (which are free): extending a structure from 8
to 12 bytes sucks, especially on ia32.

I am also thinking about forcing the use of pthread_atfork (or the
internal __register_atfork) on linux, to catch forks for the parent (the
child is easy: getpid has virtually no cost, so forcing a getpid call per
loop iteration would not concern me at all, but that doesn't work for the
parent process).
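
The getpid check mentioned above can be sketched like this. It is a
minimal illustration under stated assumptions, not libev's code:
loop_state, loop_init, loop_check_fork and child_detects_fork are
hypothetical names, and the "rebuilds" counter merely stands in for
re-creating the epoll handle.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical loop state. */
typedef struct {
    pid_t owner;      /* pid that created the backend                 */
    int   rebuilds;   /* how often the backend would have been rebuilt */
} loop_state;

static void loop_init(loop_state *l)
{
    assert(l != NULL);
    l->owner = getpid();
    l->rebuilds = 0;
}

/* Intended to be called once per loop iteration.
   Returns 1 if the process id changed since the backend was created. */
static int loop_check_fork(loop_state *l)
{
    pid_t now = getpid();
    if (now == l->owner)
        return 0;
    l->owner = now;
    l->rebuilds++;    /* placeholder for re-creating the epoll handle */
    return 1;
}

/* Demonstration helper: fork, run the check in the child, and report
   back to the parent via the exit status. Returns 1 if the child
   noticed the fork, 0 if not, -1 on error. */
static int child_detects_fork(loop_state *l)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(loop_check_fork(l) ? 42 : 0);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 42;
}
```

In the parent, loop_check_fork keeps returning 0 after the fork, which
is why this check alone cannot catch forks on the parent side.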

I am also still thinking of ways to avoid the epoll set recreation in the
parent (for 10000 fds this means 10001 syscalls...), but I can't come up
with anything that would be practically useful.
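
For illustration, recreating an epoll set amounts to something like the
sketch below: one call to create the handle plus one epoll_ctl per fd,
which is one plausible way to arrive at N+1 syscalls for N fds. The
function rebuild_epoll and its parameters are hypothetical, every fd is
registered for EPOLLIN only to keep the example short, and closing the
old handle is not counted.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Re-create an epoll set from scratch for nfds descriptors.
   Returns the new epoll fd, or -1 on error.
   *syscalls receives the number of epoll syscalls that were issued. */
static int rebuild_epoll(int old_epfd, const int *fds, int nfds, int *syscalls)
{
    *syscalls = 0;

    if (old_epfd >= 0)
        close(old_epfd);           /* drop the stale handle (not counted) */

    int epfd = epoll_create(1);    /* size hint only, must be > 0 */
    (*syscalls)++;
    if (epfd < 0)
        return -1;

    for (int i = 0; i < nfds; i++) {
        struct epoll_event ev;
        memset(&ev, 0, sizeof ev);
        ev.events = EPOLLIN;
        ev.data.fd = fds[i];

        /* One syscall per registered fd. */
        int rc = epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev);
        (*syscalls)++;
        if (rc < 0) {
            close(epfd);
            return -1;
        }
    }
    return epfd;
}
```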
