David Schwartz wrote:
6) Epoll removes the file from the set when the *kernel* object gets
closed (its internal use count drops to zero).
With that in mind, how can the code snippet above trigger a removal from
the epoll set?
I don't see how that can be. Suppose I add fd 8 to an epoll set.
Suppose fd 5 is a dup of fd 8. Now, I close fd 8. How can fd 8 remain in my epoll set,
since there no longer is an fd 8? Events on files registered for epoll
notification are reported by descriptor, so the set membership has to be
associated (as reflected into userspace) with the descriptor, not the file.
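A minimal sketch of the dup()/close() scenario above, assuming Linux and using a pipe in place of a generic file (the helper name closed_fd_still_reported() is hypothetical). It registers one descriptor tagged with its own number, dups it, closes the original, then checks what epoll_wait() reports. Consistent with point 6, the registration should outlive the close() because the open file is still referenced by the dup, and the reported data.fd is the old number, which no longer names any open descriptor.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if epoll still reports the closed
 * descriptor number, 0 if not, -1 on setup failure. */
static int closed_fd_still_reported(void)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    /* Register the read end ("fd 8"), tagged with its own fd number. */
    int orig = p[0];
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = orig };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, orig, &ev) == -1)
        return -1;

    /* "fd 5": a second descriptor for the same open file. */
    int copy = dup(orig);
    if (copy == -1)
        return -1;

    /* Close the registered descriptor. The open file is still referenced
     * through 'copy', so the epoll registration is not removed. */
    close(orig);

    /* Make the pipe readable and collect the event. */
    if (write(p[1], "x", 1) != 1)
        return -1;
    struct epoll_event out;
    int n = epoll_wait(epfd, &out, 1, 1000);

    /* The tag still carries the old number, which is no longer open. */
    int stale = n == 1 && out.data.fd == orig &&
                fcntl(orig, F_GETFD) == -1 && errno == EBADF;

    close(copy);
    close(p[1]);
    close(epfd);
    return stale;
}
```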
Events are not necessarily reported "by descriptor". epoll returns an opaque
field provided by the user.
It's up to the user to choose a tag that still makes sense if the
application plays dup()/close() games, for example.
typedef union epoll_data
{
    void *ptr;
    int fd;
    uint32_t u32;
    uint64_t u64;
} epoll_data_t;
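To illustrate the opaque tag, here is a minimal sketch, assuming Linux, a pipe as the stand-in file, and a hypothetical helper tag_reported_by_epoll() with a made-up connection id of 42. The descriptor is registered under an application-chosen u64 value; epoll_wait() hands back exactly that value, and the descriptor number never appears in the event.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical application-chosen tag: a connection id, not an fd. */
#define CONN_ID 42u

/* Registers a pipe's read end under an opaque u64 tag and returns the
 * tag that epoll_wait() hands back (UINT64_MAX on failure). */
static uint64_t tag_reported_by_epoll(void)
{
    int p[2];
    uint64_t result = UINT64_MAX;
    if (pipe(p) == -1)
        return result;

    int epfd = epoll_create1(0);
    if (epfd != -1) {
        struct epoll_event ev = { .events = EPOLLIN };
        ev.data.u64 = CONN_ID;          /* stored verbatim by the kernel */
        struct epoll_event out;
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
            write(p[1], "x", 1) == 1 &&
            epoll_wait(epfd, &out, 1, 1000) == 1)
            result = out.data.u64;      /* ...and returned unchanged */
        close(epfd);
    }
    close(p[0]);
    close(p[1]);
    return result;
}
```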
It's true that some applications use the 'fd' field of epoll_data_t, but in
that case they should not play dup()/close() games that could change the
meaning of their 'epoll tags'. They would do better to use 'ptr' or 'u64',
for example, to map the event to an application object. That object can hold
the correct handle (fd) to use when talking to the kernel about a given
'file'. The handle can then be remapped to another descriptor with
dup()/fcntl()/close() without invalidating the tag.
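A minimal sketch of that pattern, assuming Linux and a pipe as the stand-in file: the hypothetical struct conn holds the current handle, the epoll tag is a pointer to it, and the handle is remapped with dup()/close() after registration. The event still arrives through the same pointer, and the read goes through whichever descriptor the object currently holds. The helper name read_via_remapped_handle() is also hypothetical.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical application object; epoll only ever sees a pointer to it. */
struct conn {
    int fd;   /* current handle used to talk to the kernel */
};

/* Registers a pipe's read end via data.ptr, moves the handle to a new
 * descriptor with dup()/close(), then reads through the object.
 * Returns the byte read, or -1 on failure. */
static int read_via_remapped_handle(void)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    struct conn c = { .fd = p[0] };
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = &c };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, c.fd, &ev) == -1)
        return -1;

    /* Remap the handle: the tag (&c) stays valid, only c.fd changes.
     * The registration survives because the file is still open via newfd. */
    int newfd = dup(c.fd);
    if (newfd == -1)
        return -1;
    close(c.fd);
    c.fd = newfd;

    if (write(p[1], "y", 1) != 1)
        return -1;

    int ret = -1;
    struct epoll_event out;
    if (epoll_wait(epfd, &out, 1, 1000) == 1) {
        struct conn *owner = out.data.ptr;   /* map event -> object */
        unsigned char b;
        if (read(owner->fd, &b, 1) == 1)     /* use the current handle */
            ret = b;
    }
    close(c.fd);
    close(p[1]);
    close(epfd);
    return ret;
}
```

One caveat: on Linux the registration stays keyed to the original descriptor number, so a later EPOLL_CTL_DEL or EPOLL_CTL_MOD issued through newfd should fail with ENOENT. Deleting the registration before the remap and re-adding it afterwards avoids that.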
For example, consider:
1) Process creates an epoll set, the set gets fd 4.
2) Process creates a socket, it gets fd 5.
3) The process adds fd 5 to set 4.
4) The process forks.
5) The child inherits the epoll set but not the socket.
Here the kernel cannot quite do the right thing. Ideally, the parent would
still have fd 5 in its version of the epoll set. After all, it has not
closed fd 5. However, the child *cannot* see fd 5 in its version of the
epoll set since it has no fd 5. An event reported for fd 5 would be
nonsense.
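The fork() scenario can be sketched the same way, assuming Linux, a pipe standing in for the socket, and a hypothetical helper child_sees_foreign_fd(). Because fork() duplicates all open descriptors, step 5 is modelled here by having the child close its copy. The epoll set is shared with the parent, which still holds the file, so the child should still receive an event tagged with a descriptor it no longer has.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the forked child receives an event
 * tagged with a descriptor it has already closed, 0 if not, -1 on error. */
static int child_sees_foreign_fd(void)
{
    int epfd = epoll_create1(0);              /* step 1: the epoll set */
    int p[2];
    if (epfd == -1 || pipe(p) == -1)          /* step 2: pipe as the socket */
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1)   /* step 3 */
        return -1;

    pid_t pid = fork();                       /* step 4 */
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Step 5: the child drops its copy of the descriptor. The parent
         * still holds the open file, so the registration in the shared
         * epoll set is not removed. */
        close(p[0]);
        struct epoll_event out;
        int ok = write(p[1], "z", 1) == 1 &&
                 epoll_wait(epfd, &out, 1, 1000) == 1 &&
                 out.data.fd == p[0] &&                       /* tag names p[0]... */
                 fcntl(p[0], F_GETFD) == -1 && errno == EBADF; /* ...which the child lacks */
        _exit(ok ? 0 : 1);
    }

    int status;
    pid_t w = waitpid(pid, &status, 0);
    close(p[0]);
    close(p[1]);
    close(epfd);
    return w == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```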
Yes, it would be nonsense for the child to keep asking the epoll set for
events when it cannot possibly use the socket. If you use the 'ptr' field to
retrieve an object, that object would probably be meaningless in the child
anyway, especially after an exec() syscall.
The same kind of user error can also happen with select()/poll(), for
example:
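FD_ZERO(&fdset);
FD_SET(fd, &fdset);
select(fd + 1, &fdset, NULL, NULL, NULL);
newfd = dup(fd);
close(fd);      /* fdset still refers to the old number */
for (i = 0; i < maxfd; i++)
    if (FD_ISSET(i, &fdset))
        read(i, ...);   /* stale fd: EBADF, or an unrelated file if fd was reused */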
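The select() mistake can be reproduced with a pipe. This is a minimal sketch, assuming Linux or another POSIX system, with a hypothetical helper stale_select_read() and fd + 1 standing in for the snippet's maxfd. Nothing is opened between close() and the loop, so the stale number is simply invalid and read() fails with EBADF; in a real program the number may already have been reused, and the read would silently hit an unrelated file instead.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno from read() on the stale
 * descriptor (expected EBADF), or -1 if setup fails. */
static int stale_select_read(void)
{
    int p[2];
    if (pipe(p) == -1 || write(p[1], "x", 1) != 1)
        return -1;

    int fd = p[0];
    fd_set fdset;
    FD_ZERO(&fdset);
    FD_SET(fd, &fdset);
    if (select(fd + 1, &fdset, NULL, NULL, NULL) != 1)
        return -1;

    int newfd = dup(fd);   /* the data is now reachable only via newfd */
    close(fd);             /* ...but fdset still names the old number */

    int err = -1;
    for (int i = 0; i < fd + 1; i++) {
        if (FD_ISSET(i, &fdset)) {
            char buf[16];
            if (read(i, buf, sizeof buf) == -1)
                err = errno;
        }
    }
    close(newfd);
    close(p[1]);
    return err;
}
```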