Re: need some help with tcp/ip programming

Nadav Har'El Tue, 15 May 2007 05:35:05 -0700

On Mon, May 14, 2007, guy keren wrote about "Re: need some help with tcp/ip 
programming":
> >this is interesting. can anyone provide more info on this?
> the problem with select, is that it is unable to optimize handling of 
> 'holes' in the file descriptor set.
> suppose that you need to select on file descriptors 2 and 4000.
> you need to pass info about all file descriptors up to 4000 (i.e. many 
> '0' bits, and only two '1' bits, in the different select sets).
> with poll, you pass an array of the descriptors you care about. so the 
> size of the array is proportional to the amount of descriptors you are 
> interested in, while with select it is proportional to the numeric value 
> of the largest descriptor you are interested in.


This is indeed an accurate assessment of the difference between poll() and
select(). Another point worth noticing is that in select(), the same fd sets
are used both as input and output. This means that after every call returns,
you need to refill these sets, which can be quite slow if you have many
thousands of events per second.
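
To put rough numbers on the "holes" problem, here is a small back-of-the-envelope
sketch. The helper names are made up for illustration, and I'm assuming 64-bit
words for the select() bitmask and the usual Linux layout of struct pollfd
(an int and two shorts). Note also that with glibc, FD_SETSIZE is typically
1024, so fd 4000 would not even fit in a standard fd_set.

```python
import struct

def select_set_bytes(max_fd: int, word_bits: int = 64) -> int:
    """Bytes in ONE select() bitmask covering fds 0..max_fd, rounded up to whole words."""
    words = (max_fd + 1 + word_bits - 1) // word_bits
    return words * (word_bits // 8)

def poll_array_bytes(num_fds: int) -> int:
    """Bytes in a poll() array: one {int fd; short events; short revents;} per fd."""
    return num_fds * struct.calcsize("ihh")

fds = [2, 4000]                                # the example from the quoted text
select_bytes = select_set_bytes(max(fds))      # grows with the largest fd value
poll_bytes = poll_array_bytes(len(fds))        # grows with the number of fds
print(select_bytes, poll_bytes)
```

And that is per set: select() takes up to three of them (read, write, except),
and each one has to be filled in again before the next call.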

In some cases, you reach a point where you are listening to thousands
of file descriptors, and getting thousands of events per second. For example,
one can write a single-threaded HTTP server which handles thousands of
concurrent connections with amazing performance (I wrote such a server
once, and it was a really interesting experience). When you reach such
high demands, even poll() is not good enough - every time one fd is ready
to act on, something which can happen thousands of times per second, you
need to call the poll() system call again, and pass the whole long array of
fds from userspace to the kernel. The problem is that the time poll()
takes is proportional to the number of fds to poll, rather than to the
number of fds in which something actually happened.
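
A minimal sketch of that situation, using Python's select.poll (a thin wrapper
over the poll() system call) and socketpair()s standing in for real
connections. The number of connections (N = 100) and the variable names are
just assumptions for the illustration.

```python
import select
import socket

N = 100  # hypothetical number of mostly idle connections
pairs = [socket.socketpair() for _ in range(N)]

poller = select.poll()
for reader, _writer in pairs:
    poller.register(reader.fileno(), select.POLLIN)

# Make exactly one of the N connections readable.
busy_reader, busy_writer = pairs[N // 2]
busy_fd = busy_reader.fileno()
busy_writer.send(b"x")

# This call hands all N entries to the kernel, which checks every one of
# them, even though only a single fd has anything to report.
events = poller.poll(1000)  # timeout in milliseconds
ready_fds = [fd for fd, _mask in events]
print(ready_fds == [busy_fd])

for reader, writer in pairs:
    reader.close()
    writer.close()
```

The answer is one fd, but the work done to get it grows with N, and it is
repeated on every call.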

To solve this problem, the "/dev/epoll" interface was added to Linux in
2001, and later, apparently because Linus Torvalds doesn't like /dev
tricks, new system calls were added instead (see epoll(4)).

It was amazing to see a system on which Apache struggled to keep 200
concurrent open connections, suddenly keep thousands of concurrent
open connections, using only one thread (or N threads in a machine
with N cpus). Together with sendfile(), this allows you to create
killer Web servers :-)

> when you use poll, you can use the trick of having 2 threads - one polls 
> on "idle" sockets (i.e. sockets that did not have I/O in the last X 
> seconds), and one listens on 'active' sockets (i.e. sockets that had I/O 
> in the last X seconds). this avoids the major problem with both select 
> and poll - that after an event on a single socket, the info for all the 
> sockets has to be copied to user space (when select/poll returns), and 
> then to kernel space again (when invoking poll/select again).
> 
> i think that people added epoll support in order to avoid waking the 
> poll function altogether - by receiving a signal from the kernel with 
> the exact info, instead of having to return from poll.

Indeed (see my above explanation). epoll_wait() *does* return to your code
every time, but it immediately tells you which fds have events (no need to
scan a long array), and more importantly - when you want to call
epoll_wait() again there is no need to pass the long list of fds to the
kernel again.
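
For comparison, a rough Linux-only sketch of the same idea with epoll, again
through Python's select module (select.epoll wraps epoll_create, epoll_ctl and
epoll_wait). The connection count and the two indexes picked as "busy" are
arbitrary assumptions.

```python
import select
import socket

N = 100  # hypothetical number of mostly idle connections
pairs = [socket.socketpair() for _ in range(N)]

ep = select.epoll()
for reader, _writer in pairs:
    # epoll_ctl(EPOLL_CTL_ADD): each fd is handed to the kernel only once.
    ep.register(reader.fileno(), select.EPOLLIN)

ready_per_round = []
for index in (3, 42):
    reader, writer = pairs[index]
    writer.send(b"x")
    # epoll_wait: no fd list is passed in, and only ready fds come back.
    events = ep.poll(1.0)  # timeout in seconds
    ready_per_round.append([fd for fd, _mask in events])
    reader.recv(1)  # drain, so this level-triggered fd is not reported again

expected = [[pairs[3][0].fileno()], [pairs[42][0].fileno()]]
print(ready_per_round == expected)

ep.close()
for reader, writer in pairs:
    reader.close()
    writer.close()
```

The registration loop runs once; after that each wait costs in proportion to
the number of ready fds, not the number of registered ones.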

-- 
Nadav Har'El                        |      Tuesday, May 15 2007, 27 Iyyar 5767
[EMAIL PROTECTED]             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |Ms Piggy's last words: "I'm pink,
http://nadav.harel.org.il           |therefore I'm ham."

