> What's wrong is the 1,135,775 calls to "method 'poll' of
> 'select.epoll' objects".

I was afraid you were going to say that. :-)

> With five browsers waiting for messages over 845 seconds, that works
> out to each waiting browser inducing 269 epolls per second.

> Almost equally important is what the problem is *not*. The problem is
> *not* spending the vast majority of time in epoll; that's *good* news.
> The problem is *not* that CPU load goes up linearly as we connect more
> clients. This is an efficiency problem, not a scaling problem.
>
> So what's the fix? I'm not a Tornado user; I don't have a patch.
> Obviously Laszlo's polling strategy is not performing, and the
> solution is to adopt the event-driven approach that epoll and Tornado
> do well.
Actually, I have found a way to overcome this problem, and it seems to be working. Instead of calling add_timeout from every request, I save the request objects in a list and run a "message distributor" service in the background that routes messages to clients and finishes their long poll requests when needed. The main point is that the "message distributor" has a single entry point, and it is called back at fixed intervals, so the number of callbacks per second does not increase with the number of clients. Now the CPU load is about 1% with one client and it stays the same with 15 clients, while the response time remains 50-100 ms. That is efficient enough for me.
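
For the archives, here is a rough sketch of the idea. It is not my exact code; names like MessageDistributor and the /poll and /send URLs are just for illustration, and it uses the classic Tornado API (@tornado.web.asynchronous and PeriodicCallback):

import tornado.ioloop
import tornado.web


class MessageDistributor(object):
    """Holds the waiting long-poll handlers and delivers queued messages
    from a single periodic callback, so the callback rate stays constant
    no matter how many clients are connected."""

    def __init__(self):
        self.waiting = []   # handlers of clients currently long-polling
        self.pending = []   # messages not yet delivered

    def register(self, handler):
        self.waiting.append(handler)

    def unregister(self, handler):
        if handler in self.waiting:
            self.waiting.remove(handler)

    def post(self, message):
        self.pending.append(message)

    def distribute(self):
        # Single entry point, called every 50 ms by the IOLoop.
        if not self.pending or not self.waiting:
            return
        messages, self.pending = self.pending, []
        handlers, self.waiting = self.waiting, []
        for handler in handlers:
            handler.write({"messages": messages})
            handler.finish()     # completes the long-poll request


distributor = MessageDistributor()


class PollHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        # Do not finish the request here; the distributor will.
        distributor.register(self)

    def on_connection_close(self):
        distributor.unregister(self)


class SendHandler(tornado.web.RequestHandler):
    def post(self):
        distributor.post(self.get_argument("message"))


if __name__ == "__main__":
    app = tornado.web.Application([
        (r"/poll", PollHandler),
        (r"/send", SendHandler),
    ])
    app.listen(8888)
    # One periodic callback for all clients, instead of one timeout per request.
    tornado.ioloop.PeriodicCallback(distributor.distribute, 50).start()
    tornado.ioloop.IOLoop.instance().start()

The IOLoop fires one callback every 50 ms regardless of how many clients are waiting, which is why the CPU load stays flat as clients are added.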

I understand that most people take a different approach: they have the browser send a quick poll request every 2 seconds or so. But that is not good for me, because then it could take 2 seconds to deliver a message from one browser to another, which is not acceptable in my case. Implementing long polls with a threaded server would be trivial, but a threaded server cannot handle 100+ simultaneous (long running) requests, because that would require 100+ running threads.

This central "message distributor" concept seems to be working. The 1-2% CPU overhead is the price I pay for being able to send a message from one browser to another within 100 ms, which is fine.

I could not have done this without your help.

Thank you!

   Laszlo
