tornado.web ioloop add_timeout eats CPU

Laszlo Nagy Sun, 02 Sep 2012 23:35:47 -0700

JavaScript clients (browsers) do long poll requests. Each request can take up to 10 seconds before the server responds. On the server side, every client has a queue of messages that need to be sent to the client. When a long poll request comes in, the server checks whether there are messages to be sent out. If there are no outgoing messages, it does not finish the response, but calls the ioloop's add_timeout method to check again later. After 10 seconds (if there are no new messages) the server returns 304 Not Modified. If there is a message, it is sent back to the client as fast as possible, and the client comes back with another long poll immediately.

These message queues are used for UI updates and also for instant messaging. The UI must be responsive. For this reason, any message in the outgoing queue should be sent out to the client within 0.1 seconds. Sometimes (rarely) lots of messages arrive quickly, and in those cases it would be good to send them out even faster. So during the first 0.1 seconds of a poll, I call add_timeout with 0.01 seconds; if the outgoing queue is full of messages, they are delivered quickly. After 0.1 seconds have elapsed, add_timeout is called with 0.1 seconds. This should keep the server load low, because most clients are inactive and they only get a callback every 0.1 seconds.
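To get a feel for how many timer callbacks this schedule generates, here is a small sketch that simulates one idle long poll. It is not the handler code itself: the function names are made up for illustration, time is counted in whole milliseconds to avoid floating-point drift, and it assumes every callback fires exactly on time.

```python
def poll_step_ms(elapsed_ms):
    """Return the delay before the next check (hypothetical helper).

    Mirrors the schedule described above: 10 ms steps during the
    first 100 ms of a poll, 100 ms steps afterwards.
    """
    return 10 if elapsed_ms < 100 else 100


def count_idle_checks(poll_interval_ms=10000):
    """Count the checks that find an empty queue and reschedule themselves."""
    elapsed_ms = 0
    checks = 0
    while elapsed_ms < poll_interval_ms:
        checks += 1
        elapsed_ms += poll_step_ms(elapsed_ms)
    return checks


checks_per_poll = count_idle_checks()
print(checks_per_poll)  # 109 checks for one idle 10-second poll
```

Under these assumptions one idle client costs roughly 11 timer wakeups per second, so 5 idle clients would be on the order of 55 wakeups per second.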


Here are the two most important methods of my request handler:

    @tornado.web.asynchronous
    def post(self):
        """Handle POST requests."""
        # Disable caching
        self.set_header("Cache-Control","no-cache, must-revalidate")
        self.set_header("Expires","Mon, 26 Jul 1997 05:00:00 GMT")
        self.poll_start = time.time()
        action = self.get_argument("action")
        if action=="poll":
            self.poll()
        elif action=="message":
            self.process_incoming(self.get_argument("message"))
        else:
            self.set_status(400)
            self.finish()

    def poll(self):
        """Handle POLL request for the browser's message loop.

        This method monitors the outgoing message queue, and sends
        new messages to the browser when they come in (or until
        self.poll_interval seconds elapsed)."""
        poll_elapsed = time.time() - self.poll_start
        if poll_elapsed<0.1:
            poll_step = 0.01
        else:
            poll_step = 0.1
        if poll_elapsed<self.poll_interval:
            if self.session.outgoing:
                msg = self.session.outgoing.pop()
                self.write(msg)
                self.finish()
            else:
                tornado.ioloop.IOLoop.instance().add_timeout(
                    time.time()+poll_step,self.poll)
        else:
            self.set_status(304)
            self.finish()

And here is my problem. If I point 5 browsers to the server, then I get 2% CPU load (Intel i5 2.8GHz on amd64 Linux). But why? Most of the time, the server should be sleeping. cProfile tells this:


   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  845.146  845.146 <string>:1(<module>)
  1135775  832.283    0.001  832.283    0.001 {method 'poll' of 'select.epoll' objects}

I have copied out the two relevant rows only. As you can see, total runtime was 845 seconds, and 832 seconds were spent in "epoll". Apparently, CPU load goes up linearly as I connect more clients. It means that 50 connected clients would cause 20% CPU load. Which is ridiculous, because they don't do anything but wait for messages to be processed. Something is terribly wrong, but I cannot figure out what.
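The two profile rows can be turned into rates with a few lines of arithmetic. This is only a reading of the numbers quoted above, using variable names chosen for illustration. One caveat: as far as I know, cProfile's default timer measures wall-clock time, so time spent blocked inside epoll's poll() is counted even while the process is idle.

```python
# Figures copied from the two cProfile rows quoted above.
total_runtime = 845.146   # seconds, cumtime of <module>
epoll_calls = 1135775     # ncalls of epoll.poll
epoll_time = 832.283      # seconds, tottime of epoll.poll

# How often the event loop went around, on average.
calls_per_second = epoll_calls / total_runtime

# Average time spent inside a single epoll.poll() call, in milliseconds.
avg_wait_ms = 1000.0 * epoll_time / epoll_calls

# Share of the profiled run spent inside epoll.poll().
fraction_in_epoll = epoll_time / total_runtime

print("loop iterations per second: %.0f" % calls_per_second)
print("average epoll.poll() duration: %.2f ms" % avg_wait_ms)
print("share of runtime in epoll.poll(): %.1f%%" % (100 * fraction_in_epoll))
```

That works out to roughly 1340 loop iterations per second with an average wait of well under a millisecond each. If the profile was taken with only a handful of clients, that is much more than one 0.1-second timer per client would explain, so it may be worth checking what else is waking the loop.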

Actually I could not try this with 50 clients. If I open 15 clients, then the server starts dropping connections. (Tried from both Firefox and Chrome.) If I change the poll() method this way:


        else:
            print "No messages after %.2f seconds"%poll_elapsed
            self.set_status(304)
            self.finish()

then I see this in the log:

No messages after 10.01 seconds
ERROR:root:Uncaught exception POST /client (127.0.0.1)

HTTPRequest(protocol='http', host='127.0.0.1:8888', method='POST', uri='/client', version='HTTP/1.1', remote_ip='127.0.0.1', body='_xsrf=df157469a62142d7b28c5a4880dd8478&action=poll', headers={'Referer': 'http://127.0.0.1:8888/', 'Content-Length': '50', 'Accept-Language': 'en-us;q=0.8,en;q=0.5', 'Accept-Encoding': 'gzip, deflate', 'Host': '127.0.0.1:8888', 'Accept': '*/*', 'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0', 'Connection': 'keep-alive', 'X-Requested-With': 'XMLHttpRequest', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache', 'Cookie': 'sid="MS1acHd5b3V1WHFOQU1BbTVmSXJEeVhkLys=|1346652787|e045d786fdb89b73220a2c77ef89572d0c16901e";_xsrf=df157469a62142d7b28c5a4880dd8478;xsfr=df157469a62142d7b28c5a4880dd8478', 'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'})

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/tornado/stack_context.py", line 183, in wrapped
    callback(*args, **kwargs)
  File "/home/gandalf/Python/Projects/test/client.py", line 67, in poll
    self.finish()
  File "/usr/lib/python2.7/dist-packages/tornado/web.py", line 641, in finish
    self.request.finish()
  File "/usr/lib/python2.7/dist-packages/tornado/httpserver.py", line 411, in finish
    self.connection.finish()
  File "/usr/lib/python2.7/dist-packages/tornado/httpserver.py", line 186, in finish
    self._finish_request()
  File "/usr/lib/python2.7/dist-packages/tornado/httpserver.py", line 213, in _finish_request
    self.stream.read_until(b("\r\n\r\n"), self._header_callback)
  File "/usr/lib/python2.7/dist-packages/tornado/iostream.py", line 151, in read_until
    self._check_closed()
  File "/usr/lib/python2.7/dist-packages/tornado/iostream.py", line 493, in _check_closed
    raise IOError("Stream is closed")
IOError: Stream is closed
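One possible reading of this traceback is that the timer callback fires after the connection has already gone away, so finish() runs against a closed stream. The sketch below simulates that situation without Tornado: FakeStream, FakeHandler, client_gone and poll_timed_out are all hypothetical names, and the simplified finish() only imitates the error. Tornado's RequestHandler does provide an on_connection_close() hook for asynchronous handlers, which is what the guard here is modelled on. Whether a closed connection is really the cause in this case is an assumption.

```python
class FakeStream(object):
    """Stand-in for the connection's stream (hypothetical, not Tornado's IOStream)."""

    def __init__(self):
        self.is_closed = False


class FakeHandler(object):
    """Mimics only the parts of a request handler needed for this sketch."""

    def __init__(self, stream):
        self.stream = stream
        self.status = None
        self.client_gone = False

    def on_connection_close(self):
        # Remember that the client dropped the connection.
        self.client_gone = True

    def finish(self):
        # Imitates the failure in the traceback: finishing on a closed stream raises.
        if self.stream.is_closed:
            raise IOError("Stream is closed")

    def poll_timed_out(self, guarded):
        # Timer callback for the "no messages" case.
        if guarded and self.client_gone:
            return "skipped"
        self.status = 304
        self.finish()
        return "finished"


def simulate(guarded):
    """Close the connection first, then let the timer callback run."""
    stream = FakeStream()
    handler = FakeHandler(stream)
    stream.is_closed = True
    handler.on_connection_close()
    try:
        return handler.poll_timed_out(guarded)
    except IOError as exc:
        return "error: %s" % exc


print(simulate(guarded=False))  # error: Stream is closed
print(simulate(guarded=True))   # skipped
```

The guard only avoids the uncaught exception in the simulation; it does not explain why the connections are being closed in the first place.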

What is even more interesting is that on the client side, Firebug tells me that the connection was dropped after 15 seconds. The JavaScript message loop is driven by a jQuery AJAX call that is given a timeout=15000 parameter:


<snip>
    that.messageLoop = function () {
        var xsfr = that.readCookie("xsfr");
        $.ajax({
          url: '/client',
          type: "POST",
          data: {"_xsrf":xsfr,"action":"poll"},
          async: true,
          cache: false,
          timeout: 15000,
          error: function (data) {
                setTimeout( that.messageLoop , 1000);
                console.log(data);
          },
          success: function (data) {
            if (data) {
                try {
                    eval(data);
                }
                catch(err) {
                  $().toastmessage('showErrorToast', err.message);
                }
            }
            setTimeout( that.messageLoop , 10);
          }
        });
    };
</snip>

But on the server side, I see that the connection was dropped after 10.01 seconds. If I start 20 clients, then about every second poll request gets dropped.


Thanks,

   Laszlo


--
http://mail.python.org/mailman/listinfo/python-list
