[google-appengine] Deadline exceeded errors, huge latency, very high number of instances, entities not deleting

Ben Tue, 17 Jan 2012 18:24:12 -0800

Over the past several weeks (months?) we've noticed an increasing
number of DeadlineExceeded / Timeout exceptions on our app. They involve
a very large delay (71476ms, 72460ms, 61287ms, 91551ms, etc.) and often
occur on the smallest of queries, even on requests that do not do any
database queries, such as static files. The App Engine status page
shows there has never been a problem.


Here's an example of a crash that took 86168ms to fail, which was (as
far as I can tell) loading a css file:

    2012-01-17 13:29:21.745 /stylesheets/all.css 500 86168ms 0kb Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A405

    66.87.122.80 - - [17/Jan/2012:13:29:21 -0800] "GET /stylesheets/all.css HTTP/1.1" 500 0 "http://www.[redacted].com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A405" "www.[redacted].com" ms=86168 cpu_ms=93 api_cpu_ms=0 cpm_usd=0.002640 loading_request=1 pending_ms=9836 exit_code=104 instance=00c61b117c5cab753f44bfade2e47d4169dc

    E 2012-01-17 13:29:21.737

    <class 'google.appengine.runtime.DeadlineExceededError'>:
    Traceback (most recent call last):
      File "/base/data/home/apps/[redacted]/3-0-0.356168683702691497/extension/extension.py", line 14, in <module>
        from extension.entry_application import EntryApplication
      File "/base/data/home/apps/[redacted]/3-0-0.356168683702691497/extension/entry_application.py", line 9, in <module>
        from application_controller import ApplicationController
      File "/base/data/home/apps/[redacted]/3-0-0.356168683702691497/extension/application_controller.py", line 11, in <module>
        from django.template import Context
      File "/base/python_runtime/python_lib/versions/third_party/django-1.2/django/template/__init__.py", line 50, in <module>
        """

    I 2012-01-17 13:29:21.738

    This request caused a new process to be started for your
application, and thus caused your application code to be loaded for
the first time. This request may thus take longer and use more CPU
than a typical request for your application.

    W 2012-01-17 13:29:21.738

    A serious problem was encountered with the process that handled
this request, causing it to exit. This is likely to cause a new
process to be used for the next request to your application. If you
see this message frequently, you may be throwing exceptions during the
initialization of your application. (Error code 104)
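For anyone who wants to count how many of their own requests look like the one above, here is a minimal sketch that pulls the key=value fields (ms, pending_ms, loading_request, exit_code, and so on) out of a request log line. The function names and the 60000ms threshold are my own illustrative choices, not part of any App Engine API, and the sample string is just the tail of the log line quoted above.

```python
import re

# Matches key=value pairs such as ms=86168 or cpm_usd=0.002640.
FIELD_RE = re.compile(r'(\w+)=(\S+)')


def parse_request_fields(log_line):
    """Return the key=value fields of a request log line as a dict of strings."""
    return dict(FIELD_RE.findall(log_line))


def is_slow_loading_request(fields, threshold_ms=60000):
    """True if this is a loading request whose wall time exceeds threshold_ms.

    threshold_ms is a hypothetical cutoff chosen for illustration.
    """
    return (fields.get('loading_request') == '1'
            and int(fields.get('ms', 0)) > threshold_ms)


# Tail of the request log line quoted above.
sample = ('"www.[redacted].com" ms=86168 cpu_ms=93 api_cpu_ms=0 '
          'cpm_usd=0.002640 loading_request=1 pending_ms=9836 '
          'exit_code=104 instance=00c61b117c5cab753f44bfade2e47d4169dc')

fields = parse_request_fields(sample)
print(fields['ms'], fields['pending_ms'], is_slow_loading_request(fields))
```

Run over a downloaded log file line by line, this gives a rough tally of how many loading requests blow past the deadline compared with ordinary requests.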

---------------------

We are also seeing instance lifetimes that I would not consider
normal: we rarely see any of our instances live longer than 10
minutes, as we hit these deadline errors constantly (we're serving
thousands of queries per second, so one of these errors is bound to
hit an instance eventually, killing it and requiring a new one to
start up in its place). We used to be running (and paying for!) almost
1,000 instances at any one time, even though when we turn the setting
off Automatic, we only use around 200.
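To make the "bound to hit an instance eventually" reasoning concrete, here is a back-of-the-envelope sketch. It assumes each request independently kills its instance with some fixed probability, so the number of requests until the first fatal one is geometric with mean 1/p. The figures below (2000 requests per second in total, 200 instances) are hypothetical round numbers in the spirit of the ones above, not measurements from our app.

```python
def expected_lifetime_seconds(requests_per_second, failure_probability):
    """Mean time until the first fatal request, assuming independent failures.

    The number of requests until the first failure is geometric with
    mean 1 / failure_probability; dividing by the request rate gives seconds.
    """
    return 1.0 / (requests_per_second * failure_probability)


# Hypothetical round figures, NOT measurements:
total_qps = 2000.0                          # "thousands of queries per second"
instances = 200                             # roughly the non-Automatic count
per_instance_qps = total_qps / instances    # requests per second per instance

# Per-request failure probability that would give a 10-minute (600 s)
# mean instance lifetime under this model.
ten_minutes = 600.0
p = 1.0 / (per_instance_qps * ten_minutes)

print(p, expected_lifetime_seconds(per_instance_qps, p))
```

Under these assumptions, a fatal error on roughly one request in six thousand would already be enough to hold the average instance lifetime to about ten minutes.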

Another new problem that has started showing up in the last few days
has been content that suddenly duplicates many times over, and then
cannot be deleted... we execute an entity.delete(), but the entity is
still viewable. If I edit the entity by hand in the Datastore Viewer,
then run the same entity.delete(), it says the entity is gone in the
Datastore Viewer, but then only a few seconds/minutes later it has
returned.

I am extremely concerned by these timeout/instance problems for a few
reasons:
1) we're paying a lot for this service
2) the processes that are timing out and killing instances are many,
many times more expensive than the properly executing queries that we
run on our site. I'm willing to pay for a query I made that takes
2000ms to execute, but not for an instance-loading process that I have
no control over and that dies after taking 90000ms.
3) I'm seeing a lot of discussion threads documenting issues which are
very similar to our problem, and haven't seen any Google response
aside from replies of "switch over to the HRD, because it's more
stable". Unless you have stated that you are deprecating the master/
slave system, that's not an answer to the stated problems with the
existing (and previously working and stable) system.
