[google-appengine] Re: "Request was aborted after waiting too long" followed by random DeadlineExceededError on import.

2009-12-14 Thread Dave Peck
Hi Ikai,

The app id is "citygoround".

We had a number of stretches of "badness" this morning. An example
stretch:

6:07AM 33.867 ("Request was aborted...")
6:07AM 49.672 through 7:12AM 24.470 ("DeadlineExceededError" and/or
"ImproperlyConfiguredError" -- looks like it depends on which imports
fail.)

And another:

8:17AM 37.620 ("Request was aborted...")
8:17AM 54.348 through 8:46AM 51.478 ("DeadlineExceededError" and/or
"ImproperlyConfiguredError")

One last thing: the app is open source. If it helps, you can find the
exact code that we're running in production at:

http://github.com/davepeck/CityGoRound/

The screenshot handler in question is found in ./citygoround/views/app.py, line 115.

Cheers,
Dave


On Dec 14, 1:32 pm, "Ikai L (Google)"  wrote:
> Do you see that it's consistent at the same times? What's your application
> ID? I'll look into it.
>
> On Mon, Dec 14, 2009 at 11:28 AM, Dave Peck  wrote:
> > Hello,
>
> > I have an app (citygoround.org) that, especially in the morning, often
> > has 10-15 minutes of outright downtime due to server errors.
>
> > Looking into it, I see that right before the downtime starts, a few
> > requests log the following warning message:
>
> >    > Request was aborted after waiting too long to attempt to service
> > your request.
> >    > Most likely, this indicates that you have reached your
> > simultaneous dynamic request limit.
>
> > I'm certainly not over my limit, but I can believe that the request in
> > question could take a while. (I'll get to the details of that request
> > in a moment.)
>
> > Immediately after these warnings, my app has a large amount of time
> > (10+ minutes) where *all requests* -- no matter how unthreatening --
> > raise a DeadlineExceededError. Usually this is raised during the
> > import of an innocuous module like "re" or "time" or perhaps a Django
> > 1.1 module. (We use use_library.)
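
(For illustration: use_library refers to the Django-version pinning done at
the top of the app's entry module, roughly as sketched below. This is a
generic sketch, not necessarily the exact citygoround setup; the import-time
work it triggers is what runs again on every cold start.)

    # Generic use_library pinning in main.py (sketch).
    from google.appengine.dist import use_library
    use_library('django', '1.1')

    from django import template  # Django 1.1 modules import cleanly after pinning
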
>
> > My best theory at the moment is that:
>
> > 1. It's a cold start, so nothing is cached.
> > 2. App Engine encounters the high latency request and bails.
> > 3. We probably inadvertently catch the DeadlineExceededError, so the
> > runtime doesn't clean up properly.
> > 4. Future requests are left in a busted state.
>
> > Does this sound at all reasonable? I see a few related issues (2396,
> > 2266, and 1409) but no firm/completely clear discussion of what's
> > happening in any of them.
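
(A minimal sketch of how step 3 above can happen: on the Python runtime of
that era, DeadlineExceededError derives from Exception, so a blanket except
clause will swallow it unless it is re-raised. The handler and names below
are hypothetical, not the actual citygoround code.)

    from google.appengine.api import memcache
    from google.appengine.ext import webapp
    from google.appengine.runtime import DeadlineExceededError

    class ExampleHandler(webapp.RequestHandler):
        def get(self):
            try:
                # ... slow work: datastore reads, image fetches, etc. ...
                data = memcache.get(self.request.path) or 'placeholder'
                self.response.out.write(data)
            except DeadlineExceededError:
                # Re-raise so the runtime can abort the request cleanly;
                # without this clause the blanket handler below eats it.
                raise
            except Exception:
                self.error(500)
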
>
> > Thanks,
> > Dave
>
> > PS:
>
> > The specifics about our high latency request are *not* strictly
> > relevant to the larger problem I'm having, but I will include them
> > because I have a second "side" question to ask about it.
>
> > The "high latency" request is serving an image. Our app lets users
> > upload images and we store them in the data store. When serving an
> > image, our handler:
>
> > 1. Checks to see if the bytes for the image are in memcache, and if so
> > returns them immediately.
> > 2. Otherwise grabs the image from the datastore, and if it is smaller
> > than 64K, adds the bytes to the memcache
> > 3. Returns the result
>
> > I'm wondering if using memcache in this way is a smart idea -- it may
> > very well be the cause of our latency issues. It's hard to tell.
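
(As a concrete illustration of steps 1-3 above, a minimal sketch of such a
handler, assuming a hypothetical Screenshot model with a BlobProperty; the
real handler in app.py may well differ.)

    from google.appengine.api import memcache
    from google.appengine.ext import db, webapp

    CACHE_LIMIT = 64 * 1024  # only cache images smaller than 64K

    class Screenshot(db.Model):              # hypothetical model
        image_bytes = db.BlobProperty()

    class ImageHandler(webapp.RequestHandler):
        def get(self, key_name):
            cache_key = 'img:%s' % key_name
            data = memcache.get(cache_key)                   # 1. try memcache
            if data is None:
                shot = Screenshot.get_by_key_name(key_name)  # 2. fall back to datastore
                if shot is None:
                    self.error(404)
                    return
                data = shot.image_bytes
                if len(data) < CACHE_LIMIT:
                    memcache.set(cache_key, data)            # cache small images only
            self.response.headers['Content-Type'] = 'image/png'
            self.response.out.write(data)                    # 3. return the bytes
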
>
> > Alternatively, the issue could be: we have a page that shows a large
> > number (~100) of such images. If someone requests this page, we may
> > have a lot of simultaneous image-producing requests happening at the
> > same time. Perhaps _this_ is the root cause of the original "Request
> > was aborted" issue?
>
> > Just not sure here...
>
> --
> Ikai Lan
> Developer Programs Engineer, Google App Engine





[google-appengine] Re: "Request was aborted after waiting too long" followed by random DeadlineExceededError on import.

2009-12-15 Thread Jason C
Ikai,

We see daily DeadlineExceededErrors on app id 'steprep' from 6.30am to
7.30am (log time).

Can you look into that as well?

Thanks,
j

On Dec 14, 3:32 pm, "Ikai L (Google)"  wrote:
> Do you see that it's consistent at the same times? What's your application
> ID? I'll look into it.





[google-appengine] Re: "Request was aborted after waiting too long" followed by random DeadlineExceededError on import.

2009-12-15 Thread Dave Peck
Hi Ikai,

Any further details on your end? I get the feeling we're not the only
ones, and we've experienced very serious downtime in the last ~48
hours.

This is a critical issue for us to resolve, but at the same time we
lack key pieces of data that would help us solve it on our own...

Thanks,
Dave

On Dec 15, 9:14 am, Jason C  wrote:
> Ikai,
>
> We see daily DeadlineExceededErrors on app id 'steprep' from 6.30am to
> 7.30am (log time).
>
> Can you look into that as well?
>
> Thanks,
> j





[google-appengine] Re: "Request was aborted after waiting too long" followed by random DeadlineExceededError on import.

2009-12-15 Thread Dave Peck
Ikai,

We'll keep an eye on our app for the next ~24 hours and report back.

At what time did you make the changes to our instance? We had
substantial downtime earlier today, alas.

Can you provide any details about what sort of change was made?

Thanks,
Dave

On Dec 15, 11:26 am, "Ikai L (Google)"  wrote:
> Dave,
>
> You're correct that this is likely affecting other applications, but it's
> not a global issue. There are hotspots in the cloud that we notice are being
> especially impacted during certain times of the day. We're actively working
> on addressing these issues, but in the meantime, there are manual steps we
> can try to prevent your applications from becoming resource starved. We do
> these on a one-off basis and reserve them only for applications that seem to
> exhibit the behavior of seeing DeadlineExceeded on simple actions (not
> initial JVM startup), and at fairly predictable intervals during the day.
> I've taken these steps to try to remedy your application. Can you let us
> know if these seem to help? If not, they may indicate that something is
> going on with your application code, though that does not seem like the case
> here.

[google-appengine] Re: "Request was aborted after waiting too long" followed by random DeadlineExceededError on import.

2009-12-16 Thread Jason C
We (steprep) still saw a batch of them on Dec 16, from 3.54am through
6.57am (log time).

j

On Dec 15, 1:56 pm, "Ikai L (Google)"  wrote:
> I made the change right before I sent the email. Let me know how it works
> for you.
>
> Jason, I also made the change to your application. Please report back after
> tomorrow if you continue to experience issues.

[google-appengine] Re: "Request was aborted after waiting too long" followed by random DeadlineExceededError on import.

2010-01-19 Thread Wesley Chun (Google)
dave, jason,

just wanted to do a follow-up to see where things stand with your apps
now. i'm coming across a similar user issue and was wondering whether
it's the same problem or not. can you post your complete error stack
traces if you're still running into this issue? here's the issue filed
by the other user FYI, whose app seems to have few requests but each
one has high latency:

http://code.google.com/p/googleappengine/issues/detail?id=2621

if your respective apps don't suffer from this problem any more, what
did you do to resolve it or did it magically go away?

thanks,
-- wesley
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Core Python Programming", Prentice Hall, (c)2007,2001
"Python Fundamentals", Prentice Hall, (c)2009
   http://corepython.com

wesley.j.chun :: wesc+...@google.com
developer relations :: google app engine





[google-appengine] Re: "Request was aborted after waiting too long" followed by random DeadlineExceededError on import.

2010-01-20 Thread Jason C
I was under the impression that something happened internally at
Google to adjust the way apps are balanced across machines, and/or
that some other internal tuning was done.

Additionally, we run a ping every 10 seconds to keep an instance hot.
While I understand that this shouldn't have much effect in a
distributed environment (though in practice it does seem to help),
and that it "abuses" a shared resource, I'm currently afraid to turn
it off.

j





Re: [google-appengine] Re: "Request was aborted after waiting too long" followed by random DeadlineExceededError on import.

2009-12-15 Thread Ikai L (Google)
Dave,

You're correct that this is likely affecting other applications, but it's
not a global issue. There are hotspots in the cloud that we notice are being
especially impacted during certain times of the day. We're actively working
on addressing these issues, but in the meantime, there are manual steps we
can try to prevent your applications from becoming resource starved. We do
these on a one-off basis and reserve them only for applications that seem to
exhibit the behavior of seeing DeadlineExceeded on simple actions (not
initial JVM startup), and at fairly predictable intervals during the day.
I've taken these steps to try to remedy your application. Can you let us
know if these seem to help? If not, they may indicate that something is
going on with your application code, though that does not seem like the case
here.



Re: [google-appengine] Re: "Request was aborted after waiting too long" followed by random DeadlineExceededError on import.

2009-12-15 Thread Ikai L (Google)
I made the change right before I sent the email. Let me know how it works
for you.

Jason, I also made the change to your application. Please report back after
tomorrow if you continue to experience issues.
