Re: [google-appengine] Re: GAE starting unnecessary instances

Johan Euphrosine Mon, 25 Jul 2011 06:12:53 -0700

On Fri, Jul 22, 2011 at 8:57 PM, Galoch <galoch...@gmail.com> wrote:
> Hi Johan,
>
> Thanks for the explanation. I have couple of questions on that.


Thanks for your interest in GAE internals. I'd be happy to answer
those questions directly if I can, or forward them to someone who can
answer them better.

> 1. "1 Hours ago while all your Always On instance were busy and you
> had a burst of incoming requests"
> While this may be true when my Always On instances were "busy" running
> some stuff but what about when 2 Always On instances show only "1"
> request served which is the Warmup request itself. Does this mean
> Warmup requests are considered as traffic? If that is the case then
> Always On instances seem rather useless since they will never ever get
> called in this scenario.

In the admin console capture you included in your previous mail, I
didn't see any Always On instance showing only "1" request served, but
rather:
Resident Instance 1:   Requests: 49     Age: 1Hr
Resident Instance 2:   Requests: 6      Age: 1Hr
Resident Instance 3:   Requests: 2      Age: 1Hr

Let me know if I missed something.

> 2. As Tom mentioned, what qualifies "busy". When threadsafe option was
> implemented in GAE these 3 Always On instances were able to do most of
> the heavy lifting with occasional spinning of dynamic instances.
> Nothing has changed on our side that should alter this behavior. With
> all these changes happening within GAE I am trying to figure out what
> changed and what we can do to contain this burst of traffic within 3
> (or more ) Always On instances with less frequent spinning of Dynamic
> instances.

There are two scheduler knobs that can help you influence how
Dynamic instances are spawned:
"Minimum Pending Latency" and "Max Idle Instances", as described here:
http://code.google.com/appengine/docs/adminconsole/performancesettings.html
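
As a rough mental model of what these two settings control, here is a
small Python sketch. It is my own simplification based on the settings
page, not the scheduler's implementation, and the function and
parameter names are hypothetical.

```python
def should_spawn(pending_wait_s, min_pending_latency_s, idle_instances):
    """Toy model of the pending-latency knob (names are assumptions).

    A new Dynamic instance is only considered when no instance is idle
    and the oldest pending request has already waited at least the
    configured minimum pending latency.
    """
    return idle_instances == 0 and pending_wait_s >= min_pending_latency_s


def surplus_idle(idle_instances, max_idle_instances):
    """Toy model of the idle-instance knob (names are assumptions).

    Returns how many idle instances sit above the configured cap and
    are therefore candidates for shutdown.
    """
    return max(0, idle_instances - max_idle_instances)
```

In this model, raising the minimum pending latency makes spawning less
likely at the cost of requests waiting longer, and lowering the idle cap
makes idle Dynamic instances go away sooner.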

> 3. "- 2 Minutes ago all your instances Always On + Dynamic were busy
> again and the scheduler spawned a new Dynamic instance that handle 7
> incoming requests. "
> Again what constitutes "busy" as I do not see any request being served
> by Always On instances 2 and 3 in last 1 hour. Note that number of
> requests served by Always On 2/3 are unchanged since they were
> created ...
> Here's my reading in this scenario:
> a. It kills Dynamic Instance 1 within 2 minutes of serving a request
> b. When traffic comes in it looks only for Dynamic Instances if they
> are busy and completely ignores Always On instances at this point
> c. It recreates Dynamic Instance 1
>
> In other words, what rule is applied in this case?

Sorry, those were mostly speculation on my part. I didn't know, from the
information you provided, that the number of requests served by
Always On 2/3 was unchanged.
I can investigate the specific behaviour of your application in more
depth if you open a Production Issue with your application id.

> Also I fail to understand rule 4 as both Rob and Luca mentioned. That
> completely undermines having Always On instances under threadsafe
> mode.
>
> 4. I like Rob's suggestion of better load balancing techniques but
> again with a caveat that an instance needs to be able to serve
> multiple threads before reaching a set capacity (80% or so)
>
> 5. Luca's suggestion also makes sense but again with the same
> caveat ... it should be able to process multiple threads before
> queuing

Thanks a lot for your feedback, I will make sure to forward those
suggestions to the engineering team.

>
> 6. I looked at the new sliders in the Admin console and with those the
> situation is even worse. I set the Max Idle Instances to 3 (that's the
> minimum I could choose) and Min Pending Latency to 15 secs ... Guess
> what our CPU usage has gone up to 15 in 12 hrs because of constant
> creation and killing of 3 dynamic instances. Bare minimum traffic and
> few light weight crons.
> But the good side is now I see requests coming in on the 3 Always On
> instances. Is that enough load they are serving ... I don't know yet
> but something to observe.

Maybe you can open a feature request for a smaller minimum for 'Max
Idle Instances' when Always On is activated, or for having Always On
instances count towards 'Max Idle Instances'.

> Two things I suggest would be really helpful for us:
> A. The overall key here is to know the thread handling capacity of an
> instance. Better yet if it can be configured similar to Backends but
> dynamic in nature (and of course Backends pricing is outrageous ...
> but that's another topic)

Are you looking for <max-concurrent-requests> support for Servlets? If
so, I would recommend opening a feature request.

> B. Able to add more Always On instances but again with a dependency
> explained in point A.

Again, opening a feature request makes sense to track this separately.

> On Jul 22, 7:57 am, Johan Euphrosine <pro...@google.com> wrote:
>> Hi Galoch,
>>
>> Thanks for the followup,
>>
>> I think you are experiencing a combination of the two following rules
>> I was pointing to in my previous email:
>> (> reads as has priority for handling the incoming request)
>> 2/ Spawning a new Dynamic instance > Busy Always On instance
>> 4/ Idle Dynamic instance > Idle Always On instance
>>
>> Applied to your example it could mean that:
>> Resident Instance 1:   Requests: 49     Age: 1Hr
>> Resident Instance 2:   Requests: 6      Age: 1Hr
>> Resident Instance 3:   Requests: 2      Age: 1Hr
>> Dynamic Instance 1:   Requests: 7      Age: 2min
>> Dynamic Instance 2:   Requests: 291  Age: 1Hr
>> Dynamic Instance 3:   Requests: 322  Age: 1Hr
>>
>> - 1 hour ago, all your Always On instances were busy while you had
>> a burst of incoming requests, and the scheduler spawned new Dynamic
>> instances as per rule 2/ highlighted above.
>> - After the burst and back to normal traffic, the new Dynamic instances
>> were handling incoming requests in priority as per rule 4/ highlighted
>> above.
>> - 2 minutes ago all your instances (Always On + Dynamic) were busy again
>> and the scheduler spawned a new Dynamic instance that handled 7
>> incoming requests.
>>
>> Hope that makes more sense for you and Francois, but as I said earlier
>> we are open to suggestions and I will make sure someone working on the
>> scheduler team monitors this thread for your input.
>>
>> On Fri, Jul 22, 2011 at 9:09 AM, Galoch <galoch...@gmail.com> wrote:
>> > @Johan,
>> > The issue is not about the Always On instances being busy. It's actually
>> > the other way ... the Always On instances are never busy ... at least
>> > that is what we observed in the last 3-4 days. Your explanation may be
>> > partly true since this behavior keeps on changing.
>>
>> > For e.g. I have a snapshot of instances from July 19th and here's the
>> > details (for some reason I can't see a link to attach the snapshot
>> > images here):
>> > Resident Instance 1:   Requests: 49     Age: 1Hr
>> > Resident Instance 2:   Requests: 6      Age: 1Hr
>> > Resident Instance 3:   Requests: 2      Age: 1Hr
>> > Dynamic Instance 1:   Requests: 7      Age: 2min
>> > Dynamic Instance 2:   Requests: 291  Age: 1Hr
>> > Dynamic Instance 3:   Requests: 322  Age: 1Hr
>>
>> > This is under "no load" with only very light weight cron jobs running.
>> > This gets much much worse during the day under peak load with requests
>> > for dynamic instances reaching 1000+ in matter of minutes and resident
>> > instances have only "1" request served.
>>
>> > As you see above Resident Instance 2 and 3 are hardly hit so I don't
>> > think they are busy at all. On the other hand, Dynamic Instance 2 and
>> > 3 get most of the hits.
>>
>> > Dynamic Instance 1 is what is killing us. It keeps getting killed and
>> > reborn within that 5 minute window!!
>>
>> > We use Spring framework and it is really very expensive for us when a
>> > new instance starts up.
>>
>> > Just to give you a background, we had gone through a real roller
>> > coaster ride to make this to work on GAE by breaking the loading of
>> > framework into many different chunks. But still spinning was out of
>> > control. Then we found java threads to our rescue. We worked through
>> > the hack to load JDO to avoid UnsupportedOperationException. We
>> > finally got it to work where most of our requests were served by
>> > Always On instances with occasional spinning of Dynamic instances. It
>> > was quite impressive.
>>
>> > Unfortunately, this was short lived when we hit this new behavior with
>> > GAE. The very last thing we want GAE to do is create a new instance
>> > every few minutes as it could easily reach 30 second deadline during
>> > the day and throw critical error.
>>
>> > I am not sure when the new billing will come into effect but we really
>> > need this thing fixed as it literally brings down our app to a
>> > grinding halt. So I am open to any suggestions you guys think can help
>> > us.
>>
>> > Another thought about new scheduler is to have a configurable
>> > schedule. For e.g. our users are mostly business users who work during
>> > normal business hours. We want to be able to spin more Always On
>> > instances during those hours and bring the number down during nights
>> > and weekends. Dynamic instances won't work for us due to reason
>> > explained above.
>>
>> > Thanks,
>> > galoch
>>
>> > On Jul 21, 5:56 pm, Johan Euphrosine <pro...@google.com> wrote:
>> >> After speaking with Engs, I think I can explain what is going on:
>>
>> >> Here are the current scheduling rules: (> reads as has priority for
>> >> handling the incoming request)
>>
>> >> 1/ Idle Always On instance > Spawning a new Dynamic instance
>> >> 2/ Spawning a new Dynamic instance > Busy Always On instance
>> >> 3/ Idle Dynamic instance > Busy Always On instance
>> >> 4/ Idle Dynamic instance > Idle Always On instance
>>
>> >> I will give you an example to illustrate the behavior you all noticed,
>> >> that is, a Dynamic instance handling requests while an Always On
>> >> instance is idle.
>>
>> >> (Always On instance started)
>> >> - Incoming request
>> >> - Always On instance handles the request
>> >> - another Incoming request
>> >> (Always On instance busy)
>> >> - A new Dynamic instance is spawned
>> >> (Dynamic instance idle, Always On instance busy)
>> >> - Dynamic instance handles the request
>> >> - another Incoming request
>> >> (Dynamic instance idle, Always On instance idle)
>> >> - Dynamic instance handles the request
>> >> - No request for more than idle-dynamic-instance-timeout
>> >> - Dynamic instance shuts down
>> >> - another Incoming request
>> >> (Always On instance idle)
>> >> - Always On instance handles the request
>>
>> >> Hope it makes things clearer.
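
To make the ordering concrete, here is a minimal Python sketch of how
the four rules above combine into a single priority order (idle Dynamic,
then idle Always On, then spawning a new Dynamic instance). It is only
an illustration of the rules as listed, not the scheduler's actual code.
The `Instance` class and the `choose` function are names made up for
this example, and the rules say nothing about busy Dynamic instances,
so the sketch treats them like busy Always On instances.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    always_on: bool
    busy: bool = False


def choose(instances):
    """Return (action, instance) for one incoming request.

    action is "route" (send the request to the returned instance) or
    "spawn" (start a new Dynamic instance; the instance is then None).
    """
    idle = [inst for inst in instances if not inst.busy]

    # Rules 3/ and 4/: an idle Dynamic instance wins over any Always On
    # instance, whether that Always On instance is busy or idle.
    for inst in idle:
        if not inst.always_on:
            return "route", inst

    # Rule 1/: an idle Always On instance wins over spawning.
    for inst in idle:
        if inst.always_on:
            return "route", inst

    # Rule 2/: spawning a new Dynamic instance wins over waiting for a
    # busy Always On instance (assumption: same for busy Dynamic ones).
    return "spawn", None
```

Replaying the walk-through above with one Always On and one Dynamic
instance gives the same sequence: the Always On instance serves the first
request, a Dynamic instance is spawned while it is busy, and the Dynamic
instance then keeps winning even once the Always On instance is idle
again.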
>>
>> >> As part of the new billing model you will have a scheduler knob called
>> >> 'max-idle-instances' that you can use if extra idling dynamic
>> >> instances are undesired.
>>
>> >> The good news is that we are open to suggestions. If you think this
>> >> behavior is the wrong default, feel free to comment on this thread and
>> >> I will pass your suggestions on to the Engineering team.
>>
>> >> On Wed, Jul 20, 2011 at 12:18 AM, Galoch <galoch...@gmail.com> wrote:
>> >> > Same here. Seems like GAE is totally ignoring Always On instances.
>> >> > I also noticed that even with no user hitting our app and a single
>> >> > cron job that runs every 5 minutes it is still spinning instances
>> >> > every 3 minutes and then killing them in 2 minutes.
>>
>> >> > This has been happening since after the upgrade on 14th July. During
>> >> > peak load this really gets nasty and brings down the performance.
>>
>> >> > This is the feedback I got yesterday from one of our customers since
>> >> > it takes time to spin an instance (and yes we use Spring):
>>
>> >> > "1) I found the GUI to be very laggy"
>>
>> >> > Can someone from Google please respond?
>>
>> >> > --
>> >> > You received this message because you are subscribed to the Google 
>> >> > Groups "Google App Engine" group.
>> >> > To post to this group, send email to google-appengine@googlegroups.com.
>> >> > To unsubscribe from this group, send email to 
>> >> > google-appengine+unsubscr...@googlegroups.com.
>> >> > For more options, visit this group at
>> >> > http://groups.google.com/group/google-appengine?hl=en.
>>
>> >> --
>> >> Johan Euphrosine (proppy)
>> >> Developer Programs Engineer
>> >> Google Developer Relations
>>
>>
>> --
>> Johan Euphrosine (proppy)
>> Developer Programs Engineer
>> Google Developer Relations
>



-- 
Johan Euphrosine (proppy)
Developer Programs Engineer
Google Developer Relations

