Re: [google-appengine] Re: GAE starting unnecessary instances

Rob Coops Fri, 22 Jul 2011 08:37:35 -0700

1/ Idle Always On instance > Spawning a new Dynamic instance
2/ Spawning a new Dynamic instance > Busy Always On instance
3/ Idle Dynamic instance > Busy Always On instance
4/ Idle Dynamic instance > Idle Always On instance


So App Engine prefers to use bored Always On instances over spawning new
Dynamic ones; that's good. If the Always On instances are busy, it spawns a
new Dynamic instance; good. If a Dynamic instance is bored but the Always On
ones are busy, the Dynamic instance gets the load; still good.

But then you lose me: if the Always On instance is idle, the Dynamic
instance still gets the load. Why?

In this last case I would expect the Always On instance to get the load;
otherwise the Dynamic instance will keep on being busy and will not get
stopped because of it.
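
To make that concern concrete, here is a rough Python sketch. It is my own toy
model, not App Engine's actual scheduler: it assumes one Dynamic instance that
was spawned at t=0, requests that finish instantly, and a hypothetical
`idle_timeout` after which an unused Dynamic instance is shut down. The
function name and parameters are made up for illustration.

```python
def dynamic_lifetime(arrivals, idle_timeout, prefer_dynamic):
    """Return the time (in seconds) at which the Dynamic instance shuts down.

    arrivals:       sorted request arrival times in seconds (toy model:
                    every request is assumed to finish instantly).
    idle_timeout:   hypothetical seconds without traffic before shutdown.
    prefer_dynamic: True models rule 4/ as written (idle Dynamic wins),
                    False models the reversed order (idle Always On wins).
    """
    last_used = 0.0  # the Dynamic instance was spawned and used at t=0
    for t in arrivals:
        if t - last_used >= idle_timeout:
            break  # it had already timed out before this request arrived
        if prefer_dynamic:
            last_used = t  # the request keeps the Dynamic instance alive
        # otherwise the idle Always On instance takes the request
    return last_used + idle_timeout
```

With one request every 60 seconds at t=60..240 and a 100 second timeout, the
Dynamic instance lives until t=340 when it is preferred, and only until t=100
when the idle Always On instance takes the requests instead.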

I don't know what the costs are of spawning and later destroying a Dynamic
instance, but I cannot imagine that they are so high that you would
have to prefer using the Dynamic instances over the Always On ones.

I believe that this last rule should read:
4/ Idle Always On instance > Idle Dynamic instance

In which case the idle Always On instances would get the load and the bored
Dynamic instances would get cleaned up much faster than they are now.
I suspect this is a typo, though, as I cannot imagine that this is really the
setup, but if it is I would say that it is a candidate for change. :-)
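
As a sketch of the difference between the two orderings (again my own
illustration in Python, with made-up names such as `Instance` and
`pick_instance`; the real scheduler is certainly more involved), the four
rules boil down to something like this:

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    kind: str   # "always_on" or "dynamic"
    busy: bool


def pick_instance(instances, idle_always_on_first):
    """Return the instance that should take the request, or None to spawn.

    idle_always_on_first=False follows rule 4/ as posted,
    idle_always_on_first=True follows the reversed rule 4/.
    """
    idle = [inst for inst in instances if not inst.busy]
    if idle_always_on_first:
        order = ["always_on", "dynamic"]
    else:
        order = ["dynamic", "always_on"]
    for kind in order:
        for inst in idle:
            if inst.kind == kind:
                return inst
    # Everything is busy: per rule 2/ a new Dynamic instance is spawned
    # rather than queueing on a busy Always On instance.
    return None
```

Given one idle Always On instance and one idle Dynamic instance, the posted
order returns the Dynamic one and the reversed order returns the Always On
one; in both cases an idle instance of either kind still beats spawning.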

The other thing I would suggest is altering the load balancing rules for
Dynamic instances. From the picture painted in this email, it looks like the
load balancing of multiple Dynamic instances is pretty much round robin (or
equal load based). If this were changed to always try to load one
Dynamic instance until its load reaches 80% or so before using the second one,
and so on, this would allow the despawning of excess Dynamic instances much
sooner than with the current setup. This does mean a slightly
bigger hit in case of a serious failure of the currently preferred Dynamic
instance. Hence the 80% mark for the load, which I chose arbitrarily by
picking a number above 50; it might need some more scientific work
to ensure that in an average scenario the remaining instances will usually
be able to take the load caused by the sudden death of a Dynamic instance.
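
A minimal Python sketch of the two strategies, purely as an illustration: the
load fractions, the function names and the 0.8 default are my assumptions, and
I do not know how App Engine really measures instance load.

```python
def pick_fill_first(loads, threshold=0.8):
    """Fill-first: use the first instance that is still below the threshold.

    loads: current load fraction (0.0 to 1.0) of each Dynamic instance,
           listed in a fixed order of preference.
    Returns the index of the chosen instance, or None if every instance is
    at or above the threshold (the caller would then spawn a new one).
    """
    for index, load in enumerate(loads):
        if load < threshold:
            return index
    return None


def pick_least_loaded(loads):
    """Equal-load balancing: always use the least loaded instance."""
    if not loads:
        return None
    return min(range(len(loads)), key=loads.__getitem__)
```

For loads of 0.5, 0.1 and 0.0, fill-first keeps sending traffic to the first
instance, so the other two can go idle and be shut down, while the equal-load
variant picks the third instance and keeps all three warm.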

From what I currently see, it looks like the safest option has been chosen,
meaning that in all cases the service will remain active no matter what
happens, but this means a significant cost on the customer side. I suspect
that many customers are happy with that, but I think that an equal number of
them will want to see a situation where the costs are less likely to spike
while providing a similar, albeit slightly less highly available, solution.
It might be an interesting idea to offer several flavors of high
availability, ranging from the current super safe but relatively
unpredictable cost to a pretty decent one with very predictable costs. I
have no idea if this is technically possible, but something tells me that it
should not be that hard to do. And even if it is a little harder to do, my
guess is that Google would be up to that task.

Well that's my two cents...

Regards,

Rob




On Fri, Jul 22, 2011 at 4:57 PM, Johan Euphrosine <pro...@google.com> wrote:

> Hi Galoch,
>
> Thanks for the follow-up.
>
> I think you are experiencing a combination of the two following rules
> I was pointing to in my previous email:
> (> reads as has priority for handling the incoming request)
> 2/ Spawning a new Dynamic instance > Busy Always On instance
> 4/ Idle Dynamic instance > Idle Always On instance
>
> Applied to your example, it could mean that:
> Resident Instance 1:   Requests: 49    Age: 1Hr
> Resident Instance 2:   Requests: 6     Age: 1Hr
> Resident Instance 3:   Requests: 2     Age: 1Hr
> Dynamic Instance 1:    Requests: 7     Age: 2min
> Dynamic Instance 2:    Requests: 291   Age: 1Hr
> Dynamic Instance 3:    Requests: 322   Age: 1Hr
>
> - 1 hour ago, while all your Always On instances were busy, you had
> a burst of incoming requests and the scheduler spawned new Dynamic
> instances as per rule 2/ highlighted above.
> - After the burst, back at normal traffic, the new Dynamic instances
> were handling incoming requests in priority as per rule 4/ highlighted
> above.
> - 2 minutes ago all your instances (Always On + Dynamic) were busy again
> and the scheduler spawned a new Dynamic instance that handled 7
> incoming requests.
>
> Hope that makes more sense for you and Francois, but as I said earlier
> we are open to suggestions and I will make sure someone working on the
> scheduler team monitors this thread for your input.
>
> On Fri, Jul 22, 2011 at 9:09 AM, Galoch <galoch...@gmail.com> wrote:
> > @Johan,
> > The issue is not about the Always On instance being busy. It's actually
> > the other way around ... the Always On instance is never busy ... at
> > least that is what we observed in the last 3-4 days. Your explanation may
> > be partly true since this behavior keeps on changing.
> >
> > For example, I have a snapshot of instances from July 19th and here are
> > the details (for some reason I can't see a link to attach the snapshot
> > images here):
> > Resident Instance 1:   Requests: 49    Age: 1Hr
> > Resident Instance 2:   Requests: 6     Age: 1Hr
> > Resident Instance 3:   Requests: 2     Age: 1Hr
> > Dynamic Instance 1:    Requests: 7     Age: 2min
> > Dynamic Instance 2:    Requests: 291   Age: 1Hr
> > Dynamic Instance 3:    Requests: 322   Age: 1Hr
> >
> > This is under "no load" with only very lightweight cron jobs running.
> > This gets much, much worse during the day under peak load, with requests
> > for dynamic instances reaching 1000+ in a matter of minutes while resident
> > instances have only "1" request served.
> >
> > As you see above Resident Instance 2 and 3 are hardly hit so I don't
> > think they are busy at all. On the other hand, Dynamic Instance 2 and
> > 3 get most of the hits.
> >
> > Dynamic Instance 1 is what is killing us. It keeps getting killed and
> > reborn within that 5 minute window!!
> >
> > We use Spring framework and it is really very expensive for us when a
> > new instance starts up.
> >
> > Just to give you some background, we had gone through a real roller
> > coaster ride to make this work on GAE by breaking the loading of
> > the framework into many different chunks. But still spinning was out of
> > control. Then Java threads came to our rescue. We worked through
> > the hack to load JDO to avoid UnsupportedOperationException. We
> > finally got it to work where most of our requests were served by
> > Always On instances with occasional spinning of Dynamic instances. It
> > was quite impressive.
> >
> > Unfortunately, this was short-lived when we hit this new behavior with
> > GAE. The very last thing we want GAE to do is create a new instance
> > every few minutes, as it could easily hit the 30-second deadline during
> > the day and throw a critical error.
> >
> > I am not sure when the new billing will come into effect but we really
> > need this thing fixed as it literally brings down our app to a
> > grinding halt. So I am open to any suggestions you guys think can help
> > us.
> >
> > Another thought about the new scheduler is to have a configurable
> > schedule. For example, our users are mostly business users who work during
> > normal business hours. We want to be able to spin up more Always On
> > instances during those hours and bring the number down during nights
> > and weekends. Dynamic instances won't work for us for the reasons
> > explained above.
> >
> >
> > Thanks,
> > galoch
> >
> > On Jul 21, 5:56 pm, Johan Euphrosine <pro...@google.com> wrote:
> >> After speaking with Engs, I think I can explain what is going on:
> >>
> >> Here are the current scheduling rules: (> reads as has priority for
> >> handling the incoming request)
> >>
> >> 1/ Idle Always On instance > Spawning a new Dynamic instance
> >> 2/ Spawning a new Dynamic instance > Busy Always On instance
> >> 3/ Idle Dynamic instance > Busy Always On instance
> >> 4/ Idle Dynamic instance > Idle Always On instance
> >>
> >> I will give you an example to illustrate the behavior you all noticed,
> >> that is, a Dynamic instance handling requests while an Always On
> >> instance is idle.
> >>
> >> (Always On instance started)
> >> - Incoming request
> >> - Always On instance handles the request
> >> - Another incoming request
> >> (Always On instance busy)
> >> - A new Dynamic instance is spawned
> >> (Dynamic instance idle, Always On instance busy)
> >> - Dynamic instance handles the request
> >> - Another incoming request
> >> (Dynamic instance idle, Always On instance idle)
> >> - Dynamic instance handles the request
> >> - No request for more than idle-dynamic-instance-timeout
> >> - Dynamic instance shuts down
> >> - Another incoming request
> >> (Always On instance idle)
> >> - Always On instance handles the request
> >>
> >> Hope it makes things clearer.
> >>
> >> As part of the new billing model you will have a scheduler knob called
> >> 'max-idle-instances' that you can use if extra idling dynamic
> >> instances are undesired.
> >>
> >> The good news is that we are open to suggestions: if you think this
> >> behavior is the wrong default, feel free to comment on this thread and
> >> I will forward your suggestions to the Engineering team.
> >>
> >> On Wed, Jul 20, 2011 at 12:18 AM, Galoch <galoch...@gmail.com> wrote:
> >> > Same here. Seems like GAE is totally ignoring Always On instances.
> >> > I also noticed that even with no user hitting our app and a single
> >> > cron job that runs every 5 minutes it is still spinning instances
> >> > every 3 minutes and then killing them in 2 minutes.
> >>
> >> > This has been happening since the upgrade on 14th July. During
> >> > peak load this really gets nasty and brings down the performance.
> >>
> >> > This is the feedback I got yesterday from one of our customers since
> >> > it takes time to spin an instance (and yes we use Spring):
> >>
> >> > "1) I found the GUI to be very laggy"
> >>
> >> > Can someone from Google please respond?
> >>
> >> > --
> >> > You received this message because you are subscribed to the Google
> >> > Groups "Google App Engine" group.
> >> > To post to this group, send email to
> >> > google-appengine@googlegroups.com.
> >> > To unsubscribe from this group, send email to
> >> > google-appengine+unsubscr...@googlegroups.com.
> >> > For more options, visit this group at
> >> > http://groups.google.com/group/google-appengine?hl=en.
> >>
> >> --
> >> Johan Euphrosine (proppy)
> >> Developer Programs Engineer
> >> Google Developer Relations
> >
>
> --
> Johan Euphrosine (proppy)
> Developer Programs Engineer
> Google Developer Relations
>

