Re: [google-appengine] Re: Why resident instances in auto scaling are idle?

Vidya Fri, 28 Oct 2016 03:52:40 -0700

I think we were talking about orthogonal points to an extent here. There
are two separate components:

1. Why does it take so long to start up an instance?

The response here is that we need to work on our app startup times and
that's fair. I am positive we have room for optimization there, although
there will come a point where the tradeoff is really about using external
libraries vs getting down to the low level APIs - and we need to make that
an acceptable 2016 coding solution without needing to go back to circa 2000
:)

2. While the new instance is being brought up, why aren't requests being
served by the resident instance and why do we always see the resident
instance to be idle?

This is a question I haven't seen an answer to yet. Really, this is the
bigger question, since even if we brought our app startup time down to,
say, 5 seconds, that is still unacceptable user latency (and an average of
5s implies enough variance that some requests will reach 2-3x that).

Now, after this discussion, I dug up a bunch of things and have learned a
thing or two about how the GAE schedulers *may* be working. A particularly
interesting thread I found was this one:
https://groups.google.com/forum/#!topic/google-appengine/sA3o-PTAckc%5B26-50%5D

Based on my understanding so far, it looks like new requests will be routed
to the new instance as soon as it is "instantiated", without necessarily
waiting for it to actually be up (or they may be routed based on some
thresholds of the request rates). Obviously, this will put the instance
startup latencies right in the path of the user response times.

If this is indeed true, this is going to be a terrible experience for
anyone that is not operating at, well, essentially Google scale :). I can
imagine that this works very well at massive scale (and that may be the
great benefit of using GAE) - but, an app wouldn't live to see that day if,
in its initial days, it is providing a sucky UX due to long response times.

Our own experience matches this explanation - with the min pending latency
at 30ms (we left it at the default), a new instance gets spun up the moment
there are any requests in the queue whatsoever - which means the new
requests are routed immediately to the new instance, while the resident
instance remains idle practically all the time.

I would imagine that the scheduler would adapt to particular app startup
times + total number of operating instances to determine when to move new
requests over to a new instance.

Given little to no official information about how the scheduler works, I'd
like to understand if the experimentation and observations that the GAE
community has accumulated are in fact correct - otherwise, we're going to be
launching into a mini manual Monte Carlo simulation to really tune the
knobs that may work for our case. The risk here, of course, is that as soon
as the operating parameters change for our app, we will need to re-run
the simulation.
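
To make the simulation idea concrete, here is a minimal sketch of the kind
of toy model I have in mind. It does not describe how the GAE scheduler
actually works; it only encodes the hypothesis from this thread (a request
that waits longer than the min pending latency is sent to a freshly started
instance). Every name, the exponential wait distribution, and all the
numbers are assumptions made up for illustration.

```python
import random


def request_latency(resident_wait_s, min_pending_s, startup_s, service_s):
    """Latency of one request under the hypothesised routing rule.

    resident_wait_s: time until the resident instance could take the request.
    min_pending_s:   assumed pending-latency threshold before a new instance
                     is started.
    """
    if resident_wait_s <= min_pending_s:
        # The resident instance frees up in time: warm path.
        return resident_wait_s + service_s
    # Threshold exceeded: assume the request pays the full cold start.
    return min_pending_s + startup_s + service_s


def cold_start_fraction(min_pending_s, mean_wait_s=0.2, trials=10_000, seed=1):
    """Monte Carlo estimate of how many requests end up on a cold instance.

    Waits for the resident instance are drawn from an exponential
    distribution; that choice is purely an assumption.
    """
    rng = random.Random(seed)
    cold = sum(
        rng.expovariate(1.0 / mean_wait_s) > min_pending_s
        for _ in range(trials)
    )
    return cold / trials


if __name__ == "__main__":
    for threshold in (0.03, 0.5, 2.0):
        print(threshold, cold_start_fraction(threshold))
```

Under these assumptions, raising the threshold trades a little extra
queueing on the warm path for fewer requests that pay the startup time,
which is the tradeoff we would be tuning by hand.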

I have to say that the GAE docs leave a lot to be desired. Given the number
of people who have suffered through this topic, one would hope that by now
there would be more insight into what's actually going on and how to
maximize the efficiency of App Engine.

Unfortunately, "optimize your app's startup times" is a necessary but
insufficient answer. As I wrote, unless we're talking 100ms app startup
times, in today's expected UX metrics, we aren't going very far with this.

All that said, we're currently starting to experiment with longer min
pending time values to see if that causes the resident instances to kick
in. It looks like some people have had luck with that. This will obviously
cause problems at scale, but, if it solves our problem at the small scales
now, we'd start there and change it as we scale up.

One more hack to our list there - but, if you have better suggestions, I'd
like to know.

Thanks,
Vidya

On Thu, Oct 27, 2016 at 6:59 PM, Nick <naoku...@gmail.com> wrote:

> A few years ago I came across a suggestion that you can improve startup
> time by including your war class files in a jar, implying that maybe the war
> is loaded in an exploded manner. Not sure if this is true or relevant, I've
> never bothered. You could test it out.
>
> I've never seen a resident instance work or do anything, I always observe
> cold starts on requests I make even when a resident instance is running.
> Quite often you'll also see the full load being borne by one or two
> instances while others lie around for a long time only serving one or two
> requests. It would be great if someone had time to play and understand
> practically what matters.
>
> I found it interesting that resident instances are supposed to be
> converted to dynamic though - I never noticed that.

