I think we were, to an extent, talking about orthogonal points here. There are two separate questions:
1. Why does it take so long to start up an instance? The response here is that we need to work on our app startup times, and that's fair. I am positive we have room for optimization there, although there will come a point where the tradeoff is really between using external libraries and getting down to the low-level APIs - and we need to end up with an acceptable 2016 coding solution without going back to circa 2000 :)

2. While the new instance is being brought up, why aren't requests being served by the resident instance, and why do we always see the resident instance sitting idle? This is a question I haven't seen an answer to yet. Really, this is the bigger question: even if we brought our app startup time down to, say, 5 seconds, that is still an unacceptable user latency (and an average of 5s implies enough variance that some requests will reach 2-3x that).

Now, after this discussion, I dug up a bunch of things and have learned a thing or two about how the GAE schedulers *may* be working. A particularly interesting thread I found was this one: https://groups.google.com/forum/#!topic/google-appengine/sA3o-PTAckc%5B26-50%5D

Based on my understanding so far, it looks like new requests are routed to the new instance as soon as it is "instantiated", without necessarily waiting for it to actually be up (or they may be routed based on some thresholds of the request rate). Obviously, this puts the instance startup latency right in the path of the user response time.

If this is indeed true, it is going to be a terrible experience for anyone who is not operating at, well, essentially Google scale :). I can imagine that this works very well at massive scale (and that may be the great benefit of using GAE) - but an app won't live to see that day if, in its early days, it provides a sucky UX due to long response times.
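To make the hypothesis above concrete, here is a toy model in Python. It is only a sketch of my guess, not of how the GAE scheduler actually works: the function `simulate`, its parameters, and the two policies ("eager" routing to the new instance versus waiting for whichever instance frees up first) are all made up for illustration. The 30ms and 5s figures come from the discussion above; the 200ms service time and the arrival times are assumptions.

```python
def simulate(arrivals_ms, service_ms, startup_ms, min_pending_ms, eager):
    """Return per-request latencies (ms) for a toy scheduler.

    Hypothetical model: one resident instance plus at most one new
    instance. This is NOT the real GAE scheduler.
    """
    resident_free_at = 0   # time at which the resident instance is free
    cold_free_at = None    # None until a new instance has been requested
    latencies = []
    for t in arrivals_ms:
        wait_on_resident = max(0, resident_free_at - t)
        if wait_on_resident <= min_pending_ms:
            # Queue is short enough: the resident instance serves it.
            start = t + wait_on_resident
            resident_free_at = start + service_ms
        else:
            if cold_free_at is None:
                # A new instance is requested; usable after startup_ms.
                cold_free_at = t + startup_ms
            if eager or cold_free_at <= resident_free_at:
                # Eager policy: route to the new instance right away,
                # so the request also waits for the startup to finish.
                start = max(t, cold_free_at)
                cold_free_at = start + service_ms
            else:
                # Patient policy: stay queued on the resident instance
                # because it will be free before the new one is up.
                start = resident_free_at
                resident_free_at = start + service_ms
        latencies.append(start + service_ms - t)
    return latencies


# Two requests 10ms apart, 200ms of work each, 5s startup, 30ms threshold.
print(simulate([0, 10], 200, 5000, 30, eager=True))   # [200, 5200]
print(simulate([0, 10], 200, 5000, 30, eager=False))  # [200, 390]
```

Under these assumptions, the second request pays the full startup time with eager routing, while it would have waited well under half a second had it stayed queued on the resident instance.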
Our own experience matches this explanation. With a min pending latency of 30ms (we left it at the default), a new instance gets spun up the minute there are any requests in the queue whatsoever - which means the new requests are routed immediately to the new instance, while the resident instance remains idle practically all the time. I would have expected the scheduler to adapt to a particular app's startup time and the total number of running instances when deciding when to move new requests over to a new instance.

Given that there is little to no official information about how the scheduler works, I'd like to understand whether the experiments and observations that the GAE community has accumulated are in fact correct - otherwise, we're going to be launching into a mini manual Monte Carlo simulation to tune the knobs for our case. The risk there, of course, is that as soon as the operating parameters of our app change, we will need to re-run the simulation.

I have to say that the GAE docs leave a lot to be desired. Given the number of people who have suffered through this topic, one would hope that by now there would be more insight into what's actually going on and how to maximize the efficiency of App Engine. Unfortunately, "optimize your app's startup times" is a necessary but insufficient answer. As I wrote, unless we're talking about 100ms app startup times, we aren't going to get very far with this against today's expected UX metrics.

All that said, we're currently starting to experiment with longer min pending latency values to see if that causes the resident instances to kick in. It looks like some people have had luck with that. This will obviously cause problems at scale, but if it solves our problem at our current small scale, we'll start there and change it as we scale up. One more hack for our list - but if you have better suggestions, I'd like to know.
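For reference, this is roughly the kind of change we are experimenting with, written as an `appengine-web.xml` fragment for a Java app with automatic scaling. The specific values (500ms, one idle instance) are placeholders I picked for illustration, not recommendations and not our actual settings.

```xml
<!-- appengine-web.xml (fragment); values are illustrative assumptions -->
<automatic-scaling>
  <!-- Keep at least one resident (idle) instance around. -->
  <min-idle-instances>1</min-idle-instances>
  <!-- Let requests wait longer in the pending queue before the
       scheduler starts a new instance (we had left this at 30ms). -->
  <min-pending-latency>500ms</min-pending-latency>
  <max-pending-latency>automatic</max-pending-latency>
</automatic-scaling>
```

Whether a longer min pending latency actually gets the resident instance to pick up the queued requests is exactly what we are trying to find out.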
Thanks,
Vidya

On Thu, Oct 27, 2016 at 6:59 PM, Nick <naoku...@gmail.com> wrote:
> A few years ago I came across a suggestion that you can improve startup
> time by including your war class files in a jar, implying that maybe the
> war is loaded in an exploded manner. Not sure if this is true or relevant,
> I've never bothered. You could test it out.
>
> I've never seen a resident instance work or do anything; I always observe
> cold starts on requests I make even when a resident instance is running.
> Quite often you'll also see the full load being borne by one or two
> instances while others lie around for a long time only serving one or two
> requests. It would be great if someone had time to play and understand
> practically what matters.
>
> I found it interesting that resident instances are supposed to be
> converted to dynamic though - I never noticed that.
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Google App Engine" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/google-appengine/0ZBLsyc51gk/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> google-appengine+unsubscr...@googlegroups.com.
> To post to this group, send email to google-appengine@googlegroups.com.
> Visit this group at https://groups.google.com/group/google-appengine.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/google-appengine/25dbc043-4fc8-4286-92f7-9485cd7b908a%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.