On Jul 18, 2:38 pm, Michael Hermus <michael.her...@gmail.com> wrote:
> I don't believe that you (or anyone) has sufficiently explained how sending
> user requests to cold instances is ever better than warming them up first.
> Said request can ALWAYS be pulled right off the pending queue and sent to the
> instance as soon as it is ready.
Let me take a shot at this. Brandon Wirtz touched on it before, when he said: "I agree that the Queue is sub optimal, but it is more sub optimal the smaller you are. When you get to 50 instances it is amazing how well the load balancing works."

Suppose you have a massive store (we'll call it MegaWalmart). MegaWalmart has 100 staffed checkout lanes. Suppose all of these lanes share a single queue of people, with a supervisor who sends customers to open checkout lanes (roughly analogous to your preferred way of handling GAE request queuing). That one line would be huge, would block traffic around the store, etc. It's far better, from MegaWalmart's POV, to have multiple checkout lines, one per lane.

Now suppose another checkout lane opens up (remember, we now have one line per lane). That's an additional 1% of capacity. If the checkout clerk is drunk/hungover/whatever, that lane will take extra time to open, annoying the customers lined up in it. From MegaWalmart's POV, who cares? Less than 1% of customers were inconvenienced; 99% of people still had a decent time checking out.

Let's apply this to the scheduler. Suppose there was one single queue of requests. At Google scale, that queue could easily exceed millions of entries, possibly billions. And God help you if the machine hosting the queue gets a hiccup, or an outright failure. Don't you agree that, at least at Google scale, requests should immediately be shunted to instance-level queues? Even if a single instance takes forever, or fails outright, we don't have to care: such a failure would only affect 0.000001% of users.

This leads me to my final point: my understanding, from reading the documentation and blog/news posts about GAE, is that the core of GAE is ripped pretty much directly from production Google services. The problem with this is that the scheduler is intended to work at very high scale, not at low scale.
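To make the MegaWalmart math concrete, here's a toy simulation of the per-instance-queue idea. This is purely illustrative, not GAE's actual scheduler; all the names and numbers (round-robin dispatch, the cold-start penalty) are my own assumptions. It just shows that when requests are fanned out to one queue per instance, a single cold instance only delays the slice of traffic routed to it:

```python
# Toy model (NOT GAE's real scheduler): 100 "checkout lanes", one of
# which is cold, with requests dispatched round-robin to per-lane queues.
NUM_INSTANCES = 100      # MegaWalmart's 100 checkout lanes
REQUESTS = 10_000
COLD_INSTANCE = 0        # the one lane with a hungover clerk

def fraction_delayed():
    """Return the fraction of requests stuck behind the cold instance."""
    delayed = 0
    for r in range(REQUESTS):
        instance = r % NUM_INSTANCES   # round-robin dispatch to a per-instance queue
        if instance == COLD_INSTANCE:
            delayed += 1               # only this queue pays the cold-start penalty
    return delayed / REQUESTS

print(f"requests delayed by the cold instance: {fraction_delayed():.0%}")
```

With 100 instances, exactly 1 in 100 requests lands behind the cold lane, so 99% of traffic never notices the cold start. The flip side, of course, is that the unlucky 1% waits the full startup time, which is exactly the pain the small-app crowd is complaining about.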
And frankly, this makes sense when you consider a lot of the finer points of the GAE ecosystem.

So, to fix this: the GAE team needs to take a hard look at the scheduler code and rewrite it with two different sets of rules: one for apps running fewer than 50 instances, and one for apps running more. Additionally, perhaps the scheduler could be made smarter; for example, it could measure the startup time of instances and, going forward, not send requests to a cold instance until that startup time has elapsed.

Personal thoughts: I have administered a corporate GAE app that has exceeded 100 instances, and I use GAE for personal apps that use, at most, 3-4 instances. When you use GAE at these two extremes, you really get an understanding of how GAE scales. A personal anecdote: for my low-end apps, I occasionally notice that GAE starts up a new idle instance. I'm not charged for it, and it doesn't do any work, but it is counted in the "current instances" counter. My guess is that, during off-peak times, the GAE scheduler loads additional instances of low-end apps into memory, to be ready for quick scaling. So I believe the GAE team does try to accommodate low-end apps, but it needs more work.

TLDR: the scheduler needs more work, and the MegaWalmart analogy is essentially how Google's scheduler behaves.

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com.
To unsubscribe from this group, send email to google-appengine+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.