On Jul 18, 2:38 pm, Michael Hermus <michael.her...@gmail.com> wrote:
> I dont believe that you (or anyone) has sufficiently explained how sending 
> user requests to cold instances is ever better than warming it up first.. 
> Said request can ALWAYS be pulled right off the pending queue and sent to the 
> instance as soon as it is ready.

Let me take a shot at this. Brandon Wirtz touched on it before, when
he said "I agree that the Queue is sub optimal, but it is more sub
optimal the smaller you are.  When you get to 50 instances it is
amazing how well the load balancing works. "

Suppose you have a massive store (we'll call it MegaWalmart).
MegaWalmart has 100 staffed checkout lanes. Suppose all of those lanes
share a single queue of people, with a supervisor who sends customers
to open checkout lanes (roughly analogous to your preferred way of
handling GAE request queuing). That one line would be huge, would
block traffic around the store, and so on. From MegaWalmart's POV,
it's far better to have multiple checkout lines, one per lane.

Now suppose another checkout lane opens up (remember, we now have one
line per lane). That's an additional 1% of capacity. If the checkout
clerk is drunk/hungover/whatever, that additional lane will take extra
time to open, annoying the customers lined up in that lane. From
MegaWalmart's POV, who cares? Less than 1% of customers were
inconvenienced; 99% still had a decent time checking out.

Let's apply this to the scheduler. Suppose there were one single queue
of requests. At Google scale, that queue could easily exceed millions
of entries, possibly billions. And God help you if the machine hosting
the queue hiccups, or fails outright. Don't you agree that, at least
at Google scale, requests should immediately be shunted to
instance-level queues? Even if a single instance takes forever, or
fails, we don't have to care: such a failure would only affect
0.000001% of users.
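To make the arithmetic concrete, here's a toy sketch (my own, not anything from GAE's actual scheduler) of per-instance queues with round-robin dispatch. The point is just that one cold or failed instance only delays the requests parked in its own queue:

```python
# Toy model: per-instance queues with round-robin dispatch.
# Instance 0 is "cold"; only requests routed to it are delayed.
NUM_INSTANCES = 100
REQUESTS = 10_000

# Each instance gets its own queue instead of one global queue.
queues = [[] for _ in range(NUM_INSTANCES)]
for req in range(REQUESTS):
    queues[req % NUM_INSTANCES].append(req)  # round-robin dispatch

# Requests stuck behind the cold instance:
delayed = len(queues[0])
print(f"{delayed / REQUESTS:.1%} of requests affected")  # prints "1.0% of requests affected"
```

With 100 instances the blast radius of one bad instance is 1% of traffic; at Google scale, with vastly more instances, that fraction shrinks toward negligible.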

This leads me to my final point: my understanding, from reading the
documentation and blog/news posts about GAE, is that the core of GAE
is ripped pretty much directly from production Google services. The
problem is that the scheduler is designed to work at very high scale,
not at low scale. Frankly, this makes sense when you consider many of
the finer points of the GAE ecosystem.

So, to fix this: the GAE team needs to take a hard look at the
scheduler code and rewrite it with two different sets of rules, one
for apps running fewer than 50 instances and one for apps running
more. Additionally, they could make the scheduler smarter: for
example, it could measure the startup time of instances and avoid
sending requests to a cold instance until that startup time has
elapsed.
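That last suggestion could look something like the following sketch. This is purely hypothetical (names like `Instance`, `pick_instance`, and `expected_startup_s` are mine, not GAE's): route to warm instances when any exist, otherwise fall back to the instance that will finish warming soonest.

```python
import time

class Instance:
    """Hypothetical model of an app instance with a measured startup time."""

    def __init__(self, instance_id, expected_startup_s):
        self.instance_id = instance_id
        # Startup time measured from this app's previous cold starts.
        self.expected_startup_s = expected_startup_s
        self.launched_at = time.monotonic()

    def is_warm(self, now=None):
        """True once the expected startup time has elapsed since launch."""
        now = time.monotonic() if now is None else now
        return now - self.launched_at >= self.expected_startup_s

def pick_instance(instances, now=None):
    """Prefer warm instances; otherwise pick the one warming up soonest."""
    warm = [i for i in instances if i.is_warm(now)]
    if warm:
        # Stand-in for real load balancing among warm instances.
        return min(warm, key=lambda i: i.instance_id)
    return min(instances, key=lambda i: i.launched_at + i.expected_startup_s)
```

Under this rule, a request never lands on a still-starting instance while a warm one is available, which is exactly the cold-instance annoyance being complained about in this thread.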

Personal thoughts: I have administered a corporate GAE app that has
exceeded 100 instances, and I use GAE for personal apps that use, at
most, 3-4 instances. When you use GAE at these two extremes, you
really get a feel for how GAE scales. A personal anecdote: for my
low-end apps, I occasionally notice that GAE starts up a new idle
instance. I'm not charged for it, and it doesn't do any work, but it
is counted in the "current instances" counter. My guess is that,
during off-peak times, the GAE scheduler loads additional instances of
low-end apps into memory, to be ready for quick scaling. So I believe
the GAE team does try to handle low-end apps, but it needs more work.

TLDR: the scheduler needs more work, and the MegaWalmart analogy maps
directly onto Google's scheduler.
