[google-appengine] Re: "15 minutes idle" - my new understanding of how it works

Tim Thu, 08 Sep 2011 12:57:33 -0700


On Thursday, 8 September 2011 16:57:00 UTC+1, zdravko wrote:
>
> Why could a start-up image not be saved and almost instantaneously 
> loaded from disk (like PC hibernation does) ? 
>
> Would this not totally replace the need for any idle instances ? 
> Where is the catch, considering how too obvious this is ;?) 
>


Again, I don't work for Google or App Engine, but having worked on large 
computational (non-web) grids, let me give you a possible reason.

The machines on our grids had strictly controlled disk images and, apart 
from swap and OS temp files, the jobs run on the machines never got to write 
anything to the local disk - all persistence was to shared and replicated 
data storage (see the way that scripts can't access static files or the file 
system, but the blobstore and datastore write data elsewhere). The scripts 
that are invoked are loaded from smart shared storage drives with local 
caching buried in the file system (something like AFS - the Andrew File 
System - is ideal: very fast global reads with smart tiered caching but 
quite horrible distributed write performance).

This way machines can be swapped in and out without the grid management even 
knowing that a machine has changed (i.e. you don't have to remove machine 
#1234 and add #6789; you take out #1234 and replace it with a new 
#1234).
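
To make the swap idea concrete, here is a toy Python sketch. It is not how 
any real grid manager works; the Grid class, its method names and the host 
naming scheme are all made up for illustration. The only point is that the 
scheduler addresses a stable slot number, so replacing the box behind the 
slot changes nothing it can see.

```python
class Grid:
    """Toy model: the scheduler addresses stable slot IDs, not physical hosts."""

    def __init__(self, slot_ids):
        # Each slot maps to whichever (hypothetical) physical host fills it now.
        self._revision = {slot: 0 for slot in slot_ids}
        self._hosts = {slot: f"host-for-{slot}-rev0" for slot in slot_ids}

    def slots(self):
        # This is all the job scheduler ever sees.
        return sorted(self._hosts)

    def swap_hardware(self, slot):
        # Replace the box behind a slot. This is only safe because hosts
        # keep no job state on local disk.
        self._revision[slot] += 1
        self._hosts[slot] = f"host-for-{slot}-rev{self._revision[slot]}"

    def host_for(self, slot):
        return self._hosts[slot]


grid = Grid([1234, 6789])
before = grid.slots()
grid.swap_hardware(1234)   # take out #1234, put in a new #1234
after = grid.slots()
```

Here before and after are the same list, even though the host behind slot 
1234 is a different machine.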

Now hibernating to disk is either going to write to the local disk, which is 
a no-no for managing the local machines (which may be diskless using 
network boot, or have minimal disk for just the O/S and swap), or it'll 
hibernate the image to shared remote storage. That image then has to be 
replicated, managed and synchronised, and you have to allow for pulling it 
from the centralised disk, starting it up and re-connecting to any dynamic 
services. At that point you're back to large startup times and a whole load 
of persistence costs, just like the datastore/blobstore, every time 
something is shut down.

Now if you were Google and you were really going to do hibernation, to my 
mind you'd build the hibernation image at deployment time (i.e. when a new 
version is uploaded, the grid starts up an instance, runs it to some known 
initialised but not active state, hibernates that, and then deploys THAT 
image for starting up instances of the new version). But that would require 
APIs or similar to let you define when your process is ready to hibernate, 
and explanations of how things like changes of GAE version or the built-in 
libraries will cause new images to be built and re-deployed. Alternatively, 
the restarted image would have such dynamic info passed to it to 
re-initialise key details.
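
As a rough sketch of what "build the image at deployment time" could look 
like, here is some Python. Everything in it is hypothetical: image_key, 
SnapshotStore, expensive_init and the version strings are invented for the 
example, pickle merely stands in for a real process snapshot, and none of 
this is an App Engine API. It shows two things from the paragraph above: 
the image is keyed on everything that should force a rebuild, and per-start 
dynamic info is passed in after restore.

```python
import hashlib
import pickle


def image_key(app_version, runtime_version, library_versions):
    """Identify a snapshot; if any input changes, a new image is needed."""
    parts = [app_version, runtime_version] + sorted(
        f"{name}={ver}" for name, ver in library_versions.items()
    )
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


class SnapshotStore:
    """In-memory stand-in for wherever pre-built images would live."""

    def __init__(self):
        self._images = {}

    def has(self, key):
        return key in self._images

    def build(self, key, init_fn):
        # Run the app's initialisation once, at deploy time, and freeze it.
        self._images[key] = pickle.dumps(init_fn())

    def start_instance(self, key, dynamic_info):
        # Restore the frozen state, then patch in per-start details.
        state = pickle.loads(self._images[key])
        state.update(dynamic_info)
        return state


def expensive_init():
    # Hypothetical warm-up: stands in for loading config, templates, caches.
    return {"routes": ["/", "/about"], "ready": True}


store = SnapshotStore()
libs = {"webapp": "1.0"}

key_v1 = image_key("v1", "runtime-A", libs)
store.build(key_v1, expensive_init)              # done once per deployment
instance = store.start_instance(key_v1, {"instance_id": "i-1"})

# A platform-side runtime change yields a different key, so the old image
# cannot be reused and a rebuild is needed.
key_after_upgrade = image_key("v1", "runtime-B", libs)
needs_rebuild = not store.has(key_after_upgrade)
```

The awkward part is exactly what the sketch glosses over: the application 
has to tell the platform when its state is safe to freeze, which is what 
expensive_init pretends to be.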

So the short answer is - hibernation doesn't fit with the way the compute 
grid has probably been built and is managed.... a grid is not 100,000 
desktop machines...

--
T

