On Thursday, 8 September 2011 16:57:00 UTC+1, zdravko wrote:
>
> Why could a start-up image not be saved and almost instantaneously
> loaded from disk (like PC hibernation does) ?
>
> Would this not totally replace the need for any idle instances ?
> Where is the catch, considering how too obvious this is ;?)

Again, I don't work for Google or AppEngine, but having worked on large computational (non-web) grids, let me give you a possible reason. The machines on our grids had strictly controlled disk images, and apart from swap and OS tempfiles, the jobs run on the machines never got to write anything to the local disk - all persistence was to shared and replicated data storage (compare the way that App Engine scripts can't access static files or the file system, but the blobstore and datastore write data elsewhere). The scripts that were invoked were loaded from smart shared storage drives with local caching buried in the file system (something like AFS - the Andrew File System - is ideal: very fast global reads with smart tiered caching, but quite horrible distributed write performance). This way machines can be swapped in and out without the grid management even knowing that a machine has changed (i.e. you don't have to remove machine #1234 and add #6789; you take out #1234 and replace it with a new #1234).

Now hibernating to disk is going to go one of two ways. Either it writes to the local disk, which is a no-no from the point of view of managing the local machines (which may be diskless using network boot, or have minimal disk for just the O/S and swap). Or it hibernates the image to shared remote storage, which then has to be replicated, managed, synchronised etc., and you have to allow for pulling the image from the centralised disk, starting up and re-connecting to any dynamic services. In that case you're back to large startup times, plus a whole load of persistence costs just like the datastore/blobstore every time something is shut down.
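
To make the "replace #1234 with a new #1234" point concrete, here is a toy sketch in Python. Everything in it (the `Machine` and `Grid` classes, the slot numbers, the `hibernate.img` name) is made up for illustration and is only my guess at the general shape of such a grid, not how Google's is built. It shows that when all persistence goes to shared storage, swapping the hardware behind a slot loses nothing the grid cares about, whereas anything saved on the local disk simply disappears.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """A physical box; hypothetical model that holds no durable job state."""
    serial: str
    local_scratch: dict = field(default_factory=dict)  # swap/tempfiles only


class Grid:
    """Schedules work against slot IDs, never against physical hardware."""

    def __init__(self, shared_store):
        self.shared_store = shared_store  # stands in for replicated storage
        self.slots = {}                   # slot ID -> Machine currently fitted

    def install(self, slot_id, machine):
        # Swapping hardware just rebinds the slot; the slot ID is unchanged.
        self.slots[slot_id] = machine

    def run(self, slot_id, key, value):
        machine = self.slots[slot_id]
        machine.local_scratch["tmp"] = value  # throwaway local state
        self.shared_store[key] = value        # all persistence goes here


store = {}
grid = Grid(store)

old_box = Machine(serial="old-box")
grid.install(1234, old_box)
grid.run(1234, "result-a", "42")

# Suppose an instance had hibernated itself to the local disk of the old box.
old_box.local_scratch["hibernate.img"] = "saved state"

# Replace the hardware behind slot 1234; the scheduler still sees slot 1234.
grid.install(1234, Machine(serial="new-box"))
grid.run(1234, "result-b", "43")

print(sorted(store))                                      # ['result-a', 'result-b']
print("hibernate.img" in grid.slots[1234].local_scratch)  # False
```

The results written to the shared store survive the swap; the locally saved image does not, which is the first half of the problem described above.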
Now if you were Google and you were really going to do hibernation, to my mind you'd build the hibernation image at deployment time (i.e. when a new version is uploaded, the grid starts up an instance, runs it to some known initialised but not active state, hibernates that, and then deploys THAT image for starting up instances of the new version). But that would require APIs or similar to let you define when your process is ready to hibernate, and explanations of how things like changes of GAE version or the built-in libraries etc. will cause new images to be built and re-deployed. Alternatively, the restarted image would have to have such dynamic info passed to it so it could re-initialise key details.

So the short answer is - hibernation doesn't fit with the way the compute grid has probably been built and is managed.... a grid is not 100,000 desktop machines...

--
T
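
For what it's worth, the deployment-time idea could look roughly like the Python sketch below. This is purely hypothetical: `ImageKey`, `App`, `ImageStore`, `warm_up` and `resume` are names I've invented, App Engine exposes nothing like this as far as I know, and copying a dict stands in for a real process snapshot. It only illustrates the two requirements mentioned above: the app has to signal when it is ready to be snapshotted, and a change of app version or runtime has to force a new image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageKey:
    """Everything that makes a saved image stale if it changes (assumed set)."""
    app_version: str
    runtime_version: str  # e.g. the platform runtime / built-in library level


class App:
    """Hypothetical app with an explicit 'ready to hibernate' point."""

    def __init__(self):
        self.state = {}
        self.ready = False

    def warm_up(self):
        self.state["config"] = "loaded"  # expensive initialisation goes here
        self.ready = True                # the 'safe to snapshot' signal

    def resume(self, dynamic_info):
        # Re-initialise details that cannot be baked into the image.
        self.state.update(dynamic_info)


class ImageStore:
    """Builds one image per (app version, runtime version) and reuses it."""

    def __init__(self):
        self.images = {}
        self.builds = 0

    def get_or_build(self, key):
        if key not in self.images:
            app = App()
            app.warm_up()
            assert app.ready, "app never reached its snapshot point"
            self.images[key] = dict(app.state)  # stand-in for a real snapshot
            self.builds += 1
        return self.images[key]

    def start_instance(self, key, dynamic_info):
        app = App()
        app.state = dict(self.get_or_build(key))  # restore from the image
        app.ready = True
        app.resume(dynamic_info)
        return app


images = ImageStore()
a = images.start_instance(ImageKey("v1", "rt-1"), {"instance_id": "i-1"})
b = images.start_instance(ImageKey("v1", "rt-1"), {"instance_id": "i-2"})
c = images.start_instance(ImageKey("v1", "rt-2"), {"instance_id": "i-3"})

print(images.builds)  # 2: one image for rt-1, a rebuild for rt-2
print(b.state)        # {'config': 'loaded', 'instance_id': 'i-2'}
```

The second instance reuses the first image and only gets its dynamic details passed in, while the runtime change forces a rebuild. The sketch says nothing about where the image lives, which is where the storage costs discussed earlier come back in.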