On Jul 16, 10:35 pm, Andy Freeman <ana...@earthlink.net> wrote:
> > I'm starting to think that the "GAE takes care of the messy details of
> > distributed systems programming" claim is a bit overstated...
>
> Global clock consistency requires very expensive clocks accessible
> from every server with known latency (and even that's a bit dodgy).
> AFAIK, GAE doesn't provide that, but who does?
>
> GAE doesn't do the impossible, but also doesn't say that it does.  WRT
> the latter, would you really prefer otherwise?

But that's just it -- in many places it's claimed that GAE makes it
all a cakewalk.  From the datastore docs:

"""
Storing data in a scalable web application can be tricky. A user could
be interacting with any of dozens of web servers at a given time, and
the user's next request could go to a different web server than the
one that handled the previous request. All web servers need to be
interacting with data that is also spread out across dozens of
machines, possibly in different locations around the world.

Thanks to Google App Engine, you don't have to worry about any of
that. App Engine's infrastructure takes care of all of the
distribution, replication and load balancing of data behind a simple
API—and you get a powerful query engine and transactions as well.
"""

You could argue that that's not claiming to do the impossible, but
"you don't have to worry about any of that" is certainly not true.
Nowhere in the documentation is there a discussion of the kinds of
subtle gotchas that you need to be aware of when programming for this
kind of system.  It's all just "golly isn't this so gosh-darn easy!"
You have to go digging to find the article on transaction isolation
where you find out that your queries can return results that, um,
don't match your queries.  And AFAICT you *do* have to worry about
subsequent requests being handled by different servers, since there
doesn't seem to be any guarantee that the datastore writes made in one
request will be seen in the next.  Memcache doesn't have transactions,
so it seems like guaranteeing coherence with the datastore is tricky.
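As I understand the isolation article, the workaround for the "results that don't match your query" case is to re-check the filter yourself on whatever comes back. Here's a minimal sketch of that idea, assuming entities are plain dicts; `refilter` and the sample data are hypothetical and not part of the App Engine API:

```python
def refilter(results, **filters):
    """Drop entities whose current property values no longer match the
    equality filters the query was issued with.

    Hypothetical helper: entities are modeled as plain dicts here, not
    App Engine model instances.
    """
    return [e for e in results
            if all(e.get(k) == v for k, v in filters.items())]


# A query for status == "open" may hand back an entity that was modified
# after the index row it was found through was read.
stale_results = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "closed"},  # changed between index scan and fetch
]
print([e["id"] for e in refilter(stale_results, status="open")])  # [1]
```

It only handles equality filters; inequality or sort conditions would need the same treatment by hand.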

I worked in a distributed systems group for many years, so I know that
many of these problems are simply inherent to distributed systems.  It
doesn't disturb me that they exist.  What bothers me is the way these
issues are broadly *ignored* by GAE's documentation.  If I weren't a
bit savvy about distributed systems I probably wouldn't have realized
that clock skew could cause problems, and nothing I read in GAE's docs
would have helped me figure it out.  So no, I don't want GAE to claim
to do the impossible, I want them to *stop* claiming to do the
impossible.  I would love to see some articles about the pitfalls of
the system and how to avoid them or mitigate them.  The transaction
isolation article is great in that respect -- I hope people at Google
are planning more along those lines.
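To make the clock-skew point concrete, here's a toy sketch of what goes wrong if you order updates from different servers by the timestamps those servers record. It's plain Python, nothing App Engine specific; the `Write` record and `last_write_wins` are made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class Write:
    server: str
    value: str
    true_time: float   # when the write actually happened (seconds)
    clock_skew: float  # how far this server's clock is off (seconds)

    @property
    def stamp(self) -> float:
        # The timestamp the server records is its local, skewed clock reading.
        return self.true_time + self.clock_skew


def last_write_wins(writes):
    """Keep the write with the latest recorded timestamp, which is only
    correct if all the servers' clocks agree."""
    return max(writes, key=lambda w: w.stamp)


writes = [
    Write("server-a", "first", true_time=100.0, clock_skew=+2.0),  # fast clock
    Write("server-b", "second", true_time=101.0, clock_skew=0.0),
]
print(last_write_wins(writes).value)  # "first", though "second" happened later
```

Two seconds of skew is enough to make the older write silently win, and nothing in the data tells you it happened.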

Cheers,
-n8

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To post to this group, send email to google-appengine@googlegroups.com
To unsubscribe from this group, send email to 
google-appengine+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en
-~----------~----~----~----~------~----~------~--~---
