Hey guys and girls,

I've got a situation where I'd have to "transactionally" update
multiple entities whose cumulative size exceeds the 1MB datastore
API limit... is there a decent solution for this?

For example, let's say that I start off with entities E1, E2, E3,
each about 400KB. All the entities are specific to a given User. I
grab them all on a "remote node" and do some calculations on them to
yield new "computed" entities E1', E2', and E3'.

Any failure of the remote node or the datastore is recoverable except
when the remote node tries to *update* the datastore... in that
situation, it'd have to split the update into 2 separate .put() calls
to stay under the 1MB limit. And should the remote node die after the
first put(), we have a messy situation =)
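To make the failure window concrete, here's a minimal sketch using the
classic db API -- the entity list and the way it gets split into two
batches are just assumptions for illustration:

from google.appengine.ext import db

def update_computed_entities(computed_entities):
    # Each computed entity is ~400KB, so putting all three in one call
    # would exceed the 1MB API limit -- split into two batches instead.
    first_batch = computed_entities[:2]   # ~800KB
    second_batch = computed_entities[2:]  # ~400KB

    db.put(first_batch)
    # <-- if the remote node dies here, the datastore holds a mix of
    #     old and new entities with no way to tell which is which.
    db.put(second_batch)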

My solution at the moment (roughly sketched in code after the list) is to:

1. Create a UserRecord entity which has a 'version' attribute
recording the "latest" version of the related entities for any
given User.

2. Add a 'version' attribute to all the entities.

3. Whenever the remote node creates the new set of "computed"
entities, it creates them all with a new version number -- applying
the same version to all the entities in the same "transaction".

4. These new entities are .put() as entirely separate, new entities,
i.e. they do not overwrite the old entities.

5. Once a remote node successfully writes new versions of all the
entities relating to a User, it updates the UserRecord with the latest
version number.

6. From the remote node, delete all entities related to the User which
don't have the latest version number.

7. Have a background thread check for and delete invalid versions in
case a remote node died whilst doing step 4, 5 or 6...
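Roughly, in code (classic db API; the model names, properties and
helper function below are just assumptions to illustrate the scheme,
not my actual code):

from google.appengine.ext import db

class UserRecord(db.Model):
    # Step 1: tracks the "latest" valid version for this User.
    version = db.IntegerProperty(default=0)

class ComputedEntity(db.Model):
    # Step 2: every entity carries a version stamp.
    user = db.ReferenceProperty(UserRecord)
    version = db.IntegerProperty()
    payload = db.BlobProperty()  # the ~400KB of computed data

def write_new_version(user_record, payloads):
    # Ignoring the multi-node race on version numbers, as noted above.
    new_version = user_record.version + 1

    # Steps 3 & 4: brand-new entities, all stamped with the new version;
    # one put() per entity keeps each call well under the 1MB limit.
    for payload in payloads:
        ComputedEntity(user=user_record, version=new_version,
                       payload=payload).put()

    # Step 5: only now flip the UserRecord to the new version. Until
    # this put() succeeds, readers still treat the old version as valid.
    user_record.version = new_version
    user_record.put()

    # Step 6: delete everything that isn't the latest version.
    stale = (ComputedEntity.all(keys_only=True)
             .filter('user =', user_record)
             .filter('version <', new_version))
    db.delete(list(stale))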

I've skipped over the complications caused by multiple remote nodes
working on data relating to the same User -- but, overall, the
approach is pretty much the same.

Now, the advantage of this approach (as far as I can see) is that data
relating to a User is never *lost*. That is, old data is never deleted
before there is valid data to replace it.

However, the disadvantage is that, for unknown periods of time, there
would be duplicate data sets for a given User... all of which is
caused by the fact that datastore calls cannot exceed 1MB. =(

So queries will yield duplicate data -- gah!!
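The read side has to compensate by filtering on the UserRecord's
version -- something like this, reusing the hypothetical models from
the sketch above:

def get_current_entities(user_record):
    # Only return entities stamped with the version the UserRecord
    # says is current; stale duplicates are ignored until deleted.
    return (ComputedEntity.all()
            .filter('user =', user_record)
            .filter('version =', user_record.version)
            .fetch(100))

which works, but every query against these entities now needs that
extra version filter.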

Is there a better approach to try at all? Thanks!

-- 
love, tav

plex:espians/tav | t...@espians.com | +44 (0) 7809 569 369
http://tav.espians.com | http://twitter.com/tav | skype:tavespian
