[google-appengine] Re: Concurrency Control in the datastore

DXD Wed, 03 Dec 2008 16:44:21 -0800

Hi Ryan,

So you actually use multiversion timestamp-based concurrency control,
and the rules of the scheduler are in fact more straightforward than
what I outlined. I believe they are as follows (please correct me if
I'm wrong somewhere):


When a transaction T starts, it is assigned a timestamp TS(T)
generated from the last committed timestamp CTS(R) of the entity group
it targets (or more precisely, of the root entity R). It also
remembers CTS(R).

1. T wants to write entity X in the group: it just goes ahead and
performs the write, including writing the journal in R and applying
that journal to X, but without yet setting the new value of CTS(R).

2. T wants to read entity X: if T has not performed any write to X, it
will retrieve data from X as of the original CTS(R) that it remembers
from the beginning. If it has modified X, it will retrieve data from X
as of TS(T) (I suppose T must see the change it made to X; note that
at this point TS(T) has not become the last committed timestamp yet).

3. T completes all its actions and now wants to commit. It compares
the original CTS(R) that it remembers from the beginning with the
current value at this point; if they are still the same, TS(T) is set
as the new value of CTS(R) and T effectively gets committed;
otherwise, T must be rolled back. Note that if T is a read-only
transaction, this step 3 essentially gets omitted (and T always
succeeds).
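
To make the three rules concrete, here is a minimal in-memory sketch of
them. It is only a toy model of my understanding, not the datastore's
implementation or API: the names EntityGroup, Transaction and
ConflictError are made up, TS(T) is assumed to be CTS(R) + 1, and
writes are buffered in the transaction instead of being journaled in R
and applied to X right away.

```python
class ConflictError(Exception):
    """Raised when CTS(R) changed between transaction start and commit."""


class EntityGroup:
    """Toy model of one entity group (hypothetical, not the datastore API)."""

    def __init__(self):
        self.committed_ts = 0   # CTS(R), the last committed timestamp
        self.versions = {}      # key -> list of (timestamp, value), ascending

    def read_as_of(self, key, ts):
        """Return the newest value of key with timestamp <= ts, else None."""
        value = None
        for version_ts, version_value in self.versions.get(key, []):
            if version_ts <= ts:
                value = version_value
        return value


class Transaction:
    def __init__(self, group):
        self.group = group
        self.start_cts = group.committed_ts  # CTS(R) remembered at start
        self.ts = self.start_cts + 1         # TS(T): assumed to be CTS(R) + 1
        self.writes = {}                     # buffered writes (simplification)

    def write(self, key, value):
        # Rule 1: the write goes ahead without any check; CTS(R) is untouched.
        self.writes[key] = value

    def read(self, key):
        # Rule 2: T sees its own writes, otherwise the data as of the
        # CTS(R) it remembered at start.
        if key in self.writes:
            return self.writes[key]
        return self.group.read_as_of(key, self.start_cts)

    def commit(self):
        # Rule 3: a read-only transaction has nothing to check and succeeds.
        if not self.writes:
            return
        # Otherwise commit only if CTS(R) is unchanged since T started.
        if self.group.committed_ts != self.start_cts:
            raise ConflictError("CTS(R) changed since the transaction started")
        for key, value in self.writes.items():
            self.group.versions.setdefault(key, []).append((self.ts, value))
        self.group.committed_ts = self.ts    # TS(T) becomes the new CTS(R)
```

In this sketch, two writers that start from the same CTS(R) cannot both
commit: the first one advances CTS(R), and the second one gets a
ConflictError and must be rolled back.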

If what I outline above is correct, I guess there's something I'm not
totally comfortable with. In a general timestamp-based scheduler, the
timestamp order of transactions is also the serial order in which they
must appear to execute. So if two transactions T1, T2 start in this
order, and T1 writes an entity X whereas T2 reads that same entity,
then I believe the theory dictates that T2 must read what T1 writes.

However, in the datastore, that's not always the case. For example:
T1 and T2 run concurrently, T1 is a write transaction whereas T2 is a
read-only one; T2 starts while T1 is ongoing (but not yet committed),
so the last committed timestamp, and thus the data of entity X that T2
sees, is as of when T1 started. T1 makes changes to X and then commits
at some point later. But these changes are not seen by T2. So T2 does
not read what T1 writes, which contradicts the theory. I understand
this discrepancy exists because you use commit time instead of write
time, and you omit read time. But what I'm wondering is why this
behavior is reasonable, why it makes sense. Could you please provide
some explanation? (If the explanation takes some time, could you
please verify the scheduler's rules I outlined above first?)
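
The T1/T2 scenario can be traced with a few lines of Python. This is
only a sketch of the behavior as I understand it, not datastore code:
the version list, read_x and the timestamp values are made up for
illustration, and TS(T1) is assumed to be CTS(R) + 1.

```python
# Committed versions of entity X as (timestamp, value) pairs, ascending.
versions_of_x = [(1, "old")]
committed_ts = 1                      # CTS(R)


def read_x(as_of):
    """Return the newest value of X whose timestamp is <= as_of."""
    visible = [value for ts, value in versions_of_x if ts <= as_of]
    return visible[-1] if visible else None


t1_snapshot = committed_ts            # T1 starts and remembers CTS(R) = 1
t1_ts = t1_snapshot + 1               # TS(T1), assumed to be CTS(R) + 1
t2_snapshot = committed_ts            # T2 starts while T1 is still running

before_commit = read_x(t2_snapshot)   # T2 reads X before T1 commits

versions_of_x.append((t1_ts, "new"))  # T1 writes X ...
committed_ts = t1_ts                  # ... and commits: CTS(R) becomes TS(T1)

after_commit = read_x(t2_snapshot)    # T2 reads X again after T1's commit
latest = read_x(committed_ts)         # what a transaction starting now sees

print(before_commit, after_commit, latest)  # prints "old old new"
```

Both of T2's reads return the value that was committed before T1
started, even though T1 has committed by the time of the second read.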

Thanks a lot,
David.