On Jan 21, 2011, at 01:28, Simon Riggs wrote:
> What I'm still not clear on is why that HS is different. Whatever rules
> apply on the master must also apply on the standby, immutably. Why is it
> we need to pass explicit snapshot information from master to standby? We
> don't do that, except at startup for normal HS. Why do we need that?
>
> I hear, but do not yet understand, that the SSI transaction sequence on
> the master may differ from the WAL transaction sequence. Is it important
> that the ordering on the master would differ from the standby?

The COMMIT order in the actual, concurrent schedule does not necessarily
represent the order of the transactions in an equivalent serial schedule.
Here's an example:

T1: BEGIN SERIALIZABLE; -- (Assume snapshot is set here)
T1: UPDATE D1 ... ;
T2: BEGIN SERIALIZABLE; -- (Assume snapshot is set here)
T2: SELECT * FROM D1 ... ;
T2: UPDATE D2 ... ;
T1: COMMIT;
T3: SELECT * FROM D1, D2;
T2: COMMIT;

Now, the COMMIT order is T1, T3, T2. Let's check whether there is an
equivalent serial schedule. In any such schedule

T2 must run before T1 because T2 didn't see T1's changes to D1
T3 must run after T1 because T3 did see T1's changes to D1
T3 must run before T2 because T3 didn't see T2's changes to D2

This is obviously impossible: if T3 runs before T2 and T2 runs before T1,
then T3 runs before T1, contradicting the second requirement. There is thus
no equivalent serial schedule, and we must abort one of these transactions
with a serialization error. Note that aborting T3 is sufficient, even though
T3 is READ ONLY! With T3 gone, an equivalent serial schedule is T2, T1.

On the master, these "run before" requirements are tracked by remembering
which transaction read which parts of the data via the SIREAD-lock mechanism
(these are more flags than locks, since nobody ever blocks on them).

Since we do not want to report SIREAD locks back to the master, the slave
has to prevent such anomalies another way. Kevin's proposed solution does
that by only using those snapshots on the slave for which reading the
*whole* database is safe. The downside is that whether or not a snapshot is
safe can only be decided after all concurrent transactions have finished.
The snapshot is thus always a bit outdated, but shows a state that is known
to be possible in some serial schedule.
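
As an aside, the "run before" argument from the T1/T2/T3 example can be checked mechanically. The following is only a toy sketch in Python, not how SSI works internally (the master tracks read dependencies via SIREAD locks, it does not sort a graph like this). The function name serial_order and the CONSTRAINTS list are made up for illustration. It treats each requirement as an edge and looks for an ordering that satisfies all of them:

```python
from collections import defaultdict


def serial_order(transactions, must_precede):
    """Return a serial order satisfying every (before, after) pair,
    or None if the pairs form a cycle. Hypothetical helper, for
    illustration only."""
    successors = defaultdict(set)
    indegree = {t: 0 for t in transactions}
    for before, after in must_precede:
        # Ignore constraints that mention an aborted (absent) transaction.
        if before in indegree and after in indegree:
            if after not in successors[before]:
                successors[before].add(after)
                indegree[after] += 1

    # Kahn's algorithm: repeatedly emit a transaction nothing must precede.
    ready = sorted(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.pop(0)
        order.append(t)
        for s in sorted(successors[t]):
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)

    # If some transaction was never emitted, the constraints are cyclic.
    return order if len(order) == len(transactions) else None


# The three requirements from the example, as (runs_before, runs_after).
CONSTRAINTS = [
    ("T2", "T1"),  # T2 didn't see T1's changes to D1
    ("T1", "T3"),  # T3 did see T1's changes to D1
    ("T3", "T2"),  # T3 didn't see T2's changes to D2
]

with_t3 = serial_order(["T1", "T2", "T3"], CONSTRAINTS)
without_t3 = serial_order(["T1", "T2"], CONSTRAINTS)
print(with_t3)      # None: no equivalent serial schedule exists
print(without_t3)   # ['T2', 'T1']: aborting T3 resolves the cycle
```

With all three transactions present the constraints form a cycle, so no order is returned. Dropping T3 leaves only "T2 before T1", which matches the serial schedule T2, T1 given above.
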

The very same mechanism can be used on the master also by setting the
isolation level to SERIALIZABLE READ ONLY DEFERRED.

best regards,
Florian Pflug