On 5/26/2010 3:16 PM, Heikki Linnakangas wrote:
On 26/05/10 21:43, Jan Wieck wrote:
On 5/26/2010 1:17 PM, Heikki Linnakangas wrote:
It would not get called during recovery, but I believe that would be
sufficient for Slony. You could always batch commits that you don't
know when they committed as if they committed simultaneously.

Here you are mistaken. If the origin crashes but recovers transactions
whose commits had not yet been flushed to the xlog-commit-order log,
then the consumer has no idea about the order of those commits. That
throws us back to the point where we require a non-cacheable global
sequence to replay the individual actions of those "now batched"
transactions in an agreed-upon order.

The commit order data needs to be covered by crash recovery.

Perhaps I'm missing something,

Apparently, more about that at the end.

I'm thinking that the commit-order log would contain two kinds of records:

a) Transaction with XID X committed
b) All transactions with XID < X committed

If that were true, then long-running transactions would delay all commits for transactions that started after them. Do they?


During normal operation we write the 1st kind of record at every commit. After crash recovery (perhaps at the first commit after recovery or when the slon daemon first polls the server, as there's no hook for end-of-recovery), we write the 2nd kind of record.

I think the callback is also called during backend startup, which means that it could record the first XID to come, which is known from the control file. In that case, all transactions with lower XIDs have either committed or aborted.
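
To make the two record kinds concrete, here is a minimal sketch of how a consumer might turn such a commit-order log into replay batches. It is only an illustration of the proposal as described above: the names Committed, AllBelowResolved and batch_commits are hypothetical, nothing like them exists in PostgreSQL or Slony, and XID wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Committed:
    """Record kind (a): the transaction with this XID committed."""
    xid: int


@dataclass(frozen=True)
class AllBelowResolved:
    """Record kind (b): every transaction with a lower XID is finished."""
    xid: int


def batch_commits(log, pending_xids):
    """Group the XIDs the consumer holds row changes for into ordered batches.

    A kind (a) record yields a batch of one XID. A kind (b) record yields one
    batch with every still-pending XID below the boundary; the commit order
    inside that batch is unknown to the consumer.
    """
    pending = set(pending_xids)
    result = []
    for rec in log:
        if isinstance(rec, Committed):
            if rec.xid in pending:
                pending.discard(rec.xid)
                result.append([rec.xid])
        else:
            group = sorted(x for x in pending if x < rec.xid)
            if group:
                pending.difference_update(group)
                result.append(group)
    return result


# XIDs 103 and 105 committed just before a crash, so no kind (a) record
# was written for them; the kind (b) record after recovery covers both.
log = [Committed(100), Committed(102), AllBelowResolved(106), Committed(106)]
batches = batch_commits(log, {100, 102, 103, 105, 106})
print(batches)
```

The batch holding 103 and 105 is the "now batched" case: the consumer knows both are finished, but not which one committed first.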

Which leads us to your missing piece above, the need for the global non-cacheable sequence.

Consider two transactions A and B that, due to transaction batching between snapshots, get applied together. Let the order of actions be:

1. A starts
2. B starts
3. B selects a row for update, then updates the row
4. A tries to do the same and blocks
5. B commits
6. A gets the lock, the row, does the update
7. A commits

If Slony (or Londiste) did not record the exact order of those individual row actions, it would have no idea whether, within that batch, the action of B (higher XID) actually came first. Without that knowledge there is a 50/50 chance of getting your replica out of sync with that simple conflict.
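
The scenario above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: RowAction, replay and the XID values 100 and 101 are made up for illustration, and the seq field stands in for a value drawn from a global non-cacheable sequence at the time of each row action. It is not how Slony or Londiste store their log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowAction:
    seq: int    # hypothetical global, non-cached sequence value per action
    xid: int    # XID of the transaction that performed the update
    key: str    # row identifier
    value: str  # new row content


def replay(actions, order_key):
    """Apply the row actions to an empty replica in the given order."""
    table = {}
    for action in sorted(actions, key=order_key):
        table[action.key] = action.value
    return table


# A starts first (XID 100), B second (XID 101). B updates the row and
# commits; only then does A get the lock and update the same row.
actions = [
    RowAction(seq=2, xid=100, key="row1", value="written by A"),
    RowAction(seq=1, xid=101, key="row1", value="written by B"),
]

# On the origin, A's update was the last one to touch the row.
origin_state = {"row1": "written by A"}

by_seq = replay(actions, lambda a: a.seq)  # order of the actual row actions
by_xid = replay(actions, lambda a: a.xid)  # a guess based on XID alone

print(by_seq == origin_state)
print(by_xid == origin_state)
```

Replaying by the recorded action order reproduces the origin, while ordering the batch by XID applies B's update last and leaves the replica with the wrong row.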


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
