Re: [GENERAL] Tracking row updates - race condition
In article [EMAIL PROTECTED], Alex Adriaanse [EMAIL PROTECTED] writes:

> Thanks for the input everyone. I think Harald's approach will work well...

I'm not so sure anymore :-( Consider something like this:

UPDATE tbl SET col1 = 1 WHERE col2 = 1;
UPDATE tbl SET col1 = 2 WHERE col2 = 1;

with not much time in between. By using NULL temporarily, you destroy the ordering. The client won't miss an UPDATE, but it might execute the second one before the first. Neither my idea nor yours appears to take care of that.

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Tracking row updates - race condition
Harald Fuchs wrote:

> In article [EMAIL PROTECTED], Alex Adriaanse [EMAIL PROTECTED] writes:
>
>> Thanks for the input everyone. I think Harald's approach will work well...
>
> I'm not so sure anymore :-( Consider something like this:
>
> UPDATE tbl SET col1 = 1 WHERE col2 = 1;
> UPDATE tbl SET col1 = 2 WHERE col2 = 1;
>
> with not much time in between. By using NULL temporarily, you destroy the ordering. The client won't miss an UPDATE, but it might execute the second one before the first. Neither my idea nor yours appears to take care of that.

Maybe I'm missing something, but I don't really see the problem. If the second statement is executed before the transaction containing the first statement is committed, wouldn't the second statement block until the first one is committed? And if the first one is committed before the second statement is executed, then I don't see how the client would see the updates out of order.

Alex
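Alex's blocking argument can be traced with two concurrent sessions under PostgreSQL's default READ COMMITTED isolation (a sketch; the side-by-side layout is illustrative):

```sql
-- Session 1                               -- Session 2
BEGIN;
UPDATE tbl SET col1 = 1 WHERE col2 = 1;
                                           BEGIN;
                                           UPDATE tbl SET col1 = 2 WHERE col2 = 1;
                                           -- blocks: session 1 holds the row lock
COMMIT;
                                           -- unblocked: re-evaluates the WHERE
                                           -- clause, then updates the row
                                           COMMIT;
```

Row-level locking orders the two UPDATEs themselves; the open question in the thread is whether the later, out-of-band revision assignment preserves that order.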
Re: [GENERAL] Tracking row updates - race condition
Thanks for the input everyone. I think Harald's approach will work well, so I'm planning on doing what he suggested, with a few modifications. I think I can still use a sequence-backed INTEGER rather than a TIMESTAMP, and have the trigger that sets the revision to NULL also NOTIFY a daemon of any updates; the daemon would then set the revision to nextval('revision_sequence_name'), rather than having a cron job do it every X minutes. The updates sent to the client would also include rows with revision = NULL so that the client would not lag behind, at the expense of sometimes having updates sent to the client twice.

Thanks again,

Alex

Harald Fuchs wrote:

> In article [EMAIL PROTECTED], Alex Adriaanse [EMAIL PROTECTED] writes:
>
>> I think that would greatly decrease the chances of a race condition occurring, but I don't think it'd solve it. What if 150 other revisions occur between a row update and its corresponding commit?
>
> How about the following:
>
> * Use a TIMESTAMP rather than a SERIAL
> * Set this timestamp to NULL in your INSERT/UPDATE trigger
> * Use a cron job to set the timestamp to current_timestamp when it's NULL
>
> This way the client would lag behind somewhat, depending on the cron job frequency, but it should not miss a change.
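A minimal sketch of that variant, assuming the codes table from the original post (the function name and the codes_dirty channel are illustrative, not from the thread):

```sql
-- BEFORE INSERT/UPDATE trigger: leave the revision unassigned and
-- signal the daemon that there is work to do.
CREATE FUNCTION row_touch_codes() RETURNS trigger AS '
BEGIN
    NEW.revision = NULL;
    NEW.updated = current_timestamp;
    NOTIFY codes_dirty;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER touch_row BEFORE INSERT OR UPDATE ON codes
    FOR EACH ROW EXECUTE PROCEDURE row_touch_codes();

-- The daemon LISTENs on codes_dirty. The notification is only delivered
-- after the writer's transaction commits, at which point the daemon runs:
UPDATE codes SET revision = nextval('codes_revision_seq')
    WHERE revision IS NULL;
```

Because the notification arrives after commit, the daemon only ever stamps rows whose writing transactions are already visible.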
Re: [GENERAL] Tracking row updates - race condition
I think that would greatly decrease the chances of a race condition occurring, but I don't think it'd solve it. What if 150 other revisions occur between a row update and its corresponding commit?

Alex

Vincent Hikida wrote:

>> To fetch all updates since the last synchronization, the client would calculate a value for $lastrevision by running this query on its local database: SELECT max(revision) AS lastrevision FROM codes; It would then fetch all updated rows by running this query against the server: SELECT * FROM codes WHERE revision > $lastrevision;
>
> How about
>
> SELECT * FROM codes WHERE revision > $lastrevision - 100
>
> You could use a number other than 100. As you said, the client can handle duplicates.
>
> Vincent
Re: [GENERAL] Tracking row updates
Qingqing Zhou wrote:

> Alex Adriaanse [EMAIL PROTECTED] writes
>
>> This seems to work, except there exists a race condition. Consider the following series of events (in chronological order):
>>
>> 1. Initially, in the codes table there's a row with id=1, revision=1, and a row with id=2, revision=2
>> 2. Client A begins a transaction, and updates the row with id=1, resulting in revision=3 for that row
>> 3. Client B begins a transaction, and updates the row with id=2, resulting in revision=4 for that row
>> 4. Client B commits the transaction
>> 5. Client C (which has $lastrevision=2 in its local database) synchronizes with the database by doing SELECT * FROM codes WHERE revision > 2; and retrieves client B's update to the row with id=2, revision=4 (it doesn't yet see the update from client A)
>> 6. Client A commits the transaction
>> 7. Some time later, client C synchronizes with the database again. $lastrevision for its database is now 4, so doing SELECT * FROM codes WHERE revision > 4; does not retrieve any rows. So client C never sees client A's update to the row with id=1
>>
>> Essentially, the race condition occurs when the order of clients committing transactions (i.e. the updates becoming visible to other clients) differs from the order of clients generating sequence values. Do you guys have any suggestions on how to avoid this race condition, or maybe a more elegant way to synchronize the clients with the server?
>
> In my understanding, you are doing something like what CVS does. Say you don't check out a file and you make a revision based on the version you now see (say version 1); then when you want to commit, you will probably receive a "merge required" notice, since in the interval the file may already have been updated by another user (to version 2) - he is free to do so since nobody knows that you might commit an update. To avoid this, you have to check out the file, i.e., lock the file to prevent other changes; then you are free of any merge requirement. The cost is that you locked the file and nobody else could change it. So the only options are merge or lock.
>
> Regards,
> Qingqing

Applying this analogy to our database, wouldn't that require a table-level lock during a CVS-like commit (which would mean locking the table, getting the revision number, updating the row(s), and committing the transaction)?

Alex
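If one did go the lock route, a write under this scheme would look roughly like the following sketch (the UPDATE is just an example; in PostgreSQL, EXCLUSIVE mode still permits plain SELECTs, so only other writers are blocked):

```sql
BEGIN;
LOCK TABLE codes IN EXCLUSIVE MODE;        -- serialize all writers on this table
UPDATE codes SET description = 'changed'   -- revision assigned by the trigger
    WHERE id = 1;
COMMIT;                                    -- lock released at commit
```

Because writers are fully serialized, revisions become visible in the same order they were generated - at exactly the cost Qingqing describes.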
Re: [GENERAL] Tracking row updates - race condition
In article [EMAIL PROTECTED], Alex Adriaanse [EMAIL PROTECTED] writes:

> I think that would greatly decrease the chances of a race condition occurring, but I don't think it'd solve it. What if 150 other revisions occur between a row update and its corresponding commit?

How about the following:

* Use a TIMESTAMP rather than a SERIAL
* Set this timestamp to NULL in your INSERT/UPDATE trigger
* Use a cron job to set the timestamp to current_timestamp when it's NULL

This way the client would lag behind somewhat, depending on the cron job frequency, but it should not miss a change.
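Spelled out against the codes table from the original post, the suggestion might look like this sketch (the revised column and function names are illustrative):

```sql
ALTER TABLE codes ADD COLUMN revised TIMESTAMP;

-- The INSERT/UPDATE trigger clears the timestamp...
CREATE FUNCTION clear_revised() RETURNS trigger AS '
BEGIN
    NEW.revised = NULL;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER clear_revised BEFORE INSERT OR UPDATE ON codes
    FOR EACH ROW EXECUTE PROCEDURE clear_revised();

-- ...and a cron job periodically stamps rows whose writing
-- transactions have since committed:
UPDATE codes SET revised = current_timestamp WHERE revised IS NULL;
```

A client would then fetch rows with revised later than its last synchronization time (plus any rows still NULL, if it doesn't want to lag behind the cron interval).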
Re: [GENERAL] Tracking row updates
Alex Adriaanse [EMAIL PROTECTED] writes

> Applying this analogy to our database, wouldn't that require a table-level lock during a CVS-like commit (which would mean locking the table, getting the revision number, updating the row(s), and committing the transaction)?

You may have a look at how CVS works and decide. Note the difference is that CVS manages both the files and the versions, whereas it seems you just manage the versions?

Another way is to change the logic - try to submit the updates together. Think of the bank-transfer example we always use when describing a transaction's atomicity property, which has something in common with your case. We fold the logic of subtracting some money and adding some money into one transaction, so no matter how many concurrent transfers target the account, it is guaranteed there is no race condition, since the serializability property assures that.

Regards,
Qingqing
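The bank-transfer example Qingqing refers to, in its usual textbook form - both legs in one transaction, so concurrent transfers serialize on the rows and no intermediate state is ever visible (the accounts table and column names are the textbook ones, not from the thread):

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- debit
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- credit
COMMIT;  -- both changes become visible atomically, or neither does
```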
[GENERAL] Tracking row updates - race condition
I'm working on an application where we have a central database server and a bunch of clients that are disconnected most of the time and need to maintain a local copy of the central database. The client databases are based on One$DB since they have to be lightweight. The client does not access the server database directly, but through a server application. Periodically (about once a week) the clients will connect to the central database and retrieve updates made to it so that their local database will be up to date, as well as send local updates back to the server. A lot of these clients will be connecting through a dial-up connection, and some of the tables can get quite large, so just retrieving all rows in a table is not an option when synchronizing.

This is how I currently have things designed on the server side. Each table has a revision column, which is set to a sequence value every time a row is inserted or updated (deletes are translated into updates that hide the row by setting an enabled column to false), as you can see in the following statements:

CREATE TABLE codes (
    id SERIAL PRIMARY KEY,
    created TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
    updated TIMESTAMP,
    revision SERIAL,
    enabled BOOLEAN DEFAULT TRUE NOT NULL,
    name VARCHAR(8) NOT NULL,
    description VARCHAR(50)
) WITHOUT OIDS;

CREATE INDEX codes_revision_idx ON codes (revision);

CREATE FUNCTION row_update_codes() RETURNS trigger AS '
BEGIN
    NEW.revision = nextval(''codes_revision_seq'');
    NEW.updated = current_timestamp;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER update_row BEFORE INSERT OR UPDATE ON codes
    FOR EACH ROW EXECUTE PROCEDURE row_update_codes();

CREATE RULE delete_row AS ON DELETE TO codes
    DO INSTEAD UPDATE codes SET enabled = FALSE WHERE id = OLD.id;

To fetch all updates since the last synchronization, the client would calculate a value for $lastrevision by running this query on its local database:

SELECT max(revision) AS lastrevision FROM codes;

It would then fetch all updated rows by running this query against the server:

SELECT * FROM codes WHERE revision > $lastrevision;

This seems to work, except there exists a race condition. Consider the following series of events (in chronological order):

1. Initially, in the codes table there's a row with id=1, revision=1, and a row with id=2, revision=2
2. Client A begins a transaction, and updates the row with id=1, resulting in revision=3 for that row
3. Client B begins a transaction, and updates the row with id=2, resulting in revision=4 for that row
4. Client B commits the transaction
5. Client C (which has $lastrevision=2 in its local database) synchronizes with the database by doing SELECT * FROM codes WHERE revision > 2; and retrieves client B's update to the row with id=2, revision=4 (it doesn't yet see the update from client A)
6. Client A commits the transaction
7. Some time later, client C synchronizes with the database again. $lastrevision for its database is now 4, so doing SELECT * FROM codes WHERE revision > 4; does not retrieve any rows. So client C never sees client A's update to the row with id=1

Essentially, the race condition occurs when the order of clients committing transactions (i.e. the updates becoming visible to other clients) differs from the order of clients generating sequence values. Do you guys have any suggestions on how to avoid this race condition, or maybe a more elegant way to synchronize the clients with the server?

I was thinking about doing the following to solve this problem. I'd create a new table, e.g. codes_active_txns, with a sequence_value column. Before the server application starts any transaction involving an insert/update/delete to codes, it would retrieve nextval('codes_revision_seq') as $seq_val, insert it into codes_active_txns, and commit; it would then begin another transaction and do its inserts/updates, setting revision = $seq_val for each row it touches. Once it's done with the table updates, it would commit the transaction, delete the row containing $seq_val from codes_active_txns, and commit that. The server would calculate $lastrevision with:

SELECT max(revision) FROM codes
    WHERE revision < (SELECT COALESCE(min(sequence_value), 2147483647)
                      FROM codes_active_txns);

and send this to the client. $lastrevision would thus be the highest revision below which there are no active (non-committed) transactions. The client would save this value and pass it back to the server during the next synchronization session so that the server can figure out what rows it needs to send to the client with SELECT * FROM codes WHERE revision > $lastrevision;. Occasionally the same row might be sent to the client twice during two synchronization sessions, but that's no big deal because that number should be low and the client can handle duplicates.
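The bookkeeping described above, written out as a sketch ($seq_val is a placeholder for the value the application fetched, and the UPDATE is just an example; this assumes the trigger no longer assigns revisions itself):

```sql
-- (1) Reserve a revision and advertise it as in flight:
BEGIN;
INSERT INTO codes_active_txns (sequence_value)
    VALUES (nextval('codes_revision_seq'));   -- remember this as $seq_val
COMMIT;

-- (2) Do the real work, stamping each touched row with $seq_val:
BEGIN;
UPDATE codes SET description = 'new text', revision = $seq_val WHERE id = 1;
COMMIT;

-- (3) Retire the reservation in a final, separate transaction:
DELETE FROM codes_active_txns WHERE sequence_value = $seq_val;
```

Step (1) must commit before step (2) begins, so that the reservation is visible to any concurrent $lastrevision calculation while the real work is still uncommitted.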
Re: [GENERAL] Tracking row updates - race condition
> To fetch all updates since the last synchronization, the client would calculate a value for $lastrevision by running this query on its local database: SELECT max(revision) AS lastrevision FROM codes; It would then fetch all updated rows by running this query against the server: SELECT * FROM codes WHERE revision > $lastrevision;

How about

SELECT * FROM codes WHERE revision > $lastrevision - 100

You could use a number other than 100. As you said, the client can handle duplicates.

Vincent
Re: [GENERAL] Tracking row updates
Alex Adriaanse [EMAIL PROTECTED] writes

> This seems to work, except there exists a race condition. Consider the following series of events (in chronological order):
>
> 1. Initially, in the codes table there's a row with id=1, revision=1, and a row with id=2, revision=2
> 2. Client A begins a transaction, and updates the row with id=1, resulting in revision=3 for that row
> 3. Client B begins a transaction, and updates the row with id=2, resulting in revision=4 for that row
> 4. Client B commits the transaction
> 5. Client C (which has $lastrevision=2 in its local database) synchronizes with the database by doing SELECT * FROM codes WHERE revision > 2; and retrieves client B's update to the row with id=2, revision=4 (it doesn't yet see the update from client A)
> 6. Client A commits the transaction
> 7. Some time later, client C synchronizes with the database again. $lastrevision for its database is now 4, so doing SELECT * FROM codes WHERE revision > 4; does not retrieve any rows. So client C never sees client A's update to the row with id=1
>
> Essentially, the race condition occurs when the order of clients committing transactions (i.e. the updates becoming visible to other clients) differs from the order of clients generating sequence values. Do you guys have any suggestions on how to avoid this race condition, or maybe a more elegant way to synchronize the clients with the server?

In my understanding, you are doing something like what CVS does. Say you don't check out a file and you make a revision based on the version you now see (say version 1); then when you want to commit, you will probably receive a "merge required" notice, since in the interval the file may already have been updated by another user (to version 2) - he is free to do so since nobody knows that you might commit an update. To avoid this, you have to check out the file, i.e., lock the file to prevent other changes; then you are free of any merge requirement. The cost is that you locked the file and nobody else could change it. So the only options are merge or lock.

Regards,
Qingqing