Re: [GENERAL] Tracking row updates - race condition

2005-03-29 Thread Harald Fuchs
In article [EMAIL PROTECTED],
Alex Adriaanse [EMAIL PROTECTED] writes:

 Thanks for the input everyone.  I think Harald's approach will work
 well...

I'm not so sure anymore :-(

Consider something like this:

UPDATE tbl SET col1 = 1 WHERE col2 = 1;
UPDATE tbl SET col1 = 2 WHERE col2 = 1;

with not much time in between.  By using NULL temporarily, you destroy
the ordering.  The client won't miss an UPDATE, but it might execute
the second one before the first.  Neither my idea nor yours appears to
take care of that.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] Tracking row updates - race condition

2005-03-29 Thread Alex Adriaanse
Harald Fuchs wrote:
In article [EMAIL PROTECTED],
Alex Adriaanse [EMAIL PROTECTED] writes:
 

Thanks for the input everyone.  I think Harald's approach will work
well...
   

I'm not so sure anymore :-(
Consider something like this:
UPDATE tbl SET col1 = 1 WHERE col2 = 1;
UPDATE tbl SET col1 = 2 WHERE col2 = 1;
with not much time in between.  By using NULL temporarily, you destroy
the ordering.  The client won't miss an UPDATE, but it might execute
the second one before the first.  Neither my idea nor yours appears to
take care of that.
 

Maybe I'm missing something, but I don't really see the problem.  If 
that second statement is executed before the transaction containing the 
first statement is committed, wouldn't the second statement block until 
the first statement is committed?  If the first one is committed before 
the second statement is executed, then I don't see how the client will 
see the updates out of order.

Alex



Re: [GENERAL] Tracking row updates - race condition

2005-03-28 Thread Alex Adriaanse
Thanks for the input everyone.  I think Harald's approach will work 
well, so I'm planning on doing what he suggested, with a few 
modifications.  I think I can still use a sequence-backed INTEGER rather 
than a TIMESTAMP, and have the trigger that sets the revision to NULL also 
NOTIFY a daemon, which then updates the revision to 
nextval('revision_sequence_name'), rather than running that from a cron job 
every X minutes.  The updates sent to the client would also include rows 
with revision = NULL so that the client would not lag behind, at the 
expense of sometimes sending the same update to the client twice.

Thanks again,
Alex
Harald Fuchs wrote:
In article [EMAIL PROTECTED],
Alex Adriaanse [EMAIL PROTECTED] writes:
 

I think that would greatly decrease the chances of a race condition
occurring, but I don't think it'd solve it.  What if 150 other
revisions occur between a row update and its corresponding commit?
   

How about the following:
* Use a TIMESTAMP rather than a SERIAL
* Set this timestamp to NULL in your INSERT/UPDATE trigger
* Use a cron job to set the timestamp to current_timestamp when it's NULL
This way the client would lag behind somewhat, depending on the cron
job frequency, but it should not miss a change.
 



Re: [GENERAL] Tracking row updates - race condition

2005-03-21 Thread Alex Adriaanse
I think that would greatly decrease the chances of a race condition 
occurring, but I don't think it'd solve it.  What if 150 other revisions 
occur between a row update and its corresponding commit?

Alex
Vincent Hikida wrote:
To fetch all updates since the last synchronization, the client would 
calculate a value for $lastrevision by running this query on its 
local database:
SELECT max(revision) AS lastrevision FROM codes;

It would then fetch all updated rows by running this query against 
the server:
SELECT * FROM codes WHERE revision > $lastrevision;

How about
SELECT * FROM codes WHERE revision > $lastrevision - 100
You could use a number other than 100. As you said, the client 
can handle duplicates.

Vincent


Re: [GENERAL] Tracking row updates

2005-03-21 Thread Alex Adriaanse
Qingqing Zhou wrote:
Alex Adriaanse [EMAIL PROTECTED] writes
 

This seems to work, except there exists a race condition.  Consider the
following series of events (in chronological order):
  1. Initially, in the codes table there's a row with id=1, revision=1,
 and a row with id=2, revision=2
  2. Client A begins a transaction, and updates the row with id=1,
 resulting in revision=3 for that row
  3. Client B begins a transaction, and updates the row with id=2,
 resulting in revision=4 for that row
  4. Client B commits the transaction
  5. Client C (which has $lastrevision=2 in its local database)
 synchronizes with the database by doing SELECT * FROM codes WHERE
 revision > 2; and retrieves client B's update to the row with
 id=2, revision=4 (it doesn't yet see the update from client A)
  6. Client A commits the transaction
  7. Some time later, Client C synchronizes with the database again.
 $lastrevision for its database is now 4, so doing SELECT * FROM
 codes WHERE revision > 4; does not retrieve any rows.  So client C
 never sees client A's update to the row with id=1
Essentially, the race condition occurs when the order of clients
committing transactions (i.e. the updates becoming visible to other
clients) differs from the order of clients generating sequence values.
Do you guys have any suggestions on how to avoid this race condition, or
maybe a more elegant way to synchronize the clients with the server?
   

In my understanding, you are doing something like what CVS does. Say you
don't check out a file and you make a revision based on the version you
currently see (say version 1); then when you try to commit, you will probably
receive a merge-required notice, since in that interval the file may already
have been updated by another user (to version 2). He is free to do so, since
nobody knows that you might commit an update. To avoid this, you have to check
out the file, i.e., lock it to prevent other changes; then you are free of any
merge requirement. The cost is that you have locked the file and nobody else
can change it.  So the only options are merge or lock.
Regards,
Qingqing
Applying this analogy to our database, wouldn't that require a 
table-level lock during a CVS-like commit (which would mean locking the 
table, getting the revision number, updating the row(s), and committing 
the transaction)?

Alex


Re: [GENERAL] Tracking row updates - race condition

2005-03-21 Thread Harald Fuchs
In article [EMAIL PROTECTED],
Alex Adriaanse [EMAIL PROTECTED] writes:

 I think that would greatly decrease the chances of a race condition
 occurring, but I don't think it'd solve it.  What if 150 other
 revisions occur between a row update and its corresponding commit?

How about the following:

* Use a TIMESTAMP rather than a SERIAL
* Set this timestamp to NULL in your INSERT/UPDATE trigger
* Use a cron job to set the timestamp to current_timestamp when it's NULL

This way the client would lag behind somewhat, depending on the cron
job frequency, but it should not miss a change.




Re: [GENERAL] Tracking row updates

2005-03-21 Thread Qingqing Zhou

Alex Adriaanse [EMAIL PROTECTED] writes
 Applying this analogy to our database, wouldn't that require a
 table-level lock during a CVS-like commit (which would mean locking the
 table, getting the revision number, updating the row(s), and committing
 the transaction)?


You may have a look at how CVS works and decide for yourself. Note the
difference: CVS manages both files and versions, whereas it seems you only
manage versions.

Another way is to change the logic and submit the updates together. Think of
the bank-transfer example we always use when describing a transaction's
atomicity, which has something in common with your case. We fold the logic of
subtracting some money and adding some money into one unit, so no matter how
many concurrent transfers hit the target account, no race condition can
occur, since serializability assures that.
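The bank-transfer analogy can be sketched as follows; this is a toy Python illustration in which a lock stands in for a serializable transaction, and all names are hypothetical:

```python
import threading

balances = {"checking": 100, "savings": 50}
lock = threading.Lock()   # stands in for one serializable transaction

def transfer(src, dst, amount):
    # fold the debit and the credit into a single atomic unit,
    # instead of reading, computing, and writing in separate steps
    with lock:
        if balances[src] >= amount:
            balances[src] -= amount
            balances[dst] += amount

threads = [threading.Thread(target=transfer, args=("checking", "savings", 10))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balances)   # {'checking': 50, 'savings': 100}
```

The point is that no other transfer can observe the account between the debit and the credit, which is exactly what the revision scheme lacks between nextval() and commit.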

Regards,
Qingqing


[GENERAL] Tracking row updates

2005-03-20 Thread Alex Adriaanse
[I've tried to send this message to pgsql-general several times now,
but even though I'm subscribed to it I never saw the message show up
in the mailing list, so I'm trying to send it from a different account
now.  If you get several copies of this message, I apologize.]

I'm working on an application where we have a central database server 
and a bunch of clients that are disconnected most of the time and need to 
maintain a local copy of the central database.  The client databases are 
based on One$DB since it has to be lightweight.  The client does not 
access the server database directly, but through a server application.  
Periodically (about once a week) the clients will connect to the central 
database and retrieve updates made to the central database so that their 
local database will be up-to-date, as well as send local updates back to 
the server.  A lot of these clients will be connecting through a dial-up 
connection, and some of the tables can get quite large, so just 
retrieving all rows in a table is not an option when synchronizing.

This is how I currently have things designed on the server side:

Each table has a revision column, which is set to a sequence value every 
time a row is inserted or updated (deletes are translated into updates 
that hide the row by setting an enabled column to false), as you can 
see in the following statements:

CREATE TABLE codes (
id SERIAL PRIMARY KEY,
created TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
updated TIMESTAMP,
revision SERIAL,
enabled BOOLEAN DEFAULT TRUE NOT NULL,
name VARCHAR(8) NOT NULL,
description VARCHAR(50)
) WITHOUT OIDS;

CREATE INDEX codes_revision_idx ON codes (revision);

CREATE FUNCTION row_update_codes() RETURNS trigger AS '
BEGIN
NEW.revision = nextval(''codes_revision_seq'');
NEW.updated = current_timestamp;
RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER update_row BEFORE INSERT OR UPDATE ON codes
FOR EACH ROW EXECUTE PROCEDURE row_update_codes();

CREATE RULE delete_row AS
ON DELETE TO codes DO INSTEAD
UPDATE codes SET enabled = FALSE WHERE id = OLD.id;

To fetch all updates since the last synchronization, the client would 
calculate a value for $lastrevision by running this query on its local 
database:
SELECT max(revision) AS lastrevision FROM codes;

It would then fetch all updated rows by running this query against the 
server:
SELECT * FROM codes WHERE revision > $lastrevision;

This seems to work, except there exists a race condition.  Consider the 
following series of events (in chronological order):

   1. Initially, in the codes table there's a row with id=1, revision=1,
  and a row with id=2, revision=2
   2. Client A begins a transaction, and updates the row with id=1,
  resulting in revision=3 for that row
   3. Client B begins a transaction, and updates the row with id=2,
  resulting in revision=4 for that row
   4. Client B commits the transaction
   5. Client C (which has $lastrevision=2 in its local database)
  synchronizes with the database by doing SELECT * FROM codes WHERE
   revision > 2; and retrieves client B's update to the row with
  id=2, revision=4 (it doesn't yet see the update from client A)
   6. Client A commits the transaction
   7. Some time later, Client C synchronizes with the database again. 
  $lastrevision for its database is now 4, so doing SELECT * FROM
   codes WHERE revision > 4; does not retrieve any rows.  So client C
  never sees client A's update to the row with id=1

Essentially, the race condition occurs when the order of clients 
committing transactions (i.e. the updates becoming visible to other 
clients) differs from the order of clients generating sequence values.  
Do you guys have any suggestions on how to avoid this race condition, or 
maybe a more elegant way to synchronize the clients with the server?
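The race described above can be reproduced with a small Python simulation; this is a hypothetical model of commit visibility, not real PostgreSQL behavior, with revisions drawn from a shared sequence inside open transactions but rows only becoming visible at commit:

```python
import itertools

class Server:
    """Toy model of the flawed scheme: revisions are assigned from a
    global sequence at UPDATE time, but rows become visible only at commit."""
    def __init__(self, rows):
        self.seq = itertools.count(max(rows.values()) + 1)
        self.committed = dict(rows)          # id -> revision, visible to clients
        self.pending = {}                    # txn name -> (id, revision)

    def update(self, txn, row_id):
        self.pending[txn] = (row_id, next(self.seq))

    def commit(self, txn):
        row_id, rev = self.pending.pop(txn)
        self.committed[row_id] = rev

    def sync(self, lastrevision):
        # SELECT * FROM codes WHERE revision > $lastrevision
        return {i: r for i, r in self.committed.items() if r > lastrevision}

server = Server({1: 1, 2: 2})
server.update("A", 1)          # A gets revision 3, not yet committed
server.update("B", 2)          # B gets revision 4, not yet committed
server.commit("B")

seen = server.sync(2)          # client C syncs: sees only B's row
lastrevision = max(seen.values())        # now 4

server.commit("A")             # A's revision 3 becomes visible too late
missed = server.sync(lastrevision)       # revision > 4 matches nothing

print(sorted(seen.items()))    # [(2, 4)]
print(missed)                  # {} -- client C never sees row 1's update
```

Swapping the two commit lines makes `missed` non-empty, confirming that the failure depends purely on commit order diverging from sequence order.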

I was thinking about doing the following to solve this problem.  I'd 
create a new table, e.g. codes_active_txns, with a sequence_value 
column.  Before the server application starts any transaction involving 
an insert/update/delete to codes, it would retrieve 
nextval('codes_revision_seq') as $seq_val, insert it into 
codes_active_txns, commit, begin another transaction, and do its 
inserts/updates, setting revision = $seq_val for each row that it 
touches.  Once it's done with the table updates, it would commit the 
transaction, and delete the row containing $seq_val from 
codes_active_txns, and commit that.

The server would calculate $lastrevision with:
SELECT max(revision) FROM codes WHERE revision < (SELECT
COALESCE(min(sequence_value), 2147483647) FROM
codes_active_txns);
and send this to the client.  $lastrevision would contain the highest 
revision for which there are no active (non-committed) transactions with 
a revision lower than $lastrevision.  The client would save this value, 
and pass it back to the server during the next 

[GENERAL] Tracking row updates - race condition

2005-03-20 Thread Alex Adriaanse
I'm working on an application where we have a central database server 
and a bunch of clients that are disconnected most of the time and need to 
maintain a local copy of the central database.  The client databases are 
based on One$DB since it has to be lightweight.  The client does not 
access the server database directly, but through a server application.  
Periodically (about once a week) the clients will connect to the central 
database and retrieve updates made to the central database so that their 
local database will be up-to-date, as well as send local updates back to 
the server.  A lot of these clients will be connecting through a dial-up 
connection, and some of the tables can get quite large, so just 
retrieving all rows in a table is not an option when synchronizing.

This is how I currently have things designed on the server side:
Each table has a revision column, which is set to a sequence value every 
time a row is inserted or updated (deletes are translated into updates 
that hide the row by setting an enabled column to false), as you can 
see in the following statements:

CREATE TABLE codes (
   id SERIAL PRIMARY KEY,
   created TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
   updated TIMESTAMP,
   revision SERIAL,
   enabled BOOLEAN DEFAULT TRUE NOT NULL,
   name VARCHAR(8) NOT NULL,
   description VARCHAR(50)
) WITHOUT OIDS;
CREATE INDEX codes_revision_idx ON codes (revision);
CREATE FUNCTION row_update_codes() RETURNS trigger AS '
   BEGIN
   NEW.revision = nextval(''codes_revision_seq'');
   NEW.updated = current_timestamp;
   RETURN NEW;
   END;
' LANGUAGE plpgsql;
CREATE TRIGGER update_row BEFORE INSERT OR UPDATE ON codes
   FOR EACH ROW EXECUTE PROCEDURE row_update_codes();
CREATE RULE delete_row AS
   ON DELETE TO codes DO INSTEAD
   UPDATE codes SET enabled = FALSE WHERE id = OLD.id;
To fetch all updates since the last synchronization, the client would 
calculate a value for $lastrevision by running this query on its local 
database:
SELECT max(revision) AS lastrevision FROM codes;

It would then fetch all updated rows by running this query against the 
server:
SELECT * FROM codes WHERE revision > $lastrevision;

This seems to work, except there exists a race condition.  Consider the 
following series of events (in chronological order):

  1. Initially, in the codes table there's a row with id=1, revision=1,
 and a row with id=2, revision=2
  2. Client A begins a transaction, and updates the row with id=1,
 resulting in revision=3 for that row
  3. Client B begins a transaction, and updates the row with id=2,
 resulting in revision=4 for that row
  4. Client B commits the transaction
  5. Client C (which has $lastrevision=2 in its local database)
 synchronizes with the database by doing SELECT * FROM codes WHERE
 revision > 2; and retrieves client B's update to the row with
 id=2, revision=4 (it doesn't yet see the update from client A)
  6. Client A commits the transaction
  7. Some time later, Client C synchronizes with the database again. 
 $lastrevision for its database is now 4, so doing SELECT * FROM
 codes WHERE revision > 4; does not retrieve any rows.  So client C
 never sees client A's update to the row with id=1

Essentially, the race condition occurs when the order of clients 
committing transactions (i.e. the updates becoming visible to other 
clients) differs from the order of clients generating sequence values.  
Do you guys have any suggestions on how to avoid this race condition, or 
maybe a more elegant way to synchronize the clients with the server?

I was thinking about doing the following to solve this problem.  I'd 
create a new table, e.g. codes_active_txns, with a sequence_value 
column.  Before the server application starts any transaction involving 
an insert/update/delete to codes, it would retrieve 
nextval('codes_revision_seq') as $seq_val, insert it into 
codes_active_txns, commit, begin another transaction, and do its 
inserts/updates, setting revision = $seq_val for each row that it 
touches.  Once it's done with the table updates, it would commit the 
transaction, and delete the row containing $seq_val from 
codes_active_txns, and commit that.

The server would calculate $lastrevision with:
   SELECT max(revision) FROM codes WHERE revision < (SELECT 
COALESCE(min(sequence_value), 2147483647) FROM codes_active_txns);
and send this to the client.  $lastrevision would contain the highest 
revision for which there are no active (non-committed) transactions with 
a revision lower than $lastrevision.  The client would save this value, 
and pass it back to the server during the next synchronization session 
so that the server can figure out what rows it needs to send to the 
client with SELECT * FROM codes WHERE revision > $lastrevision;.  
Occasionally the same row might be sent to the client twice during two 
synchronization sessions, but that's no big deal because that number 
should be low and the client can 
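The watermark query Alex proposes can be modeled in Python; this is a hypothetical simulation of the codes_active_txns bookkeeping, not the actual server code:

```python
import itertools

INT_MAX = 2147483647
seq = itertools.count(3)     # codes_revision_seq
codes = {1: 1, 2: 2}         # id -> revision (committed rows only)
active_txns = set()          # sequence values recorded in codes_active_txns

def begin(row_id):
    # grab nextval('codes_revision_seq') and record it in codes_active_txns
    seq_val = next(seq)
    active_txns.add(seq_val)
    return row_id, seq_val

def commit(txn):
    row_id, seq_val = txn
    codes[row_id] = seq_val          # row becomes visible
    active_txns.discard(seq_val)     # delete from codes_active_txns

def lastrevision():
    # SELECT max(revision) FROM codes WHERE revision <
    #     (SELECT COALESCE(min(sequence_value), 2147483647) FROM codes_active_txns)
    floor = min(active_txns) if active_txns else INT_MAX
    return max(r for r in codes.values() if r < floor)

a = begin(1)                 # seq_val 3, still open
b = begin(2)                 # seq_val 4
commit(b)                    # B commits first, out of sequence order

wm = lastrevision()          # 1: revision 3 is in flight, so 4 is withheld
commit(a)
wm2 = lastrevision()         # 4: all transactions finished

print(wm, wm2)               # 1 4
```

Note that the first watermark comes out as 1 rather than 2, because row 2's old revision has already been overwritten by 4; the client simply re-fetches that row on the next synchronization, which matches the "sent twice" caveat above.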

Re: [GENERAL] Tracking row updates - race condition

2005-03-20 Thread Vincent Hikida
To fetch all updates since the last synchronization, the client would 
calculate a value for $lastrevision by running this query on its local 
database:
SELECT max(revision) AS lastrevision FROM codes;

It would then fetch all updated rows by running this query against the 
server:
SELECT * FROM codes WHERE revision > $lastrevision;

How about
SELECT * FROM codes WHERE revision > $lastrevision - 100
You could use a number other than 100. As you said, the client can 
handle duplicates.
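As a quick check of the fixed-margin idea, here is a hypothetical Python simulation: the overlap does catch a late commit when few revisions intervene, but Alex's later objection holds once more than 100 revisions land in between.

```python
import itertools

seq = itertools.count(1)
committed = {}               # revision -> row id
pending = []                 # (revision, row_id) not yet committed

def update(row_id, commit_now=True):
    rev = next(seq)
    if commit_now:
        committed[rev] = row_id
    else:
        pending.append((rev, row_id))
    return rev

def commit_pending():
    while pending:
        rev, row_id = pending.pop()
        committed[rev] = row_id

def sync(lastrevision, margin=100):
    # SELECT * FROM codes WHERE revision > $lastrevision - 100
    return sorted(r for r in committed if r > lastrevision - margin)

slow = update("slow-row", commit_now=False)   # revision 1, commit delayed
for i in range(150):                          # 150 other revisions commit
    update(f"row-{i}")

last = max(committed)        # client syncs; its $lastrevision becomes 151
commit_pending()             # the slow transaction finally commits revision 1

caught = slow in sync(last)  # 1 > 151 - 100 is False
print(caught)                # False -- a 100-revision margin is not enough
```

With `range(50)` instead of `range(150)` the late revision falls inside the margin and is caught, so the fix works only as long as the margin exceeds the number of revisions that can be in flight.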

Vincent 



Re: [GENERAL] Tracking row updates

2005-03-20 Thread Qingqing Zhou

Alex Adriaanse [EMAIL PROTECTED] writes
 This seems to work, except there exists a race condition.  Consider the
 following series of events (in chronological order):

1. Initially, in the codes table there's a row with id=1, revision=1,
   and a row with id=2, revision=2
2. Client A begins a transaction, and updates the row with id=1,
   resulting in revision=3 for that row
3. Client B begins a transaction, and updates the row with id=2,
   resulting in revision=4 for that row
4. Client B commits the transaction
5. Client C (which has $lastrevision=2 in its local database)
   synchronizes with the database by doing SELECT * FROM codes WHERE
   revision > 2; and retrieves client B's update to the row with
   id=2, revision=4 (it doesn't yet see the update from client A)
6. Client A commits the transaction
7. Some time later, Client C synchronizes with the database again.
   $lastrevision for its database is now 4, so doing SELECT * FROM
   codes WHERE revision > 4; does not retrieve any rows.  So client C
   never sees client A's update to the row with id=1

 Essentially, the race condition occurs when the order of clients
 committing transactions (i.e. the updates becoming visible to other
 clients) differs from the order of clients generating sequence values.
 Do you guys have any suggestions on how to avoid this race condition, or
 maybe a more elegant way to synchronize the clients with the server?

In my understanding, you are doing something like what CVS does. Say you
don't check out a file and you make a revision based on the version you
currently see (say version 1); then when you try to commit, you will probably
receive a merge-required notice, since in that interval the file may already
have been updated by another user (to version 2). He is free to do so, since
nobody knows that you might commit an update. To avoid this, you have to check
out the file, i.e., lock it to prevent other changes; then you are free of any
merge requirement. The cost is that you have locked the file and nobody else
can change it.  So the only options are merge or lock.

Regards,
Qingqing