Re: [GENERAL] table as log (multiple writers and readers)

2008-04-22 Thread Joris Dobbelsteen

David Wilson wrote:

On Mon, Apr 21, 2008 at 7:55 PM, Joris Dobbelsteen
[EMAIL PROTECTED] wrote:

 If you want to clean up the staging table I have some concerns about
the advisory lock. I think you mean an exclusive table lock.


Either works, really. An advisory lock is really just a lock over
which you have control of the meaning, as long as you're using it in
the appropriate places. Also, an advisory lock on just the processes
doing staging-to-log moves would allow writes into the staging table
to continue concurrently with the staging-to-log transaction (whereas
an exclusive lock would unnecessarily prevent them).


Describe the mechanism, because I don't really believe it yet. I think 
you need to do an advisory lock around every commit of every transaction 
that writes to the log table.


If you are only using the advisory lock in the staging-to-log 
transaction, how would this prevent newly committed tuples from showing 
up during this process? (You can't both delete and insert in a single 
statement, I believe; if you could, you wouldn't have this problem at all.)



Also, while Vance appears to have chosen to have a dedicated
staging-to-log process, even that isn't necessary- each reader can
simply do the lock/clear staging/unlock before any attempt to read-
unless you're polling that log table at truly crazy rates, the
overhead should be negligible and will ensure that the staging table
is simply cleared out whenever necessary while removing the
complexity of a separate process.


Using serializable isolation for the staging-to-log process seems to be 
the most efficient method, as it won't even block writers.


- Joris



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-22 Thread David Wilson
On Tue, Apr 22, 2008 at 9:52 AM, Joris Dobbelsteen
[EMAIL PROTECTED] wrote:

  Describe the mechanism, because I don't really believe it yet. I think you
 need to do an advisory lock around every commit of every transaction that
 writes to the log table.

Consider some number of reader processes and some number of writer processes.

Writer processes touch only the staging table, and solely do inserts
into it. As a result, writer processes cannot interfere with each
other in any way and do not require any synchronization beyond that
provided by MVCC.

Reader processes are interested in polling the logging table at
intervals. In the process, they also act as staging-to-log movers.
This act (because it is destructive and because we require serialized
inserts for id generation in the log table) must take a lock that
prevents other readers from attempting the same work at the same time.

Each reader process therefore has a loop that appears as follows:
1) Obtain advisory lock.
2) Begin transaction.
3) For each row in staging table, insert copy into log table.
4) Delete all visible rows from staging table.
5) Commit transaction.
6) Release advisory lock.
7) Handle not-yet-seen rows in the logging table (This is the primary
work of the readers)
8) Sleep for the desired interval and return to 1).

We require two types of synchronization and the above takes care of both:
1) The advisory lock prevents multiple readers from doing simultaneous
staging-to-log moves.
2) The transaction block ensures that the reader will see a consistent
state on the staging table while writers may write at the same time;
writes that occur during the reader's transaction block will simply be
ignored during this round of reading.

You need both types of synchronization to avoid problems- taking an
exclusive lock would simply be the sledgehammer method of doing the
synchronization, since it would take the place of both the advisory
lock and the transaction at the same time but would also block
writers.
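
A minimal sketch of that loop in SQL; the table layouts (staging(payload),
log(id serial, payload)) and the advisory lock key are hypothetical:

-- step 1: serialize the staging-to-log movers with a session-level advisory lock
SELECT pg_advisory_lock(42);

BEGIN;                              -- step 2
INSERT INTO log (payload)           -- step 3: copy the staging rows into the log
    SELECT payload FROM staging;
DELETE FROM staging;                -- step 4: clear the rows this transaction can see
COMMIT;                             -- step 5

SELECT pg_advisory_unlock(42);      -- step 6

Steps 7 and 8 (handling the not-yet-seen log rows and sleeping) happen in
the client. Whether steps 3 and 4 also need a stricter isolation level to
see exactly the same rows is discussed further down the thread.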

-- 
- David T. Wilson
[EMAIL PROTECTED]



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-22 Thread Joris Dobbelsteen

David Wilson wrote:

On Tue, Apr 22, 2008 at 9:52 AM, Joris Dobbelsteen
[EMAIL PROTECTED] wrote:

 Describe the mechanism, because I don't really believe it yet. I think you
need to do an advisory lock around every commit of every transaction that
writes to the log table.


Consider some number of reader processes and some number of writer processes.

Writer processes touch only the staging table, and solely do inserts
into it. As a result, writer processes cannot interfere with each
other in any way and do not require any synchronization beyond that
provided by MVCC.

Reader processes are interested in polling the logging table at
intervals. In the process, they also act as staging-to-log movers.
This act (because it is destructive and because we require serialized
inserts for id generation in the log table) must take a lock that
prevents other readers from attempting the same work at the same time.

Each reader process therefore has a loop that appears as follows:
1) Obtain advisory lock.
2) Begin transaction.
3) For each row in staging table, insert copy into log table.
4) Delete all visible rows from staging table.


Ah, yes, all visible rows...
My point is that, unless you use a transaction with serializable 
isolation, the set of "all visible rows" for the second statement might 
differ from the set you copied into the log table.


With the default Read Committed isolation level you are exposed to a 
nonrepeatable read, which can change tuple visibility between 
statements.



5) Commit transaction.
6) Release advisory lock.
7) Handle not-yet-seen rows in the logging table (This is the primary
work of the readers)
8) Sleep for the desired interval and return to 1).

We require two types of synchronization and the above takes care of both:
1) The advisory lock prevents multiple readers from doing simultaneous
staging-to-log moves.
2) The transaction block ensures that the reader will see a consistent
state on the staging table while writers may write at the same time;
writes that occur during the reader's transaction block will simply be
ignored during this round of reading.


See above: this requires that nonrepeatable reads cannot occur.


You need both types of synchronization to avoid problems- taking an
exclusive lock would simply be the sledgehammer method of doing the
synchronization, since it would take the place of both the advisory
lock and the transaction at the same time but would also block
writers.


I agree with you on this, but it does guarantee that a nonrepeatable 
read is impossible, at the cost of concurrency. There does seem to be a 
better solution indeed.


- Joris




Re: [GENERAL] table as log (multiple writers and readers)

2008-04-22 Thread David Wilson
On Tue, Apr 22, 2008 at 2:48 PM, Joris Dobbelsteen
[EMAIL PROTECTED] wrote:

  Ah, yes, all visible rows...
  My point is that, unless you use a transaction with serializable isolation,
 this all visible rows for the second statement might be different from those
 that you copied into the log table.

  With the normal Read committed isolation level you suffer from a possible
 nonrepeatable read that might change tuple visibility between different
 statements.

That depends on implementation: a select into ... to do the initial
copy, followed by a delete whose where clause references the log table
itself to ensure that we delete only things that now exist in the log
table, or a row-by-row insert/delete pair. Either would provide the
appropriate level of protection from accidentally deleting more than you
intended, without harming concurrency. The delete referencing the log
table might require that the log table be indexed for performance, but
it's likely that such indexing would be done anyway for general log use.
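
A sketch of the first variant, written as an insert ... select since the
log table already exists; the column layouts staging(id, payload) and
log(id, payload), sharing the staging-generated id, are hypothetical:

BEGIN;
INSERT INTO log (id, payload)
    SELECT id, payload FROM staging;
-- delete only the rows this transaction has just copied into the log
DELETE FROM staging s
    USING log l
    WHERE s.id = l.id;
COMMIT;

Rows committed into the staging table after the insert took its snapshot
won't match anything in the log yet, so they survive for the next round.
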
-- 
- David T. Wilson
[EMAIL PROTECTED]



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-22 Thread Gurjeet Singh
On Wed, Apr 23, 2008 at 12:29 AM, David Wilson [EMAIL PROTECTED]
wrote:

 On Tue, Apr 22, 2008 at 2:48 PM, Joris Dobbelsteen
 [EMAIL PROTECTED] wrote:
 
   Ah, yes, all visible rows...
   My point is that, unless you use a transaction with serializable
 isolation,
  this all visible rows for the second statement might be different from
 those
  that you copied into the log table.
 
   With the normal Read committed isolation level you suffer from a
 possible
  nonrepeatable read that might change tuple visibility between different
  statements.

 That depends on implementation. A select into ... to do the initial
 copy followed by a delete where... with the where clause referencing
 the log table itself to ensure that we delete only things that now
 exist in the log table, or a row by row  insert/delete pair. Either
 would provide the appropriate level of protection from accidental
 deletion of more things than you intended without harming concurrency.
 The delete referencing the log table might require that the log table
 be indexed for performance, but it's likely that such indexing would
 be done anyway for general log use.


I think this plpgsql function would solve the problem of atomic
read-and-delete operation...

create or replace function log_rotate() returns void as $$
declare
  rec record;
begin

for rec in delete from t1 returning * loop
insert into t2 values( rec.a, rec.b );
end loop;

end;
$$ language 'plpgsql';

select log_rotate();

Best regards,
-- 
[EMAIL PROTECTED]

[EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device


Re: [GENERAL] table as log (multiple writers and readers)

2008-04-22 Thread Joris Dobbelsteen

Gurjeet Singh wrote:
On Wed, Apr 23, 2008 at 12:29 AM, David Wilson [EMAIL PROTECTED] wrote:


On Tue, Apr 22, 2008 at 2:48 PM, Joris Dobbelsteen
[EMAIL PROTECTED]
wrote:
 
   Ah, yes, all visible rows...
   My point is that, unless you use a transaction with serializable
isolation,
  this all visible rows for the second statement might be different
from those
  that you copied into the log table.
 
   With the normal Read committed isolation level you suffer from a
possible
  nonrepeatable read that might change tuple visibility between
different
  statements.

That depends on implementation. A select into ... to do the initial
copy followed by a delete where... with the where clause referencing
the log table itself to ensure that we delete only things that now
exist in the log table, or a row by row  insert/delete pair. Either
would provide the appropriate level of protection from accidental
deletion of more things than you intended without harming concurrency.
The delete referencing the log table might require that the log table
be indexed for performance, but it's likely that such indexing would
be done anyway for general log use.


Of course, that is another way to define visibility in this context: 
present in the log table. The point is that a suitable definition is needed.


I think this plpgsql function would solve the problem of atomic 
read-and-delete operation...


create or replace function log_rotate() returns void as $$
declare
  rec record;
begin

for rec in delete from t1 returning * loop
insert into t2 values( rec.a, rec.b );
end loop;

end;
$$ language 'plpgsql';

select log_rotate();


Don't forget ordering; this was important before...

START TRANSACTION ISOLATION LEVEL REPEATABLE READ;
INSERT INTO log SELECT ... FROM staging ORDER BY ...;
DELETE FROM staging;
COMMIT;

I don't know whether that ORDER BY is honoured for the insert, but it 
should be in this case.

- Joris



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-21 Thread Joris Dobbelsteen

Andrew Sullivan wrote:

On Thu, Apr 17, 2008 at 12:44:51PM +0800, Craig Ringer wrote:

It won't work with multiple concurrent writers. There is no guarantee
that an INSERT with a timestamp older than the one you just saw isn't
waiting to commit.


This is pretty unlikely -- I won't say impossible, because I'm sure there's
some kernel-level race condition -- if you use the clock time approach and
SERIALIZABLE mode.


Don't. Understand what SERIALIZABLE mode means (in this context):
* It restricts what you CAN and MAY read (i.e. it controls visibility).


If you want to serialize your accesses (which is something completely 
different) you are probably better off using an exclusive table lock 
on the log table. Otherwise PostgreSQL cannot make such guarantees. 
Even worse, you MUST NOT use SERIALIZABLE mode if you are going to check 
your log table.
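
For example, a write guarded by an explicit table lock might look like
this (logtable, its columns and the sequence name are placeholders):

BEGIN;
-- EXCLUSIVE mode conflicts with the ROW EXCLUSIVE lock taken by INSERT,
-- so concurrent writers queue up behind us while plain SELECTs still run.
LOCK TABLE logtable IN EXCLUSIVE MODE;
INSERT INTO logtable (id, message)
    VALUES (nextval('logtable_id_seq'), 'some event');
COMMIT;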


Similarly, referential integrity is complex to handle, and PostgreSQL 
has special mechanisms for it.



Going into detail about concurrently running code:
* Without synchronization mechanisms you CANNOT control in which order 
code is executed.



Also, you are not dealing with a race condition here; it is about the 
semantics of, or requirements on, the ordering. Race conditions are 
failures caused by a particular ordering (in time) of events that 
corrupts the state. They can be prevented by proper locking, but such 
locking is easily overlooked, and the timing constraints are usually so 
stringent that the failures are very hard to detect. Therefore they are 
also, usually, very hard to encounter in common situations.




You could add a trigger that checks for other timestamps
> yours, I suppose.  Of course, that's pretty heavyweight, too.  How much is
the absolute serialization worth to you in performance?


You cannot do this in a reliable way: the PostgreSQL MVCC semantics do 
not allow it. The only way is to take at least a write lock on the log 
table and ensure you are NOT in serializable mode.


Remember that you CANNOT view tuples that are not yet committed. In 
fact, in serializable mode you will only see changes from transactions 
that completed BEFORE you started yours.



The only other thing I can suggest is what someone else did: commit them
with wallclock timestamps, and then have a different thread wake up every
_n_ seconds and put the records into the proper table in timestamp order.


You can guarantee:
* the proper ordering of log events within a single transaction (by the 
property of transactions).
* the ordering of modifications/deletes to a single tuple (due to the 
synchronization point created by the write lock).

This seems enough in the requested case.

You can even do so with a simple sequence counter if you are only 
interested in the ordering of events in time.


- Joris



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-21 Thread Joris Dobbelsteen

Craig Ringer wrote:
[snip]
If you really want to make somebody cry, I guess you could do it with 
dblink - connect back to your own database from dblink and use a short 
transaction to commit a log record, using table-based (rather than 
sequence) ID generation to ensure that records were inserted in ID 
order. That'd restrict the critical section in which your various 
transactions were unable to run concurrently to a much shorter period, 
but would result in a log message being saved even if the transaction 
later aborted. It'd also be eye-bleedingly horrible, to the point where 
even the send a message from a C function approach would be nicer.


This will not work for the problem the thread starter has. Let a single 
transaction hang long enough before its commit while others succeed: the 
ordering of changes is kept, but the commits may become visible out of 
order.


The issue is, you don't really have the critical section you describe: 
there is no SINGLE lock you are 'fighting' for.


It will work with an added table write lock (or a stronger one); that 
will be the lock for your critical section.


In my opinion I would just forget about this one rather quickly as you 
more or less proposed...


- Joris



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-21 Thread Vance Maverick
Thanks to all for your help.  I've adopted the scheme involving a
staging table -- the writer processes insert into that, then a single
publisher process pulls from that and writes to the log, giving a
clean serial order for any reader of the log.

Vance 

On Mon, 2008-04-21 at 23:59 +0200, Joris Dobbelsteen wrote:
 Craig Ringer wrote:
 [snip]
  If you really want to make somebody cry, I guess you could do it with 
  dblink - connect back to your own database from dblink and use a short 
  transaction to commit a log record, using table-based (rather than 
  sequence) ID generation to ensure that records were inserted in ID 
  order. That'd restrict the critical section in which your various 
  transactions were unable to run concurrently to a much shorter period, 
  but would result in a log message being saved even if the transaction 
  later aborted. It'd also be eye-bleedingly horrible, to the point where 
  even the send a message from a C function approach would be nicer.
 
 This will not work for the problem the TS has. Let a single transaction 
 hang for a long enough time before commit, while others succeed. It will 
 keep ordering of changes, but commits might come unordered.
 
 The issue is, you don't really have the critical section as you 
 describe, there is no SINGLE lock you are 'fighting' for.
 
 It will work with an added table write lock (or up), that will be the 
 lock for your critical section.
 
 In my opinion I would just forget about this one rather quickly as you 
 more or less proposed...
 
 - Joris



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-21 Thread Joris Dobbelsteen

David Wilson wrote:

(I originally missed replying to all here; sorry about the duplicate,
Vance, but figured others might be interested.)

On Wed, Apr 16, 2008 at 1:55 PM, Vance Maverick [EMAIL PROTECTED] wrote:

 Another approach would be to queue the log entries in a staging table,
 so that a single process could move them into the log.  This is fairly
 heavyweight, but it would guarantee the consistent sequencing of the log
 as seen by a reader (even if the order of entries in the log didn't
 always reflect the true commit sequence in the staging table).  I'm
 hoping someone knows a cleverer trick.



Consider a loop like the following

advisory lock staging table
if (entries in table)
   copy entries to main log table as a single transaction
release advisory lock on staging table
read out and handle most recent log entries from main table

The advisory lock is automatically released on client disconnect, and
doing the whole thing within one transaction should prevent any
partial-copies on failures.

It doesn't matter that there are concurrent inserts to the staging
table because the staging table is always wiped all at once and
transferred in a synchronous fashion to the main table. You also can't
lose data, because it's always in one of the two tables.


If you want to clean up the staging table I have some concerns about 
the advisory lock. I think you mean an exclusive table lock.


There are two other options as well:

* Track which data has been copied and remove from the staging table 
only those rows that are already in the log table.


* Use serializable mode for the staging-to-log copying transactions.
In this way you can just copy the table and throw everything away 
(without checking). This seems rather cheap and allows for concurrent 
processing.


- Joris



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-21 Thread David Wilson
On Mon, Apr 21, 2008 at 7:55 PM, Joris Dobbelsteen
[EMAIL PROTECTED] wrote:

  If you want to clean up the staging table I have some concerns about
 the advisory lock. I think you mean an exclusive table lock.

Either works, really. An advisory lock is really just a lock over
which you have control of the meaning, as long as you're using it in
the appropriate places. Also, an advisory lock on just the processes
doing staging-to-log moves would allow writes into the staging table
to continue concurrently with the staging-to-log transaction (whereas
an exclusive lock would unnecessarily prevent them).

Also, while Vance appears to have chosen to have a dedicated
staging-to-log process, even that isn't necessary- each reader can
simply do the lock/clear staging/unlock before any attempt to read-
unless you're polling that log table at truly crazy rates, the
overhead should be negligible and will ensure that the staging table
is simply cleared out whenever necessary while removing the
complexity of a separate process.


-- 
- David T. Wilson
[EMAIL PROTECTED]



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-17 Thread Chris Browne
[EMAIL PROTECTED] (Andrew Sullivan) writes:
 Oh, one other thing

 On Thu, Apr 17, 2008 at 12:44:51PM +0800, Craig Ringer wrote:
  
  One way I can think of doing it is to write a seen_log that notes what the
  client has already seen with a timestamp of (say) 1 minute.  Then you can
  say go forward from this time excluding ids (ids here).
 
 It won't work with multiple concurrent writers. There is no guarantee
 that an INSERT with a timestamp older than the one you just saw isn't
 waiting to commit.

 Yeah, I spoke imprecisely.  The idea is, Start at timestamp _t_, but don't
 re-process these ones, which I've seen.  The trick is to set your start _t_
 far enough back in time that it's incompatible with your business logic that
 anything could still be pending from then.  This is nasty and prone to bugs,
 but it can be coded up.

The alternative pattern is to put a trigger on the table that collects
the primary key ID in a queue table.

If some updates were still in flight when the queue processor runs,
then it can catch them next time.
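
A sketch of such a trigger (the table and function names are made up for
illustration):

CREATE TABLE log_queue (id integer NOT NULL);

CREATE OR REPLACE FUNCTION collect_log_id() RETURNS trigger AS $$
BEGIN
    -- remember the primary key of every new log row for the queue processor
    INSERT INTO log_queue (id) VALUES (NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER logtable_collect_id
    AFTER INSERT ON logtable
    FOR EACH ROW EXECUTE PROCEDURE collect_log_id();

The queue processor then works from log_queue at its own pace and deletes
the entries it has handled; anything committed late just stays queued
until the next run.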

Curiously enough, I have had *two* meetings today where this was germane ;-).
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/wp.html
Do not worry  about the bullet that  has got your name on  it. It will
hit you and it will kill  you, no questions asked. The rounds to worry
about are the ones marked: TO WHOM IT MAY CONCERN.



[GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Vance Maverick
I want to create a table with the semantics of a log.  There may be
multiple concurrent writers, because log entries will be generated by
triggers.  And there will be multiple concurrent readers -- mostly
remote processes watching the log over time.  I'd like to guarantee that
each of those readers will see the same, complete sequence of changes.

For a first attempt, I created a table with a serial column (actually I
created the sequence ID separately, but that's a detail).  The readers
connect remotely, scan through the table in sequence order, and remember
the ID of the last row they read.  When they read again, they start from
after that ID -- the query is roughly

   SELECT * FROM logtable WHERE id > ? ORDER BY id

But there's a problem with this -- the rows may not be inserted into the
log in ID order.  For example, if two concurrent sessions get the
sequence IDs 3 and 4, the one with ID 4 might commit first.  If a reader
queries the table at that moment, it will see the sequence (1, 2, 4).
Later, if the other session commits, a log entry with ID 3 will be
added.  The IDs will now be (1, 2, 4, 3) -- but if the reader comes back
to poll the log again, it will miss entry 3.  In order not to miss it,
the reader would have to remember much more information than just a
high-water mark.  (I think it would have to remember the complete set of
IDs it had processed already, and then scan the whole table to find
entries not in the set.)

Is there an approach that will give the semantics I need?  In
particular, I'd like a reader that has already read up to a certain
point in the log to be able to restart at or near that same point, so it
wouldn't have to reread everything.

(I recognize that these log semantics are closer to those provided by
files, not database tables.  A file is an option I'll consider, but
obviously it lacks transactional semantics.)

Thanks,

Vance



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread brian

Vance Maverick wrote:

I want to create a table with the semantics of a log.  There may be
multiple concurrent writers, because log entries will be generated by
triggers.  And there will be multiple concurrent readers -- mostly
remote processes watching the log over time.  I'd like to guarantee that
each of those readers will see the same, complete sequence of changes.

For a first attempt, I created a table with a serial column (actually I
created the sequence ID separately, but that's a detail).  The readers
connect remotely, scan through the table in sequence order, and remember
the ID of the last row they read.  When they read again, they start from
after that ID -- the query is roughly

   SELECT * FROM logtable WHERE id > ? ORDER BY id

But there's a problem with this -- the rows may not be inserted into the
log in ID order.  For example, if two concurrent sessions get the
sequence IDs 3 and 4, the one with ID 4 might commit first.  If a reader
queries the table at that moment, it will see the sequence (1, 2, 4).
Later, if the other session commits, a log entry with ID 3 will be
added.  The IDs will now be (1, 2, 4, 3) -- but if the reader comes back
to poll the log again, it will miss entry 3.  In order not to miss it,
the reader would have to remember much more information than just a
high-water mark.  (I think it would have to remember the complete set of
IDs it had processed already, and then scan the whole table to find
entries not in the set.)

Is there an approach that will give the semantics I need?  In
particular, I'd like a reader that has already read up to a certain
point in the log to be able to restart at or near that same point, so it
wouldn't have to reread everything.

(I recognize that these log semantics are closer to those provided by
files, not database tables.  A file is an option I'll consider, but
obviously it lacks transactional semantics.)



Use a timestamp column also.



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Craig Ringer

brian wrote:


Use a timestamp column also.


That's subject to the same issues, because a transaction's 
current_timestamp() is determined at transaction start. So, in a 
situation like this:



WRITER 1        WRITER 2        READER 1

BEGIN
                BEGIN
INSERT
                INSERT
                COMMIT
                                BEGIN
                                SELECT
                                COMMIT

then READER 1 will see the most recent timestamp as that inserted by 
WRITER 2, but it won't see the row inserted by WRITER 1 with an earlier 
timestamp.


I don't think it's even OK in the case of a single-statement INSERT 
(where the transaction is implicit) and/or with the use of 
clock_timestamp() ... though I'm less sure about that.


--
Craig Ringer



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Vance Maverick
Craig Ringer craig(at)postnewspapers(dot)com(dot)au wrote:
 brian wrote:

  Use a timestamp column also.

 That's subject to the same issues.
 [...]
 I don't think it's even OK in the case of a single-statement INSERT
(where the
 transaction is implicit) and/or with the use of clock_timestamp() ...
though
 I'm less sure about that.

No, you're right.  The problem is that the timestamp is chosen some time
before the commit succeeds.  So if there are concurrent writers
committing at the same time, the order of commit is determined by a
race, one that takes place after the timestamp values are set.

Another approach would be to queue the log entries in a staging table,
so that a single process could move them into the log.  This is fairly
heavyweight, but it would guarantee the consistent sequencing of the log
as seen by a reader (even if the order of entries in the log didn't
always reflect the true commit sequence in the staging table).  I'm
hoping someone knows a cleverer trick.

Vance



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Craig Ringer

Vance Maverick wrote:


Another approach would be to queue the log entries in a staging table,
so that a single process could move them into the log.  This is fairly
heavyweight, but it would guarantee the consistent sequencing of the log
as seen by a reader (even if the order of entries in the log didn't
always reflect the true commit sequence in the staging table).


The way I see it, one way or another you are going to have to serialize 
writers, otherwise you'll always be faced with commit-order races.


I do have one other idea, but it's not pretty. Write a C/Python/whatever 
procedural function, say send_log_record(), that uses an appropriate 
inter-process communication mechanism to send the log message to another 
process. Your writers use this function to generate log records. The 
outside process receiving log records has a single connection to the DB 
open and it is the only writer to the log table, thus avoiding the 
problems with commit races with multiple writers. Your C function is 
bypassing transactional isolation by communicating with another process 
that modifies the DB, and in the process eliminating the need to hold 
the whole transaction up to ensure predictable log write ordering. 
However, it *does* mean that you'll get a log entry even if the 
transaction then aborts. You might be able to get around that by doing 
your logging with a deferred trigger, but there's always a risk that a 
later deferred trigger will fail and abort the transaction.


Doing it via a staging table is a *lot* nicer, and a lot lighter weight, 
than having your logging code force serialization of all operations in 
transactions that could otherwise, other than the log ordering 
requirement, run concurrently. Say, using an id generation table that 
each transaction locks to ensure ordered ID generation and commits. It's 
also properly transactional, so you won't have any log records for 
aborted transactions.


It's a pity PostgreSQL's RETURNING extension doesn't appear to support
INSERT INTO ... DELETE FROM ... RETURNING
because that'd make your log record mover a rather efficient one-liner.
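
For what it's worth, PostgreSQL releases newer than this thread (9.1 and
later) can express exactly that as a writable CTE; a sketch, reusing the
staging/log naming from elsewhere in the thread:

-- atomically move everything from staging into log (PostgreSQL 9.1+)
WITH moved AS (
    DELETE FROM staging RETURNING *
)
INSERT INTO log
SELECT * FROM moved;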


The only other alternative I can think of is to have multiple writers 
inserting records into the same table the readers are reading from, but 
have the writers insert records with a timestamp field (say `visible') 
set to null.


A single helper (there must only ever be one) can then repeatedly run a 
command sequence like:


BEGIN;
UPDATE logtable
   SET visible = current_timestamp
 WHERE visible IS NULL;
COMMIT;

to ensure that records became visible in timestamp order. No race here; 
like in the staging table approach you're using a single write 
transaction on the table being used by the readers.


Readers would filter for records that have `visible > 
last_seen_visible', where last_seen_visible would be a literal, being 
the greatest value of `visible' seen in the last query for log records. 
They could trust that no records could ever be missed.
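
A reader poll under this scheme might look like the following, where the
timestamp literal stands in for the last_seen_visible value the reader
remembered from its previous poll (ordering by id as a tie-breaker assumes
the serial id column discussed earlier):

SELECT *
  FROM logtable
 WHERE visible > '2008-04-16 12:00:00+08'  -- greatest "visible" seen so far
 ORDER BY visible, id;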


Unfortunately, with this approach you're incurring the cost of a dead 
row for every UPDATE. You can avoid that with 8.3 (using HOT) only if 
you have no index on `visible' - but having no index means a sequential 
scan of the log table for every UPDATE making rows visible and for every 
SELECT looking for the latest rows. Ouch.


That's basically just another way to write your log staging approach, 
anyway. It stays within a single table but it's not otherwise much 
different. I haven't any idea whether it'd perform better or worse than 
using a separate log staging table.



If you really want to make somebody cry, I guess you could do it with 
dblink - connect back to your own database from dblink and use a short 
transaction to commit a log record, using table-based (rather than 
sequence) ID generation to ensure that records were inserted in ID 
order. That'd restrict the critical section in which your various 
transactions were unable to run concurrently to a much shorter period, 
but would result in a log message being saved even if the transaction 
later aborted. It'd also be eye-bleedingly horrible, to the point where 
even the send a message from a C function approach would be nicer.


--
Craig Ringer



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread brian

Craig Ringer wrote:

brian wrote:


Use a timestamp column also.


That's subject to the same issues, because a transaction's 
current_timestamp() is determined at transaction start. So, in a 
situation like this:



WRITER 1        WRITER 2        READER 1

BEGIN
                BEGIN
INSERT
                INSERT
                COMMIT
                                BEGIN
                                SELECT
                                COMMIT

then READER 1 will see the most recent timestamp as that inserted by 
WRITER 2, but it won't see the row inserted by WRITER 1 with an earlier 
timestamp.


I don't think it's even OK in the case of a single-statement INSERT 
(where the transaction is implicit) and/or with the use of 
clock_timestamp() ... though I'm less sure about that.




I don't mean to rely on *only* the timestamp, but for the reader to 
remember both the last ID and the timestamp for that particular 
transaction. When the next read occurs it should check to see if there's 
an earlier timestamp with a higher ID than that remembered. The database 
knows that WRITER 1 was there first. If it's important to the 
application then the reader will need to take some action to re-order 
things based on what it has already read, which it could do if it 
compared timestamps and ID order. It needn't keep the complete set of 
IDs in memory.


Wait--would WRITER 1 have the higher ID?



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread David Wilson
(I originally missed replying to all here; sorry about the duplicate,
Vance, but figured others might be interested.)

On Wed, Apr 16, 2008 at 1:55 PM, Vance Maverick [EMAIL PROTECTED] wrote:

  Another approach would be to queue the log entries in a staging table,
  so that a single process could move them into the log.  This is fairly
  heavyweight, but it would guarantee the consistent sequencing of the log
  as seen by a reader (even if the order of entries in the log didn't
  always reflect the true commit sequence in the staging table).  I'm
  hoping someone knows a cleverer trick.


Consider a loop like the following

advisory lock staging table
if (entries in table)
   copy entries to main log table as a single transaction
release advisory lock on staging table
read out and handle most recent log entries from main table

The advisory lock is automatically released on client disconnect, and
doing the whole thing within one transaction should prevent any
partial-copies on failures.

It doesn't matter that there are concurrent inserts to the staging
table because the staging table is always wiped all at once and
transferred in a synchronous fashion to the main table. You also can't
lose data, because it's always in one of the two tables.

-- 
- David T. Wilson
[EMAIL PROTECTED]



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Craig Ringer

brian wrote:

I don't mean to rely on *only* the timestamp, but for the reader to 
remember both the last ID and the timestamp for that particular 
transaction. When the next read occurs it should check to see if there's 
an earlier timestamp with a higher ID than that remembered.


[snip]


Wait--would WRITER 1 have the higher ID?


No, it'll have a lower id in this case because it calls 
nextval('sequence_name') first. Writer 1 would have a lower id, and a 
lower timestamp (because its transaction began first) even if it 
committed later. Using clock_timestamp() in the insert will not help, 
because the first transaction to insert (as in this case) is not 
necessarily the first to commit.


If a reader sees a given id and timestamp, that doesn't meant there 
aren't transactions with lower ids, and lower timestamps, still 
uncomitted or in the process of committing.


What you want is a timestamp that's generated at commit time with a 
guarantee that no later commits will have equal or lower timestamps. As 
far as I know (I'm *VERY* far from the authority here) there's no way to 
achieve that, so you have to serialize your commits to the table somehow.


--
Craig Ringer




Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Andrew Sullivan
On Thu, Apr 17, 2008 at 12:35:33AM +0800, Craig Ringer wrote:
 That's subject to the same issues, because a transaction's 
 current_timestamp() is determined at transaction start. 

But clock_timestamp() (and its ancestors in Postgres) don't have that
restriction.  I dunno that it's enough for you, though, since you have
visibility issues as well.  You seem to want both the benefits of files and
relational database transactions, and I don't think you can really have both
at once without paying in reader complication.

One way I can think of doing it is to write a seen_log that notes what the
client has already seen with a timestamp of (say) 1 minute.  Then you can
say "go forward from this time excluding ids (ids here)".

A




Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Craig Ringer
Andrew Sullivan wrote:
 On Thu, Apr 17, 2008 at 12:35:33AM +0800, Craig Ringer wrote:
 That's subject to the same issues, because a transaction's 
 current_timestamp() is determined at transaction start. 
 
 But clock_timestamp() (and its ancestors in Postgres) don't have that
 restriction.

True, as I noted later, but it doesn't help. AFAIK you can't guarantee
that multiple concurrent INSERTs will be committed in the same order
that their clock_timestamp() calls were evaluated.

Consequently, you can still have a situation where a record with a lower
timestamp becomes visible to readers after a record with a higher
timestamp has, and after the reader has already recorded the higher
timestamp as their cutoff.

 I dunno that it's enough for you, though, since you have
 visibility issues as well.  You seem to want both the benefits of files and
 relational database transactions, and I don't think you can really have both
 at once without paying in reader complication.

Or writer complication. In the end, the idea that using a file based log
wouldn't have this problem is based on the implicit assumption that the
file based logging mechanism would provide some sort of serialization of
writes.

As POSIX requires the write() call to be thread safe, write() would be
doing its own internal locking (or doing clever lock-free queueing etc)
to ensure writes are serialized.

However, at least in Linux fairly recently, writes aren't serialized, so
you have to do it yourself. See:

http://lwn.net/Articles/180387/

In any case, logging to a file with some sort of writer serialization
isn't significantly different to logging to a database table outside
your transaction using some sort of writer serialization. Both
mechanisms must serialize writers to work. Both mechanisms must operate
outside the transactional rules of the transaction invoking the logging
operation, in order to avoid serializing, on the log, every operation in
every transaction that writes to the log.

 One way I can think of doing it is to write a seen_log that notes what the
 client has already seen with a timestamp of (say) 1 minute.  Then you can
 say go forward from this time excluding ids (ids here).

It won't work with multiple concurrent writers. There is no guarantee
that an INSERT with a timestamp older than the one you just saw isn't
waiting to commit.

--
Craig Ringer



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Andrew Sullivan
On Thu, Apr 17, 2008 at 12:44:51PM +0800, Craig Ringer wrote:
 It won't work with multiple concurrent writers. There is no guarantee
 that an INSERT with a timestamp older than the one you just saw isn't
 waiting to commit.

This is pretty unlikely -- I won't say impossible, because I'm sure there's
some kernel-level race condition -- if you use the clock time approach and
SERIALIZABLE mode.  You could add a trigger that checks for other timestamps
> yours, I suppose.  Of course, that's pretty heavyweight, too.  How much is
the absolute serialization worth to you in performance?

The only other thing I can suggest is what someone else did: commit them
with wallclock timestamps, and then have a different thread wake up every
_n_ seconds and put the records into the proper table in timestamp order.

A



Re: [GENERAL] table as log (multiple writers and readers)

2008-04-16 Thread Andrew Sullivan
Oh, one other thing

On Thu, Apr 17, 2008 at 12:44:51PM +0800, Craig Ringer wrote:
 
  One way I can think of doing it is to write a seen_log that notes what the
  client has already seen with a timestamp of (say) 1 minute.  Then you can
  say go forward from this time excluding ids (ids here).
 
 It won't work with multiple concurrent writers. There is no guarantee
 that an INSERT with a timestamp older than the one you just saw isn't
 waiting to commit.

Yeah, I spoke imprecisely.  The idea is, "Start at timestamp _t_, but don't
re-process these ones, which I've seen."  The trick is to set your start _t_
far enough back in time that it's incompatible with your business logic that
anything could still be pending from then.  This is nasty and prone to bugs,
but it can be coded up.

A
