subject:"Re\\\\\\\: \\\\\\\[HACKERS\\\\\\\] \\\\Hot standby\\\\\\\\\\\?"

Re: [HACKERS] Hot standby, dropping a tablespace

2009-01-26 Thread Simon Riggs


On Sun, 2009-01-25 at 19:56 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> >> 2. Kill all connections by given user. Hmm, not used for anything, 
> >> actually. Should remove the roleId argument from GetConflictingVirtualXIDs.
> > 
> > No, because we still need to add code to kill-connected-users if we drop
> > role.
> 
> Oh, I see, that's still a todo item. But we don't do that during normal 
> operation, so why should we during recovery?

LOL. Not many systems allow you to continue working after the access has
been removed. But not, as you say, a problem for Hot Standby unless that
changes.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, dropping a tablespace

2009-01-25 Thread Heikki Linnakangas


Simon Riggs wrote:
2. Kill all connections by given user. Hmm, not used for anything, 
actually. Should remove the roleId argument from GetConflictingVirtualXIDs.


No, because we still need to add code to kill-connected-users if we drop
role.


Oh, I see, that's still a todo item. But we don't do that during normal 
operation, so why should we during recovery?



 I'm thinking of an interface
consisting of three functions, replacing the current 
GetConflictingVirtualXIDs and ResolveRecoveryConflictWithVirtualXIDs 
functions:


I'm not sure I see any benefit in doing that.


It makes the code more readable.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, conflict resolution

2009-01-25 Thread Simon Riggs


On Fri, 2009-01-23 at 21:30 +0200, Heikki Linnakangas wrote:

> Ok, then I think we have a little race condition. The startup process 
> doesn't get any reply indicating that the target backend has
> processed 
> the SIGINT and set the cached conflict LSN. The target backend might 
> move ahead using the old LSN for a little while, even though the
> startup 
> process has already gone ahead and replayed a vacuum record.
> 
> Another tiny issue is that it looks like a new conflict LSN always 
> overwrites the old one. But you should always use the oldest
> conflicted 
> LSN in the checks, not the newest.

That makes it easier, because it is either not set, or it is set and
does not need to be reset as new conflict LSNs appear.

I can see a simple scheme emerging, which I will detail tomorrow
morning.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, dropping a tablespace

2009-01-25 Thread Simon Riggs

On Sun, 2009-01-25 at 11:28 +0200, Heikki Linnakangas wrote:

> >> You then call 
> >> ResolveRecoveryConflictWithVirtualXIDs to kill such transactions, and 
> >> try removing the directory again. But 
> >> ResolveRecoveryConflictWithVirtualXIDs doesn't wait for the target 
> >> transaction to die anymore (or at least it shouldn't, as we discussed 
> >> earlier), so that doesn't work AFAICS.
> > 
> > The FATAL errors inflicted should be fairly quick to take effect, so
> > waiting should not be a problem. We can make waiting for FATAL errors
> > the standard response.
> 
> The call in tablespace replay uses ERROR, not FATAL. You don't need to 
> kill the whole session, just the current transaction.

Yeh, yeh, it already does that, FATAL was just my typo. My bad.

> It seems more and more to me that the FATAL and ERROR cases are really 
> quite different. Looking at the callers, there's four different needs 
> for GetConflictingVirtualXIDs+ResolveRecoveryConflictWithVirtualXIDs:

I think the deferred case should be handled with a different function,
since it needs further work on it and that is best done as a second
function.

> 1. Kill all connections to a given database. Used when replaying DROP 
> DATABASE.
> 
> 2. Kill all connections by given user. Hmm, not used for anything, 
> actually. Should remove the roleId argument from GetConflictingVirtualXIDs.

No, because we still need to add code to kill-connected-users if we drop
role.

> 3. Kill all transactions using given tablespace as temp tablespace, FROP 
> TABLESPACE.
> 
> 4. Mark all transactions that still see a given XID as running for 
> termination if they try to access a buffer with conflicting LSN (VACUUM, 
> btree-deletes).
> 
> All callers call GetConflictingVirtualXIDs first, and then 
> ResolveRecoveryConflictWithVirtualXIDs. That's a bit cumbersome; none of 
> the callers do anything else with the virtualxid array they get from 
> GetConflictingVirtualXIDs than pass it on to 
> ResolveRecoveryConflictWithVirtualXIDs. I'm thinking of an interface 
> consisting of three functions, replacing the current 
> GetConflictingVirtualXIDs and ResolveRecoveryConflictWithVirtualXIDs 
> functions:

I'm not sure I see any benefit in doing that.

The two big changes we did earlier were worthwhile but I'm concerned
about how much refactoring you want to do. It introduces new bugs each
time and currently they take time to isolate and fix - much of that work
is currently not very visible, but its a big effort. If I was happy we
had a perfect working solution and we were not under time pressure then
I'd say go for it as much as you like. I'd prefer it if we could get
everything correct before we put all the code in the right cupboards. I
know tidy-up-as-you-go is a good policy but I'd encourage you to do a
first pass looking for potential problems before we did that. Or maybe
we're there already, not sure.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, dropping a tablespace

2009-01-25 Thread Grzegorz Jaskiewicz



On 2009-01-25, at 09:04, Simon Riggs wrote:



On Sat, 2009-01-24 at 21:58 +0200, Heikki Linnakangas wrote:

When replaying a DROP TABLE SPACE, you first try to remove the
directory, and if that fails, you assume that it's because it's in  
use

as a temp tablespace in a read-only transaction.


That sounds like you think there another conclusion?


What if subdirectory of that directory is owned by root ?
Say I create /home/gj/tablespace1 . Surely emptying it is possible,  
but should it remove the dir altogether ?
It is possible to create tablespace in that directory, even so  
postgres user doesn't own /home/gj/ directory. So why shouldn't it be  
possible to drop it ?



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, dropping a tablespace

2009-01-25 Thread Heikki Linnakangas


Simon Riggs wrote:

On Sat, 2009-01-24 at 21:58 +0200, Heikki Linnakangas wrote:
When replaying a DROP TABLE SPACE, you first try to remove the 
directory, and if that fails, you assume that it's because it's in use 
as a temp tablespace in a read-only transaction. 


That sounds like you think there another conclusion?


No, I think that assumption is correct.

You then call 
ResolveRecoveryConflictWithVirtualXIDs to kill such transactions, and 
try removing the directory again. But 
ResolveRecoveryConflictWithVirtualXIDs doesn't wait for the target 
transaction to die anymore (or at least it shouldn't, as we discussed 
earlier), so that doesn't work AFAICS.


The FATAL errors inflicted should be fairly quick to take effect, so
waiting should not be a problem. We can make waiting for FATAL errors
the standard response.


The call in tablespace replay uses ERROR, not FATAL. You don't need to 
kill the whole session, just the current transaction.


It seems more and more to me that the FATAL and ERROR cases are really 
quite different. Looking at the callers, there's four different needs 
for GetConflictingVirtualXIDs+ResolveRecoveryConflictWithVirtualXIDs:


1. Kill all connections to a given database. Used when replaying DROP 
DATABASE.


2. Kill all connections by given user. Hmm, not used for anything, 
actually. Should remove the roleId argument from GetConflictingVirtualXIDs.


3. Kill all transactions using given tablespace as temp tablespace, FROP 
TABLESPACE.


4. Mark all transactions that still see a given XID as running for 
termination if they try to access a buffer with conflicting LSN (VACUUM, 
btree-deletes).


All callers call GetConflictingVirtualXIDs first, and then 
ResolveRecoveryConflictWithVirtualXIDs. That's a bit cumbersome; none of 
the callers do anything else with the virtualxid array they get from 
GetConflictingVirtualXIDs than pass it on to 
ResolveRecoveryConflictWithVirtualXIDs. I'm thinking of an interface 
consisting of three functions, replacing the current 
GetConflictingVirtualXIDs and ResolveRecoveryConflictWithVirtualXIDs 
functions:


/* Kill all connections to given database, wait for them to die */
void KillConnectionsToDatabase(Oid dbOid)

/* Kill all transactions that use given tablespace, wait for them to die */
void KillTransactionsUsingTablespace(Oid tablespaceOid)

/* Signal all backends that with xmin < limitXid that they need to abort 
if they access a page with LSN >= conflict_lsn. (Or wait for them to 
end, if max_standby_delay > 0). */
void ResolveRecoveryConflicts(Oid dbOid, TransactionId limitXid, 
XLogRecPtr conflict_lsn, char *reason)


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, dropping a tablespace

2009-01-25 Thread Simon Riggs


On Sat, 2009-01-24 at 21:58 +0200, Heikki Linnakangas wrote:
> When replaying a DROP TABLE SPACE, you first try to remove the 
> directory, and if that fails, you assume that it's because it's in use 
> as a temp tablespace in a read-only transaction. 

That sounds like you think there another conclusion?

> You then call 
> ResolveRecoveryConflictWithVirtualXIDs to kill such transactions, and 
> try removing the directory again. But 
> ResolveRecoveryConflictWithVirtualXIDs doesn't wait for the target 
> transaction to die anymore (or at least it shouldn't, as we discussed 
> earlier), so that doesn't work AFAICS.

The FATAL errors inflicted should be fairly quick to take effect, so
waiting should not be a problem. We can make waiting for FATAL errors
the standard response.

> One quick work around would be to simply not respect temp_tablespace 
> during recovery...

No, don't think that's acceptable.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby (v9d)

2009-01-24 Thread Mark Kirkwood


Simon Riggs wrote:

On Sat, 2009-01-24 at 11:20 +, Simon Riggs wrote:
  

On Sat, 2009-01-24 at 17:24 +1300, Mark Kirkwood wrote:



version 9g - please use this for testing now

I'm doing some test runs with this now. I notice an old flatfiles 
related bug has reappeared:
  

I'm seeing an off-by-one error on xmax, in some cases. That then causes
the flat file update to not pick up correct info, even though it
executed in other ways as intended. If you run two create databases and
then test only the first, it appears to have worked as intended.

These bugs are result of recent refactoring and it will take a few days
to shake some of them out. We've had more than 20 already so we're
beating them back, but we're not done yet. 



I was at a loss to explain how this could have slipped through our
tests. It appears that the error was corrected following each checkpoint
as a result of ProcArrayUpdateRunningXacts(). Our tests were performed
after a short delay, which typically would be greater than the
deliberately short setting of checkpoint_timeout/archive_timeout and so
by the time we looked the error was gone and masked the problem. We're
setting checkpoint_timeout to 30 mins now to avoid the delay...

  
That makes sense - I had archive_timeout set at 5 minutes, and I would 
have checked before 5 minutes were up.


Cheers

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby (v9d)

2009-01-24 Thread Simon Riggs

On Sat, 2009-01-24 at 11:20 +, Simon Riggs wrote:
> On Sat, 2009-01-24 at 17:24 +1300, Mark Kirkwood wrote:
> 
> > > version 9g - please use this for testing now
> 
> > I'm doing some test runs with this now. I notice an old flatfiles 
> > related bug has reappeared:
> 
> I'm seeing an off-by-one error on xmax, in some cases. That then causes
> the flat file update to not pick up correct info, even though it
> executed in other ways as intended. If you run two create databases and
> then test only the first, it appears to have worked as intended.
> 
> These bugs are result of recent refactoring and it will take a few days
> to shake some of them out. We've had more than 20 already so we're
> beating them back, but we're not done yet. 

I was at a loss to explain how this could have slipped through our
tests. It appears that the error was corrected following each checkpoint
as a result of ProcArrayUpdateRunningXacts(). Our tests were performed
after a short delay, which typically would be greater than the
deliberately short setting of checkpoint_timeout/archive_timeout and so
by the time we looked the error was gone and masked the problem. We're
setting checkpoint_timeout to 30 mins now to avoid the delay...

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby (v9d)

2009-01-24 Thread Simon Riggs

On Sat, 2009-01-24 at 17:24 +1300, Mark Kirkwood wrote:

> > version 9g - please use this for testing now

> I'm doing some test runs with this now. I notice an old flatfiles 
> related bug has reappeared:

I'm seeing an off-by-one error on xmax, in some cases. That then causes
the flat file update to not pick up correct info, even though it
executed in other ways as intended. If you run two create databases and
then test only the first, it appears to have worked as intended.

These bugs are result of recent refactoring and it will take a few days
to shake some of them out. We've had more than 20 already so we're
beating them back, but we're not done yet. 

Thanks for the report, will publish this and other fixes on Monday.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Mark Kirkwood


Simon Riggs wrote:

On Thu, 2009-01-22 at 22:35 +, Simon Riggs wrote:

  

* Bug fix v9 over next few days



version 9g - please use this for testing now

  


I'm doing some test runs with this now. I notice an old flatfiles 
related bug has reappeared:


master:

=# create database test;

slave:

=# select datname from pg_database where datname like 'test';
datname
-
test
(1 row)

postgres=# \c test
FATAL:  database "test" does not exist
Previous connection kept


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Jeff Davis

On Fri, 2009-01-23 at 20:13 +, Greg Stark wrote:
> > If you have a serializable transaction with subtransactions that  
> > suffers
> > a serializability error it only cancels the current subtransaction.
> 
> This is a complete tangent to your work, but I wonder if this is  
> really right. I mean it's not as if we could have srrialized the  
> transaction as a whole with respect to whatever other transaction we  
> failed.

Isn't this back to the discussion about PostgreSQL serializability
versus true serializability?

I agree that's not true serializability.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Greg Stark







If you have a serializable transaction with subtransactions that  
suffers

a serializability error it only cancels the current subtransaction.


This is a complete tangent to your work, but I wonder if this is  
really right. I mean it's not as if we could have srrialized the  
transaction as a whole with respect to whatever other transaction we  
failed.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, conflict resolution

2009-01-23 Thread Simon Riggs


On Fri, 2009-01-23 at 21:30 +0200, Heikki Linnakangas wrote:

> >> Correct me if I'm wrong, but I thought the idea of this new conflict 
> >> resolution was that the startup process doesn't need to wait for the 
> >> target backend to die. Instead, the target backend knows to commit 
> >> suicide if it stumbles into a buffer that's been modified in a 
> >> conflicting way. Looking at ResolveRecoveryConflictWithVirtualXIDs, it 
> >> looks like we still wait.
> > 
> > err, no, that's just an oversight, not intentional.
> 
> Ok, then I think we have a little race condition. The startup process 
> doesn't get any reply indicating that the target backend has processed 
> the SIGINT and set the cached conflict LSN. The target backend might 
> move ahead using the old LSN for a little while, even though the startup 
> process has already gone ahead and replayed a vacuum record.

Hah! You had me either way. Very neat :-)

I'll think some more and reply, but not tonight.

> Another tiny issue is that it looks like a new conflict LSN always 
> overwrites the old one. But you should always use the oldest conflicted 
> LSN in the checks, not the newest.

OK

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Alvaro Herrera

Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> If you have a serializable transaction with subtransactions that suffers
>> a serializability error it only cancels the current subtransaction. That
>> means it's snapshot is still valid and can be used again. By analogy, as
>> long as a transaction does not see any data that is inconsistent with
>> its snapshot it seems OK for it to continue. So I think it is correct.
>
> Yeah, you're right. How bizarre.

It was argued this way to me way back when subtransactions were written.
Originally I had written it in such a way as to abort the whole
transaction, on the rationale that if you're blindly restarting the
subtransaction after a serialization error, it would result in a (maybe
infinite) loop.  I think the reason it only aborts the subxact is that
that's what all other errors do, so why would this one act differently.

Hmm, now that I think about it, I think it was deadlock errors not
serialization errors ...

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, conflict resolution

2009-01-23 Thread Heikki Linnakangas


Simon Riggs wrote:

On Fri, 2009-01-23 at 17:51 +0200, Heikki Linnakangas wrote:
ISTM that if ReadBuffer read the value directly from the PGPROC entry, 
there would be no need for the signaling (in the ERROR mode).


That is possible and I considered it. If we did it that way we would
need to read the PGPROC each time we read a buffer. AFAICS we would need
to use a spinlock to do that since reading an XLogRecPtr would not be
atomic.


Hmm, I think you could make it lock-free by ordering the instructions 
carefully. Not sure it's worth the code complexity, though.


Correct me if I'm wrong, but I thought the idea of this new conflict 
resolution was that the startup process doesn't need to wait for the 
target backend to die. Instead, the target backend knows to commit 
suicide if it stumbles into a buffer that's been modified in a 
conflicting way. Looking at ResolveRecoveryConflictWithVirtualXIDs, it 
looks like we still wait.


err, no, that's just an oversight, not intentional.


Ok, then I think we have a little race condition. The startup process 
doesn't get any reply indicating that the target backend has processed 
the SIGINT and set the cached conflict LSN. The target backend might 
move ahead using the old LSN for a little while, even though the startup 
process has already gone ahead and replayed a vacuum record.


Another tiny issue is that it looks like a new conflict LSN always 
overwrites the old one. But you should always use the oldest conflicted 
LSN in the checks, not the newest.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Heikki Linnakangas


Simon Riggs wrote:

If you have a serializable transaction with subtransactions that suffers
a serializability error it only cancels the current subtransaction. That
means it's snapshot is still valid and can be used again. By analogy, as
long as a transaction does not see any data that is inconsistent with
its snapshot it seems OK for it to continue. So I think it is correct.


Yeah, you're right. How bizarre.


(I was sorely tempted to make it "snapshot too old", as a joke).


Yeah, that is a very describing message, actually. If there wasn't any 
precedence to that, I believe we might have came up with exactly that 
message ourselves.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Simon Riggs

On Fri, 2009-01-23 at 18:22 +0200, Heikki Linnakangas wrote:
> > @@ -1601,6 +1602,24 @@ BufferProcessRecoveryConflictsIfAny(volatile 
> > BufferDesc *bufHdr)
> > {
> > XLogRecPtr  bufLSN = BufferGetLSN(bufHdr);
> >  
> > +   /*
> > +* If the buffer is recent we may need to cancel ourselves
> > +* rather than risk returning a wrong answer. This test is
> > +* too conservative, but it is correct.
> > +*
> >>> +* We only need to cancel the current subtransaction.
> > +* Once we've handled the error then other subtransactions 
> > can
> > +* continue processing. Note that we do *not* reset the
> > +* BufferRecoveryConflictLSN at subcommit/abort, but we do
> > +* reset it if we release our last remaining sbapshot.
> > +* see SnapshotResetXmin()
> > +*
> 
> Is it really enough to cancel just the current subtransaction? What if 
> it's a serializable transaction?

I did originally think that when I first looked at the problem. I'm
sorry if I say that a lot. 

If you have a serializable transaction with subtransactions that suffers
a serializability error it only cancels the current subtransaction. That
means it's snapshot is still valid and can be used again. By analogy, as
long as a transaction does not see any data that is inconsistent with
its snapshot it seems OK for it to continue. So I think it is correct.

(Bizarrely, this might mean that if we did this programatically in a
loop we might keep the system busy for some time while it continually
re-reads data and fails. But that's another story).

You remind me that we can now do what Kevin has requested and throw a
errcode(ERRCODE_T_R_SERIALIZATION_FAILURE) at this point, which I agree
is the most easily understood way of describing this error.

(I was sorely tempted to make it "snapshot too old", as a joke).

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, conflict resolution

2009-01-23 Thread Simon Riggs


On Fri, 2009-01-23 at 17:51 +0200, Heikki Linnakangas wrote:

> In ERROR mode, you don't really want to interrupt the target backend. In 
> ReadBuffer, you're checking a global variable, 
> BufferRecoveryConflictPending on each call, and if it's set, you check 
> the buffer's LSN against the LSN of the earliest LSN conflicting LSN, 
> and throw an error if it's greater than that. Why do we jump through so 
> many hoops to get the earliest conflicting LSN to where it's needed? At 
> the moment:
> 
> 1. Startup process sets the LSN in the target backend's PGPROC entry, 
> and signals it with SIGINT.
> 2. The target backend receives the signal; ProcessInterrupts is called 
> either immediately or at the next CHECK_FOR_INTERRUPTS() call.
> 3. ProcessInterrupts reads the value from PGPROC, and passes it to bufmgr.c
> 
> ISTM that if ReadBuffer read the value directly from the PGPROC entry, 
> there would be no need for the signaling (in the ERROR mode).

That is possible and I considered it. If we did it that way we would
need to read the PGPROC each time we read a buffer. AFAICS we would need
to use a spinlock to do that since reading an XLogRecPtr would not be
atomic.

So doing it the way I've done it allows us to use a local variable which
can be more easily cached and avoids the locking overhead.

We do still need to signal anyway for the FATAL case, so we're not
significantly affecting the patch footprint by changing that.

> Correct me if I'm wrong, but I thought the idea of this new conflict 
> resolution was that the startup process doesn't need to wait for the 
> target backend to die. Instead, the target backend knows to commit 
> suicide if it stumbles into a buffer that's been modified in a 
> conflicting way. Looking at ResolveRecoveryConflictWithVirtualXIDs, it 
> looks like we still wait.

err, no, that's just an oversight, not intentional.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Heikki Linnakangas


@@ -1601,6 +1602,24 @@ BufferProcessRecoveryConflictsIfAny(volatile BufferDesc 
*bufHdr)
{
XLogRecPtr  bufLSN = BufferGetLSN(bufHdr);
 
+   /*

+* If the buffer is recent we may need to cancel ourselves
+* rather than risk returning a wrong answer. This test is
+* too conservative, but it is correct.
+*

+* We only need to cancel the current subtransaction.

+* Once we've handled the error then other subtransactions can
+* continue processing. Note that we do *not* reset the
+* BufferRecoveryConflictLSN at subcommit/abort, but we do
+* reset it if we release our last remaining sbapshot.
+* see SnapshotResetXmin()
+*


Is it really enough to cancel just the current subtransaction? What if 
it's a serializable transaction?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Heikki Linnakangas


Simon Riggs wrote:

On Fri, 2009-01-23 at 16:14 +0200, Heikki Linnakangas wrote:

I made a couple of minor changes after importing your 
patches. 


I've applied them both to v9g, attached here.

If you wouldn't mind redoing the initial step, I will promise not to do
anything else to the code, except via patch on GIT.


No problem. I don't need to do it from scratch, I'll just apply the 
changes from that patch as an incremental commit. Done, you can see it 
in my git repo now too.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Simon Riggs


On Fri, 2009-01-23 at 16:14 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > * Put corrected version into GIT
> > * Produce outstanding items as patch-on-patch via GIT
> 
> I've applied the hot standby patch and recovery infra v9 patch to 
> branches in my git repository, and pushed those to:
> 
> git://git.postgresql.org/git/~hlinnaka/pgsql.git
> 
> You can clone that to get started.
> 
> I've set those branches up so that the hot standby branch is branched 
> off from the recovery infra branch. I'd like to keep the two separate, 
> as the recovery infra patch is useful on it's own, and can be reviewed 
> separately.
> 
> As a teaser, I made a couple of minor changes after importing your 
> patches. For the sake of the archives, I've also included those changes 
> as patches here.

OK, I'll look at those. I've fixed 6 bugs today (!), so I'd rather go
for the new version coming out in an hour. That's too many to unpick
unfortunately.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby (v9d)

2009-01-23 Thread Heikki Linnakangas


Simon Riggs wrote:

* Put corrected version into GIT
* Produce outstanding items as patch-on-patch via GIT


I've applied the hot standby patch and recovery infra v9 patch to 
branches in my git repository, and pushed those to:


git://git.postgresql.org/git/~hlinnaka/pgsql.git

You can clone that to get started.

I've set those branches up so that the hot standby branch is branched 
off from the recovery infra branch. I'd like to keep the two separate, 
as the recovery infra patch is useful on it's own, and can be reviewed 
separately.


As a teaser, I made a couple of minor changes after importing your 
patches. For the sake of the archives, I've also included those changes 
as patches here.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
>From 6d583356063c9d3c8d0e69233a40065bc5d7bde1 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Fri, 23 Jan 2009 14:37:05 +0200
Subject: [PATCH] Remove padding in XLogCtl; might be a good idea, but let's keep it simple
 for now.

Also remove the XLogCtl->mode_lck spinlock, which is pretty pointless for
a single boolean that's only written by one process.
---
 src/backend/access/transam/xlog.c |   24 +---
 1 files changed, 1 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index fcf5657..42c4552 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -340,18 +340,11 @@ typedef struct XLogCtlWrite
 
 /*
  * Total shared-memory state for XLOG.
- *
- * This small structure is accessed by many backends, so we take care to
- * pad out the parts of the structure so they can be accessed by separate
- * CPUs without causing false sharing cache flushes. Padding is generous
- * to allow for a wide variety of CPU architectures.
  */
-#define	XLOGCTL_BUFFER_SPACING	128
 typedef struct XLogCtlData
 {
 	/* Protected by WALInsertLock: */
 	XLogCtlInsert Insert;
-	char	InsertPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogCtlInsert)];
 
 	/* Protected by info_lck: */
 	XLogwrtRqst LogwrtRqst;
@@ -359,16 +352,9 @@ typedef struct XLogCtlData
 	uint32		ckptXidEpoch;	/* nextXID & epoch of latest checkpoint */
 	TransactionId ckptXid;
 	XLogRecPtr	asyncCommitLSN; /* LSN of newest async commit */
-	/* add data structure padding for above info_lck declarations */
-	char	InfoPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogwrtRqst) 
-			- sizeof(XLogwrtResult)
-			- sizeof(uint32)
-			- sizeof(TransactionId)
-			- sizeof(XLogRecPtr)];
 
 	/* Protected by WALWriteLock: */
 	XLogCtlWrite Write;
-	char	WritePadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogCtlWrite)];
 
 	/*
 	 * These values do not change after startup, although the pointed-to pages
@@ -390,11 +376,8 @@ typedef struct XLogCtlData
 	 * always during Recovery Processing Mode. This allows us to identify
 	 * code executed *during* Recovery Processing Mode but not necessarily
 	 * by Startup process itself.
-	 *
-	 * Protected by mode_lck
 	 */
 	bool		SharedRecoveryProcessingMode;
-	slock_t		mode_lck;
 
 	/*
 	 * recovery target control information
@@ -410,8 +393,6 @@ typedef struct XLogCtlData
 	TransactionId 	recoveryLastXid;
 	XLogRecPtr		recoveryLastRecPtr;
 
-	char		InfoLockPadding[XLOGCTL_BUFFER_SPACING];
-
 	slock_t		info_lck;		/* locks shared variables shown above */
 } XLogCtlData;
 
@@ -4399,7 +4380,6 @@ XLOGShmemInit(void)
 	XLogCtl->XLogCacheBlck = XLOGbuffers - 1;
 	XLogCtl->Insert.currpage = (XLogPageHeader) (XLogCtl->pages);
 	SpinLockInit(&XLogCtl->info_lck);
-	SpinLockInit(&XLogCtl->mode_lck);
 
 	/*
 	 * If we are not in bootstrap mode, pg_control should already exist. Read
@@ -6183,9 +6163,7 @@ IsRecoveryProcessingMode(void)
 		if (xlogctl == NULL)
 			return false;
 
-		SpinLockAcquire(&xlogctl->mode_lck);
-		LocalRecoveryProcessingMode = XLogCtl->SharedRecoveryProcessingMode;
-		SpinLockRelease(&xlogctl->mode_lck);
+		LocalRecoveryProcessingMode = xlogctl->SharedRecoveryProcessingMode;
 	}
 
 	knownProcessingMode = true;
-- 
1.5.6.5

>From 4061ac8f84cc699bf0f417689f853791089ed472 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Fri, 23 Jan 2009 15:55:33 +0200
Subject: [PATCH] Remove knownProcessingMode variable.

---
 src/backend/access/transam/xlog.c |   19 ---
 1 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7e480e2..e64fb48 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -126,9 +126,11 @@ bool		InRecovery = false;
 /* Are we recovering using offline XLOG archives? */
 static bool InArchiveRecovery = false;
 
-/* Local copy of shared RecoveryProcessingMode state */
+/*
+ * Local copy of shared RecoveryProcessingMode state. True actually
+ * means "not known, need to check the shared state"
+ */
 static bool LocalRecoveryProcessingMode = true;
-static bool knownProcessingMode = false;
 
 /*

Re: [HACKERS] Hot standby, RestoreBkpBlocks and cleanup locks

2009-01-20 Thread Heikki Linnakangas


Simon Riggs wrote:

I did want you to commit those changes, but that was 8 days ago and much
has changed since then. Now, I'm slightly worried that this may be a
retrograde step. The last 3 days I've been refactoring the code to
account for other requirements, as I have been discussing, and some of
these things have now changed. Though the general pattern of your
suggested refactoring remains the same.


I figured there's possibly some further changes, but the general idea to 
move the responsibility of restoring backup blocks to the redo function 
should remain the same.



Can we plan a move to GIT? We can both work on things at the same time
and my changes can be more visible. There will be some initial pain...


Sure, just go ahead and publish a repository. Or would you like me to 
apply the patches and publish a repository you can clone from? Perhaps 
that would be easier since I'm already familiar with GIT.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, RestoreBkpBlocks and cleanup locks

2009-01-20 Thread Simon Riggs

On Tue, 2009-01-20 at 21:01 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > It would be useful to nibble away at the patch, committing all the
> > necessary refactorings to make the patch work. That will reduce size of
> > eventual commit.
> 
> I committed this part now; one less thing to review. Please adjust the 
> patch accordingly.

OK, thanks.

I did want you to commit those changes, but that was 8 days ago and much
has changed since then. Now, I'm slightly worried that this may be a
retrograde step. The last 3 days I've been refactoring the code to
account for other requirements, as I have been discussing, and some of
these things have now changed. Though the general pattern of your
suggested refactoring remains the same.

I'll check in more detail, but please could we talk before you commit
parts, otherwise we'll just trip over each other.

I'll post the new version (v8f) (which is pre-commit) this evening, so
we can discuss.

Can we plan a move to GIT? We can both work on things at the same time
and my changes can be more visible. There will be some initial pain...

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, RestoreBkpBlocks and cleanup locks

2009-01-20 Thread Heikki Linnakangas


Simon Riggs wrote:

It would be useful to nibble away at the patch, committing all the
necessary refactorings to make the patch work. That will reduce size of
eventual commit.


I committed this part now; one less thing to review. Please adjust the 
patch accordingly.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-19 Thread Simon Riggs

On Mon, 2009-01-19 at 16:50 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > I prefer the last one, but if you think otherwise, please shout.
> 
> We're now emitting WAL records for relcache invalidations, IIRC. I 
> wonder if those are useful for this?

Tom already objected to putting strange inval messages into WAL.

Hmm, DROP USER is transactional, so we can only do this at commit. So
forget the other ideas I had.

We already know about the auth file update at commit.

So we should say, at commit, re-read the list of roleids in use and if
any don't match a row in pg_user then remove them. If we do that after
the flat file update and the actual commit that removes the user then we
will be guaranteed no race condition exists to allow new users to logon
as we try to disconnect them.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-19 Thread Heikki Linnakangas


Simon Riggs wrote:

I prefer the last one, but if you think otherwise, please shout.


We're now emitting WAL records for relcache invalidations, IIRC. I 
wonder if those are useful for this?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-19 Thread Simon Riggs

On Mon, 2009-01-19 at 12:54 +, Simon Riggs wrote:
> Some refactoring in this area is also required
> because we need to handle two other types of conflict to correctly
> support drop database and drop user, which is now underway.

I've hung the drop database conflict code in dbase_redo().

For drop role, there isn't an rmgr at all, but I can add code in a few
places.

* add XLOG_DBASE_DROP_USER - i.e. add drop user to the Database rmgr

* DropRole() takes an AccessExclusiveLock, so we do write a WAL record
for it. I could add a special case to the Relation rmgr.

* Add a new rmgr (unused slot 7) and have it handle DropRole.

I prefer the last one, but if you think otherwise, please shout.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-19 Thread Simon Riggs


On Mon, 2009-01-19 at 15:47 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > I suggest we handle this on the recovery side, not on the master, by
> > deriving the xmin at the point the WAL record arrives. We would
> > calculate it by looking at recovery procs only. That will likely give us
> > a later value than we would get from the master, but that can't be
> > helped.
> 
> Hmm, that's an interesting idea. It presumes that we see an abort/commit 
> WAL record at the right moment for every transaction that we have a 
> recovery proc for. We just concluded in the other thread that we do 
> always emit abortion records when the database is running normally; I 
> think that's good enough for this purpose.

But not perfect.

> A few other random ideas I had:
> 
> - in btree delete redo, follow the index pointers, and look at the xids 
> on the heap tuples. That requires some random I/O, but will give the 
> exact value we need. Since it's quite expensive, I think we'd only want 
> to do it after using some more conservative test but quicker test to 
> determine that there might be a conflict.

Ouch.

> - Add latestRemovedXid to b-tree page header, and update it as tuples 
> are killed. Need to tolerate the fact that tuple kills are not WAL-logged.

Sounds easy-ish. 

If tuple kills aren't WAL logged then if we crash latestRemovedXid will
remain as it was at time of last write. So if we do a delete scan it
will only remove the index tuples with hint bits set at time of that
write, so the value would always be correct, no?

I'm somehow uncomfortable with this idea though. Care to persuade me
further?

> > Btree deletes were an important optimisation when it first went it, but
> > now we have HOT it is much less important. 
> 
> If HOT is working well for your application, there won't be many btree 
> deletes anyway, and the whole issue is moot.

That was my point.

> > Another route might be to put
> > an option to turn off btree delete on the master, default = on. We
> > probably should consider turning it off entirely when it doesn't yield
> > significant benefit.
> 
> I'd rather put in a generic mechanism to prevent vacuuming of recent 
> tuples that might still be needed in the standby. Like always 
> subtracting a fixed amount of xids from OldestXmin/RecentGlobalXmin, or 
> having a feedback loop from the standby to the master, allowing the 
> master to say what it's oldest xmin is. But that's a fair amount of 
> work; I'd rather leave that as a future enhancement, and just figure out 
> something simple for this specific issue. We'll need to handle it 
> gracefully even if we try to avoid it by retaining dead tuples longer.

Yeh, looked at both of those also. Definitely after sync rep goes in
though.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-19 Thread Heikki Linnakangas


Simon Riggs wrote:

I suggest we handle this on the recovery side, not on the master, by
deriving the xmin at the point the WAL record arrives. We would
calculate it by looking at recovery procs only. That will likely give us
a later value than we would get from the master, but that can't be
helped.


Hmm, that's an interesting idea. It presumes that we see an abort/commit 
WAL record at the right moment for every transaction that we have a 
recovery proc for. We just concluded in the other thread that we do 
always emit abortion records when the database is running normally; I 
think that's good enough for this purpose.


A few other random ideas I had:

- in btree delete redo, follow the index pointers, and look at the xids 
on the heap tuples. That requires some random I/O, but will give the 
exact value we need. Since it's quite expensive, I think we'd only want 
to do it after using some more conservative test but quicker test to 
determine that there might be a conflict.


- Add latestRemovedXid to b-tree page header, and update it as tuples 
are killed. Need to tolerate the fact that tuple kills are not WAL-logged.



Btree deletes were an important optimisation when it first went it, but
now we have HOT it is much less important. 


If HOT is working well for your application, there won't be many btree 
deletes anyway, and the whole issue is moot.



Another route might be to put
an option to turn off btree delete on the master, default = on. We
probably should consider turning it off entirely when it doesn't yield
significant benefit.


I'd rather put in a generic mechanism to prevent vacuuming of recent 
tuples that might still be needed in the standby. Like always 
subtracting a fixed amount of xids from OldestXmin/RecentGlobalXmin, or 
having a feedback loop from the standby to the master, allowing the 
master to say what it's oldest xmin is. But that's a fair amount of 
work; I'd rather leave that as a future enhancement, and just figure out 
something simple for this specific issue. We'll need to handle it 
gracefully even if we try to avoid it by retaining dead tuples longer.



Lots of scanning to remove the odd row is probably
pretty wasteful and likely adds contention at the very point we don't
want it - index splits.


Remember that if you can remove enough dead tuples from the index page, 
you've just made room on the page and don't need to split. Splitting is 
pretty expensive compared to scanning a few line pointers.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-19 Thread Simon Riggs

On Mon, 2009-01-19 at 14:00 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > GetSnapshotData() sets
> > RecentGlobalXmin to the result of the snapshot's xmin.
> 
> No. RecentGlobalXmin is set to the oldest *xmin* observed, across all 
> running transactions. TransactionXmin is the xid of the oldest running 
> transaction. IOW, RecentGlobalXmin is the xid of transaction that the 
> oldest running transaction sees as running.

OK. That was fun.

These WAL records are annoying, no matter what the exact value of
latestRemovedXid is and they seem likely to conflict with many queries
on the standby. 

If we don't use RecentGlobalXmin then I can't see any easily derived
value that we can use in its place. It isn't worth the effort on the
master to derive a more exact value, not when we don't even know if it
matters yet. 

I suggest we handle this on the recovery side, not on the master, by
deriving the xmin at the point the WAL record arrives. We would
calculate it by looking at recovery procs only. That will likely give us
a later value than we would get from the master, but that can't be
helped.

For me, this makes it essential now that I put in place the deferred
cancellation mechanism. Some refactoring in this area is also required
because we need to handle two other types of conflict to correctly
support drop database and drop user, which is now underway.

Btree deletes were an important optimisation when it first went it, but
now we have HOT it is much less important. Another route might be to put
an option to turn off btree delete on the master, default = on. We
probably should consider turning it off entirely when it doesn't yield
significant benefit. Lots of scanning to remove the odd row is probably
pretty wasteful and likely adds contention at the very point we don't
want it - index splits.

Thoughts? 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-19 Thread Heikki Linnakangas


Simon Riggs wrote:

GetSnapshotData() sets
RecentGlobalXmin to the result of the snapshot's xmin.


No. RecentGlobalXmin is set to the oldest *xmin* observed, across all 
running transactions. TransactionXmin is the xid of the oldest running 
transaction. IOW, RecentGlobalXmin is the xid of transaction that the 
oldest running transaction sees as running.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-19 Thread Simon Riggs


On Mon, 2009-01-19 at 12:50 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > One of us needs a coffee.
> 
> Clearly, I put the kettle on...

I had one too, just in case.

> > How does Transaction 4 have a RecentGlobalXmin of 2 in step (7), but at
> > step (9) the value of RecentGlobalXmin has gone backwards?
> 
> Looks like I mixed up the xids of the two transactions in steps 8 and 9. 
> Let's see if I got it right this time:
> 
> 1. Transaction 1 begins in backend A
> 2. Transaction 2 begins in backend B, xmin = 1
> 3. Transaction 1 ends
> 4. Transaction 3 begins in backend C, xmin = 2
> 5. Backend C gets snapshot, TransactionXmin = 2, RecentGlobalXmin = 1
> 6. Transaction 2 ends.
> 7. Transaction 4 begins in backend A, gets snapshot TransactionXmin = 2, 
> RecentGlobalXmin = 2
> 8. Transaction 4 kills tuple, using its RecentGlobalxmin of 2
> 9. Transaction 3 splits the page, emits a delete xlog record, setting 
> latestRemovedXid to its RecentGlobalXmin of 1

I don't see how step (5) is possible. GetSnapshotData() sets
RecentGlobalXmin to the result of the snapshot's xmin.

If step (5) is possible, then yes, step (9) can happen.

You are correct to say that RecentGlobalXmin is not always correctly
set. All I'm saying is that at the exact time, place and circumstance I
use it, it is correct. In other cases, it may not be.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-19 Thread Heikki Linnakangas


Simon Riggs wrote:

One of us needs a coffee.


Clearly, I put the kettle on...


How does Transaction 4 have a RecentGlobalXmin of 2 in step (7), but at
step (9) the value of RecentGlobalXmin has gone backwards?


Looks like I mixed up the xids of the two transactions in steps 8 and 9. 
Let's see if I got it right this time:


1. Transaction 1 begins in backend A
2. Transaction 2 begins in backend B, xmin = 1
3. Transaction 1 ends
4. Transaction 3 begins in backend C, xmin = 2
5. Backend C gets snapshot, TransactionXmin = 2, RecentGlobalXmin = 1
6. Transaction 2 ends.
7. Transaction 4 begins in backend A, gets snapshot TransactionXmin = 2, 
RecentGlobalXmin = 2

8. Transaction 4 kills tuple, using its RecentGlobalxmin of 2
9. Transaction 3 splits the page, emits a delete xlog record, setting 
latestRemovedXid to its RecentGlobalXmin of 1


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-19 Thread Simon Riggs


On Mon, 2009-01-19 at 12:22 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > Well, steps 7 and 8 don't make sense.
> > 
> > Your earlier comment was that it was possible for a WAL record to be
> > written with a RecentGlobalXmin that was lower than other backends
> > values. In step 9 the RecentGlobalXmin is *not* lower than any other
> > backend, it is the same. 
> > 
> > So if there is a proof, this isn't it. 
> 
> Yeah, you're right. I got steps 8 and 9 mixed. Let me try again:
> 
> 1. Transaction 1 begins in backend A
> 2. Transaction 2 begins in backend B, xmin = 1
> 3. Transaction 1 ends
> 4. Transaction 3 begins in backend C, xmin = 2
> 5. Backend C gets snapshot, TransactionXmin = 2, RecentGlobalXmin = 1
> 6. Transaction 2 ends.
> 7. Transaction 4 begins in backend A, gets snapshot TransactionXmin = 2, 
> RecentGlobalXmin = 2
> 8. Transaction 3 kills tuple, using its RecentGlobalxmin of 2
> 9. Transaction 4 splits the page, emits a delete xlog record, setting 
> latestRemovedXid to its RecentGlobalXmin of 1

One of us needs a coffee.

How does Transaction 4 have a RecentGlobalXmin of 2 in step (7), but at
step (9) the value of RecentGlobalXmin has gone backwards?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-19 Thread Heikki Linnakangas


Simon Riggs wrote:

Well, steps 7 and 8 don't make sense.

Your earlier comment was that it was possible for a WAL record to be
written with a RecentGlobalXmin that was lower than other backends
values. In step 9 the RecentGlobalXmin is *not* lower than any other
backend, it is the same. 

So if there is a proof, this isn't it. 


Yeah, you're right. I got steps 8 and 9 mixed. Let me try again:

1. Transaction 1 begins in backend A
2. Transaction 2 begins in backend B, xmin = 1
3. Transaction 1 ends
4. Transaction 3 begins in backend C, xmin = 2
5. Backend C gets snapshot, TransactionXmin = 2, RecentGlobalXmin = 1
6. Transaction 2 ends.
7. Transaction 4 begins in backend A, gets snapshot TransactionXmin = 2, 
RecentGlobalXmin = 2

8. Transaction 3 kills tuple, using its RecentGlobalxmin of 2
9. Transaction 4 splits the page, emits a delete xlog record, setting 
latestRemovedXid to its RecentGlobalXmin of 1



But I can't see how there can be one: Two concurrent vacuums can have
different OldestXmin values, but two concurrent transactions cannot.


Of course they can.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-19 Thread Simon Riggs

On Mon, 2009-01-19 at 09:16 +0200, Heikki Linnakangas wrote: 
> Simon Riggs wrote:
> > On Fri, 2009-01-16 at 22:09 +0200, Heikki Linnakangas wrote:
> > 
>  RecentGlobalXmin is just a hint, it lags behind the real oldest xmin 
>  that GetOldestXmin() would return. If another backend has a more recent 
>  RecentGlobalXmin value, and has killed more recent tuples on the page, 
>  the latestRemovedXid written here is too old.
> >>> What do you think we should do instead?
> >> Dunno. Maybe call GetOldestXmin().
> > 
> > We are discussing btree deletes, not btree vacuums. 
> 
> Pardon my ignorance, but what's the difference?

In terms of current HEAD, not much. In terms of Hot Standby, a
significant difference - the two actions have been split, rather than
continuing to share the same WAL record.

XLOG_BTREE_VACUUM removes index tuples as a result of a vacuum. The
initial scan of the heap already generated an XLOG_HEAP2_CLEANUP_INFO
which gives the latestRemovedXid for that vacuum. So we don't need to
worry about putting a latestRemovedXid on XLOG_BTREE_VACUUM. The WAL
records also differ because the XLOG_BTREE_VACUUM contains details of
blocks that need to be pinned but not otherwise touched.

XLOG_BTREE_DELETE is different in 3 ways. It isn't part of a vacuum, so:
* we don't need to take a cleanup lock
* it doesn't contain info about other blocks we need to scan beforehand
for correctness purposes
* it wasn't preceded by an XLOG_HEAP2_CLEANUP_INFO record, so it must
have a *correct* (even if too conservative) value for latestRemovedXid
set.

So the only time we need to set latestRemovedXid correctly is during a
normal transaction, not during a vacuum.

> > If we are doing
> > btree delete then we have an unreleased snapshot therefore we also have
> > a non-zero xmin. How can another backend have a later RecentGlobalXmin
> > or result from GetOldestXmin() than we do?
> 
> Sure it can, for example:
> 
> 1. Transaction 1 begins in backend A
> 2. Transaction 2 begins in backend B, xmin = 1
> 3. Transaction 1 ends
> 4. Transaction 3 begins in backend C, xmin = 2
> 5. Backend C gets snapshot, TransactionXmin = 2, RecentGlobalXmin = 1
> 6. Transaction 2 ends.
> 7. Transaction 4 begins in backend A, gets snapshot TransactionXmin = 2, 
> RecentGlobalXmin = 2
> 8. Transaction 4 kills tuple, using its RecentGlobalxmin of 1
> 9. Transaciont 3 splits the page, emits a delete xlog record, setting 
> latestRemovedXid to its RecentGlobalXmin of 2

Well, steps 7 and 8 don't make sense.

Your earlier comment was that it was possible for a WAL record to be
written with a RecentGlobalXmin that was lower than other backends
values. In step 9 the RecentGlobalXmin is *not* lower than any other
backend, it is the same. 

So if there is a proof, this isn't it. 

But I can't see how there can be one: Two concurrent vacuums can have
different OldestXmin values, but two concurrent transactions cannot.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-18 Thread Heikki Linnakangas


Simon Riggs wrote:

On Fri, 2009-01-16 at 22:09 +0200, Heikki Linnakangas wrote:

RecentGlobalXmin is just a hint, it lags behind the real oldest xmin 
that GetOldestXmin() would return. If another backend has a more recent 
RecentGlobalXmin value, and has killed more recent tuples on the page, 
the latestRemovedXid written here is too old.

What do you think we should do instead?

Dunno. Maybe call GetOldestXmin().


We are discussing btree deletes, not btree vacuums. 


Pardon my ignorance, but what's the difference?


If we are doing
btree delete then we have an unreleased snapshot therefore we also have
a non-zero xmin. How can another backend have a later RecentGlobalXmin
or result from GetOldestXmin() than we do?


Sure it can, for example:

1. Transaction 1 begins in backend A
2. Transaction 2 begins in backend B, xmin = 1
3. Transaction 1 ends
4. Transaction 3 begins in backend C, xmin = 2
5. Backend C gets snapshot, TransactionXmin = 2, RecentGlobalXmin = 1
6. Transaction 2 ends.
7. Transaction 4 begins in backend A, gets snapshot TransactionXmin = 2, 
RecentGlobalXmin = 2

8. Transaction 4 kills tuple, using its RecentGlobalxmin of 1
9. Transaciont 3 splits the page, emits a delete xlog record, setting 
latestRemovedXid to its RecentGlobalXmin of 2


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-16 Thread Simon Riggs


On Fri, 2009-01-16 at 22:09 +0200, Heikki Linnakangas wrote:

> >> RecentGlobalXmin is just a hint, it lags behind the real oldest xmin 
> >> that GetOldestXmin() would return. If another backend has a more recent 
> >> RecentGlobalXmin value, and has killed more recent tuples on the page, 
> >> the latestRemovedXid written here is too old.
> > 
> > What do you think we should do instead?
> 
> Dunno. Maybe call GetOldestXmin().

We are discussing btree deletes, not btree vacuums. If we are doing
btree delete then we have an unreleased snapshot therefore we also have
a non-zero xmin. How can another backend have a later RecentGlobalXmin
or result from GetOldestXmin() than we do?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-16 Thread Heikki Linnakangas


Simon Riggs wrote:

On Fri, 2009-01-16 at 16:33 +0200, Heikki Linnakangas wrote:

I don't think RecentGlobalXmin is good enough here:


!   /*
!* We would like to set an accurate latestRemovedXid, 
but there
!* is no easy way of obtaining a useful value. So we 
use the
!* probably far too conservative value of 
RecentGlobalXmin instead.
!*/
!   xlrec_delete.latestRemovedXid = RecentGlobalXmin;
!   rdata[0].data = (char *) &xlrec_delete;
!   rdata[0].len = SizeOfBtreeDelete;
RecentGlobalXmin is just a hint, it lags behind the real oldest xmin 
that GetOldestXmin() would return. If another backend has a more recent 
RecentGlobalXmin value, and has killed more recent tuples on the page, 
the latestRemovedXid written here is too old.


What do you think we should do instead?


Dunno. Maybe call GetOldestXmin().

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-16 Thread Simon Riggs


On Fri, 2009-01-16 at 16:33 +0200, Heikki Linnakangas wrote:
> I don't think RecentGlobalXmin is good enough here:
> 
> > !   /*
> > !* We would like to set an accurate latestRemovedXid, 
> > but there
> > !* is no easy way of obtaining a useful value. So we 
> > use the
> > !* probably far too conservative value of 
> > RecentGlobalXmin instead.
> > !*/
> > !   xlrec_delete.latestRemovedXid = RecentGlobalXmin;
> > !   rdata[0].data = (char *) &xlrec_delete;
> > !   rdata[0].len = SizeOfBtreeDelete;
> 
> RecentGlobalXmin is just a hint, it lags behind the real oldest xmin 
> that GetOldestXmin() would return. If another backend has a more recent 
> RecentGlobalXmin value, and has killed more recent tuples on the page, 
> the latestRemovedXid written here is too old.

What do you think we should do instead?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot Standby dev build (v8)

2009-01-16 Thread Heikki Linnakangas


I don't think RecentGlobalXmin is good enough here:


!   /*
!* We would like to set an accurate latestRemovedXid, 
but there
!* is no easy way of obtaining a useful value. So we 
use the
!* probably far too conservative value of 
RecentGlobalXmin instead.
!*/
!   xlrec_delete.latestRemovedXid = RecentGlobalXmin;
!   rdata[0].data = (char *) &xlrec_delete;
!   rdata[0].len = SizeOfBtreeDelete;


RecentGlobalXmin is just a hint, it lags behind the real oldest xmin 
that GetOldestXmin() would return. If another backend has a more recent 
RecentGlobalXmin value, and has killed more recent tuples on the page, 
the latestRemovedXid written here is too old.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-15 Thread Simon Riggs


On Thu, 2009-01-15 at 22:10 +, Simon Riggs wrote:

> I notice that we are no longer able to record the databaseId for a
> connection, so query conflict resolution cannot be isolated to
> individual databases any longer.

Wrong way round. Incoming WAL records from dbOid on them, so we can
still check for conflicts against the db of the standby backends.
Phew!

> I'll leave all of the databaseId stuff in there in case we think of
> anything good.

Ripping out now as final part of earlier refactoring.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-15 Thread Simon Riggs

On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > If you want to do things a different way you need to say what you want
> > to do and what effects those changes will have. 
> 
> I want to reduce the coupling between the primary and the master. The 
> less they need to communicate, the better. I want to get rid of slotid, 
> and as many of the other extra information carried in WAL records that I 
> can. I believe that will make the patch both simpler and more robust.
> 
> > Are there tradeoffs? If so what are they?
> 
> I don't think there's any big difference in user-visible behavior. 

I notice that we are no longer able to record the databaseId for a
connection, so query conflict resolution cannot be isolated to
individual databases any longer.

We might sometimes infer a transaction's databaseId from certain WAL
records but that is only going to be possible within each rmgr, which is
fairly strange.

I'll leave all of the databaseId stuff in there in case we think of
anything good.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, RestoreBkpBlocks and cleanup locks

2009-01-15 Thread Simon Riggs

On Thu, 2009-01-15 at 18:16 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:

> Is the first really useful? I would understand "advance until next 
> cleanup record *that would block/kill queries*", but why would you
> want to advance until the next cleanup record? 

Minor difference there, but noted.

> Anyway, if it is useful, you could do the pausing in
> XactResolveRecoveryConflicts, too.

Well that spreads code around that was previously fairly tight, which
I'm not that happy with but I'll do as you suggest. We need to crack on
now.

I want to keep the feature at least until we have some serious feedback
in the beta phase that it isn't necessary. Usability is close behind
correctness and robustness.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, RestoreBkpBlocks and cleanup locks

2009-01-15 Thread Simon Riggs


On Thu, 2009-01-15 at 18:05 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > The idea outlined before didn't deal with all call points for
> > RecordIsCleanupRecord(), so doesn't actually work.
> 
> Are we talking about the same thing? 

I guess not. Look at the other call points for that function, cos your
suggestion didn't include them AFAICS.

> If we put the control of locking to 
> the hands of the redo-function, I don't see why it couldn't use a lock 
> of the right strength. Surely the redo-function can be taught what lock 
> it needs to take.

Yes, it can, but there were other uses.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, RestoreBkpBlocks and cleanup locks

2009-01-15 Thread Heikki Linnakangas


Simon Riggs wrote:

The idea outlined before didn't deal with all call points for
RecordIsCleanupRecord(), so doesn't actually work.


Hmm, grep finds two call points for that:


!   case RECOVERY_TARGET_PAUSE_CLEANUP:
!   /*
!* Advance until we see a cleanup 
record, then pause.
!*/
!   if (RecordIsCleanupRecord(record))
!   pauseHere = true;
!   break;
! 



and


+   /*
+* Wait, kill or otherwise resolve any 
conflicts between
+* incoming cleanup records and user 
queries. This is the
+ 	 * main barrier that allows MVCC to work correctly when 
+ 	 * running standby servers. Only need to do this if there

+* is a possibility that users may be 
active.
+*/
+   if (reachedSafeStartPoint && 
RecordIsCleanupRecord(record))
+   
ResolveRedoVisibilityConflicts(EndRecPtr, record);


The second we can easily handle by getting rid of 
ResolveRedoVisibilityConflicts functions and making the redo-functions 
call XactResolveRecoveryConflicts when necessary.


Is the first really useful? I would understand "advance until next 
cleanup record *that would block/kill queries*", but why would you want 
to advance until the next cleanup record? Anyway, if it is useful, you 
could do the pausing in XactResolveRecoveryConflicts, too.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, RestoreBkpBlocks and cleanup locks

2009-01-15 Thread Heikki Linnakangas


Simon Riggs wrote:

The idea outlined before didn't deal with all call points for
RecordIsCleanupRecord(), so doesn't actually work.


Are we talking about the same thing? If we put the control of locking to 
the hands of the redo-function, I don't see why it couldn't use a lock 
of the right strength. Surely the redo-function can be taught what lock 
it needs to take.



ISTM easier to do things within the rmgr at the time WAL records are
written, rather than in the rmgr while handling redo.


I don't like that idea. I'd like to keep the coupling between the 
primary and standby to the minimum.



This avoids another rmgr call and is much more straightforward since we
define how to redo the record at the time it is written, rather than via
a separate mechanism that could mismatch. 


The code that generates a WAL record and the redo-functions need to 
match in general anyway. The strength of the lock is not any more 
error-prone than other things that a redo-function must do.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, RestoreBkpBlocks and cleanup locks

2009-01-15 Thread Simon Riggs

On Fri, 2009-01-09 at 19:34 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:

> > It would be useful to nibble away at the patch, committing all the
> > necessary refactorings to make the patch work. That will reduce size of
> > eventual commit.
> 
> Agreed. This in particular is a change I feel pretty confident to commit 
> beforehand.

I'm looking to implement refactoring of this now.

The idea outlined before didn't deal with all call points for
RecordIsCleanupRecord(), so doesn't actually work.

ISTM easier to do things within the rmgr at the time WAL records are
written, rather than in the rmgr while handling redo.

We currently have 2 bytes spare on the WAL record, so I propose to add
an uint16 xl_info2 field (again). This can then be marked with 2 bits:
* 1 bit to show that it is a cleanup record and may conflict
* 1 bit to show that backup blocks must be applied with cleanup lock
Just to say again that adding these is free: we use no more space
because of alignment.

This avoids another rmgr call and is much more straightforward since we
define how to redo the record at the time it is written, rather than via
a separate mechanism that could mismatch. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-13 Thread Simon Riggs


On Sat, 2009-01-10 at 09:40 +, Simon Riggs wrote:

> But looking at that, I
> think the 6a patch had a bug there: a subtransaction abort record
> would release locks for the whole top level xact. 

Fixed.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2009-01-13 Thread Simon Riggs


On Tue, 2008-12-30 at 18:31 +0200, Heikki Linnakangas wrote:

> You have to be careful to ignore the flags in read-only transactions 
> that started in hot standby mode, even if recovery has since ended and 
> we're in normal operation now.

My initial implementation in v6 worked, but had a corner case if a
transaction spanned the change from recovery into normal processing.

I'm now done on a more complete fix, will be in v8.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, RestoreBkpBlocks and cleanup locks

2009-01-12 Thread Simon Riggs

Please commit soon

On Fri, 2009-01-09 at 18:30 +0200, Heikki Linnakangas wrote:
> The hot standby patch has some hacks to decide which full-page-images 
> can be restored holding an exclusive lock and which ones need a 
> vacuum-strength lock. It's not very pretty as is, as mentioned in 
> comments too.
> 
> How about we refactor things so that redo-functions are responsible for 
> calling RestoreBkpBlocks? The redo function can then pass an argument 
> indicating what kind of lock is required. We should also change 
> XLogReadBufferExtended so that it doesn't lock the page; the caller 
> knows better what kind of a lock it needs. That makes it more analogous 
> with ReadBufferExtended too, although I think we should keep 
> XLogReadBuffer() unchanged for now.
> 
> See attached patch. One shortfall of this patch is that you can pass 
> only one argument to RestoreBkpBlocks, but there can multiple backup 
> blocks in one WAL record. That's OK in current usage, though.
> 
-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-12 Thread Heikki Linnakangas


Simon Riggs wrote:

Rather than store the
parent xid itself we store the difference between the current xid and
the parent xid. Typically this will be less than 65535; when it is not
we set it to zero but issue an xid assignment xlog record.


That sounds pretty hacky.


However, I think XactLockTableWait() doesn't need to know the parent
either. (This feels more like wishful thinking, but here goes anyway).
We release locks *after* TransactionIdAbortTree() has fully executed, so
the test for TransactionIdIsInProgress(xid) will always see the abort
status, if set. Notice that if we are awake at all it is because the
top-level transaction is complete or our subxid is aborted. So there is
never any need to look at the status of the parent, nor in fact any need
to look at the procarray at all, which is always a waste of effort. 


Right, we don't currently write a WAL record at subtransaction commit, 
only at subtransaction abort or top-level commit. So the problem 
described in the comment at XactLockTableWait() can't arise in the standby.


Actually, I wonder if we should write a WAL record at subtransaction 
commit too, to save on shared memory in the standby as well.



If
you believe that, you'll want to commit the attached patch (or something
similar with comments refactored etc).


Umm, we still need the SubTransGetParent() call in normal operation.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-12 Thread Simon Riggs

On Mon, 2009-01-12 at 14:10 +0200, Heikki Linnakangas wrote:

> > However, I think XactLockTableWait() doesn't need to know the parent
> > either. (This feels more like wishful thinking, but here goes anyway).
> > We release locks *after* TransactionIdAbortTree() has fully executed, so
> > the test for TransactionIdIsInProgress(xid) will always see the abort
> > status, if set. Notice that if we are awake at all it is because the
> > top-level transaction is complete or our subxid is aborted. So there is
> > never any need to look at the status of the parent, nor in fact any need
> > to look at the procarray at all, which is always a waste of effort. 
> 
> Right, we don't currently write a WAL record at subtransaction commit, 
> only at subtransaction abort or top-level commit. So the problem 
> described in the comment at XactLockTableWait() can't arise in the standby.

Good. So we can exclude parent_xid and just include top level xid. So we
are now officially back inside the current xlog record padding: we don't
need to increase WAL record length and therefore we can lose slotid
(yay!).

> Actually, I wonder if we should write a WAL record at subtransaction 
> commit too, to save on shared memory in the standby as well.

You understand that the current design never performs
XactLockTableInsert() for individual top/sub transactions? There would
be no memory overhead to save.

The only locks allowed are AccessShareLocks which never conflict with
anything except for AccessExclusiveLocks. If an AEL conflicts with an
ASL we blow away the ASL holders. If an ASL requestor is blocked by a an
AEL the wait is performed on the single lock table entry for the Startup
process, which acts as a proxy for all emulated AEL lock holders.

> > If
> > you believe that, you'll want to commit the attached patch (or something
> > similar with comments refactored etc).
> 
> Umm, we still need the SubTransGetParent() call in normal operation.

OK, that was really just a thought experiment anyway to prove the point
one way or the other.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-12 Thread Simon Riggs

Heikki, can I get your feedback on this urgently please? I want to
respond positively to your review comments and complete something you
will find acceptable. But I need to know what that is, please.


On Sun, 2009-01-11 at 11:55 +, Simon Riggs wrote:
> On Sun, 2009-01-11 at 10:41 +0200, Heikki Linnakangas wrote:
> > Simon Riggs wrote:
> > > There's a confusion in the patch between top level xid and parent xid.
> > > The xl field is named parentxid but actually contains top level. That
> > > distinction is important because the code now uses the top level xid to
> > > locate the recovery proc, formerly the role of the slotid.
> > 
> > True, I changed the meaning halfway through. My thinking was that we can 
> > get away by only tracking the top-level xact of each subtransaction, not 
> > the whole tree of subtransactions.
> > 
> > > This leads to an error when we SubTransSetParent(child_xid, top_xid);
> > > since this assumes that the top_xid is the parent, which it is not.
> > > Mostly you wouldn't notice unless you were looking up the subtrans
> > > status for an xid that had committed but was the child of an aborted
> > > subtransaction, with the top level xid having > 64 subtransactions.
> > 
> > Hmm. When a subtransaction aborts, RecordTransactionAbort is called and 
> > clog is updated to show the subtransaction and all it's subcommitted 
> > children as aborted. I think we're safe, though it wouldn't hurt to 
> > double-check.
> 
> That part of your argument is correct but we are not safe. But please
> look at XactLockTableWait() and see what you think. According to
> comments this would lead to different lock release behaviour. 
> 
> Beauty, thy name is not subtransaction.
> 
> If you agree that we need the parent xid then we have further problems.
> Adding another xid onto the header of each WAL record will add 8 bytes
> onto each record because of alignment. This was the other reason for
> slotid, since I was able to fit that into just 2 bytes and avoid the 8
> byte loss. Moving swiftly on, I have this morning though of an
> alternative, so at least we now have some choice. Rather than store the
> parent xid itself we store the difference between the current xid and
> the parent xid. Typically this will be less than 65535; when it is not
> we set it to zero but issue an xid assignment xlog record.
> 
> However, I think XactLockTableWait() doesn't need to know the parent
> either. (This feels more like wishful thinking, but here goes anyway).
> We release locks *after* TransactionIdAbortTree() has fully executed, so
> the test for TransactionIdIsInProgress(xid) will always see the abort
> status, if set. Notice that if we are awake at all it is because the
> top-level transaction is complete or our subxid is aborted. So there is
> never any need to look at the status of the parent, nor in fact any need
> to look at the procarray at all, which is always a waste of effort. If
> you believe that, you'll want to commit the attached patch (or something
> similar with comments refactored etc).
> 
> > For xl_rel_lock you add a field called xid and then ask
> > > /* xid of the *parent* transaction. XXX why parent? */.
> > > You've done this because it replaced slotid. But looking at that, I
> > > think the 6a patch had a bug there: a subtransaction abort record would
> > > release locks for the whole top level xact. So we need to pass both top
> > > level xid (or slotid) and xid for each lock, then release using the
> > > actual xid only.
> > 
> > Hmm, would just the subtransaction xid be enough? If we use that to 
> > release, what do we need the top-level xid for?
> 
> So we can release all locks taken by subxids at once, when we learn that
> a top level xid has disappeared. A minor point.
> 
-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-11 Thread Simon Riggs

On Sun, 2009-01-11 at 10:41 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > There's a confusion in the patch between top level xid and parent xid.
> > The xl field is named parentxid but actually contains top level. That
> > distinction is important because the code now uses the top level xid to
> > locate the recovery proc, formerly the role of the slotid.
> 
> True, I changed the meaning halfway through. My thinking was that we can 
> get away by only tracking the top-level xact of each subtransaction, not 
> the whole tree of subtransactions.
> 
> > This leads to an error when we SubTransSetParent(child_xid, top_xid);
> > since this assumes that the top_xid is the parent, which it is not.
> > Mostly you wouldn't notice unless you were looking up the subtrans
> > status for an xid that had committed but was the child of an aborted
> > subtransaction, with the top level xid having > 64 subtransactions.
> 
> Hmm. When a subtransaction aborts, RecordTransactionAbort is called and 
> clog is updated to show the subtransaction and all it's subcommitted 
> children as aborted. I think we're safe, though it wouldn't hurt to 
> double-check.

That part of your argument is correct but we are not safe. But please
look at XactLockTableWait() and see what you think. According to
comments this would lead to different lock release behaviour. 

Beauty, thy name is not subtransaction.

If you agree that we need the parent xid then we have further problems.
Adding another xid onto the header of each WAL record will add 8 bytes
onto each record because of alignment. This was the other reason for
slotid, since I was able to fit that into just 2 bytes and avoid the 8
byte loss. Moving swiftly on, I have this morning though of an
alternative, so at least we now have some choice. Rather than store the
parent xid itself we store the difference between the current xid and
the parent xid. Typically this will be less than 65535; when it is not
we set it to zero but issue an xid assignment xlog record.

However, I think XactLockTableWait() doesn't need to know the parent
either. (This feels more like wishful thinking, but here goes anyway).
We release locks *after* TransactionIdAbortTree() has fully executed, so
the test for TransactionIdIsInProgress(xid) will always see the abort
status, if set. Notice that if we are awake at all it is because the
top-level transaction is complete or our subxid is aborted. So there is
never any need to look at the status of the parent, nor in fact any need
to look at the procarray at all, which is always a waste of effort. If
you believe that, you'll want to commit the attached patch (or something
similar with comments refactored etc).

> For xl_rel_lock you add a field called xid and then ask
> > /* xid of the *parent* transaction. XXX why parent? */.
> > You've done this because it replaced slotid. But looking at that, I
> > think the 6a patch had a bug there: a subtransaction abort record would
> > release locks for the whole top level xact. So we need to pass both top
> > level xid (or slotid) and xid for each lock, then release using the
> > actual xid only.
> 
> Hmm, would just the subtransaction xid be enough? If we use that to 
> release, what do we need the top-level xid for?

So we can release all locks taken by subxids at once, when we learn that
a top level xid has disappeared. A minor point.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Index: src/backend/access/transam/xact.c
===
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.271
diff -c -r1.271 xact.c
*** src/backend/access/transam/xact.c	1 Jan 2009 17:23:36 -	1.271
--- src/backend/access/transam/xact.c	11 Jan 2009 11:32:24 -
***
*** 416,422 
  	s->transactionId = GetNewTransactionId(isSubXact);

  	if (isSubXact)
! 		SubTransSetParent(s->transactionId, s->parent->transactionId);

  	/*
  	 * Acquire lock on the transaction XID.  (We assume this cannot block.) We
--- 416,422 
  	s->transactionId = GetNewTransactionId(isSubXact);

  	if (isSubXact)
! 		SubTransSetParent(s->transactionId, GetTopTransactionIdIfAny());

  	/*
  	 * Acquire lock on the transaction XID.  (We assume this cannot block.) We
Index: src/backend/storage/lmgr/lmgr.c
===
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/storage/lmgr/lmgr.c,v
retrieving revision 1.99
diff -c -r1.99 lmgr.c
*** src/backend/storage/lmgr/lmgr.c	1 Jan 2009 17:23:47 -	1.99
--- src/backend/storage/lmgr/lmgr.c	11 Jan 2009 11:50:51 -
***
*** 477,485 

  		LockRelease(&tag, ShareLock, false);

! 		if (!TransactionIdIsInProgress(xid))
  			break;
- 		xid = SubTransGetParent(xid);
  	}
  }

--- 477,484 

  		LockRelease(&tag, ShareLock, false);

! 		if (TransactionIdDidAbort(xid)

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-11 Thread Heikki Linnakangas


Simon Riggs wrote:

There's a confusion in the patch between top level xid and parent xid.
The xl field is named parentxid but actually contains top level. That
distinction is important because the code now uses the top level xid to
locate the recovery proc, formerly the role of the slotid.


True, I changed the meaning halfway through. My thinking was that we can 
get away by only tracking the top-level xact of each subtransaction, not 
the whole tree of subtransactions.



This leads to an error when we SubTransSetParent(child_xid, top_xid);
since this assumes that the top_xid is the parent, which it is not.
Mostly you wouldn't notice unless you were looking up the subtrans
status for an xid that had committed but was the child of an aborted
subtransaction, with the top level xid having > 64 subtransactions.


Hmm. When a subtransaction aborts, RecordTransactionAbort is called and 
clog is updated to show the subtransaction and all it's subcommitted 
children as aborted. I think we're safe, though it wouldn't hurt to 
double-check.


It's an important point that needs documenting, though. I completely 
neglected that.



I'm wasn't looking for ways to reintroduce slotid, but it seems more
logical to keep slotid in light of the above. However, you will probably
view this as intransigence, so I will await your input.


Yeah, it sure does seem like intransigence :-)


For xl_rel_lock you add a field called xid and then ask
/* xid of the *parent* transaction. XXX why parent? */.
You've done this because it replaced slotid. But looking at that, I
think the 6a patch had a bug there: a subtransaction abort record would
release locks for the whole top level xact. So we need to pass both top
level xid (or slotid) and xid for each lock, then release using the
actual xid only.


Hmm, would just the subtransaction xid be enough? If we use that to 
release, what do we need the top-level xid for?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-10 Thread Simon Riggs


On Sat, 2009-01-10 at 11:45 +, Simon Riggs wrote:
> On Sat, 2009-01-10 at 09:40 +, Simon Riggs wrote:
> 
> > This leads to an error when we SubTransSetParent(child_xid, top_xid);
> > since this assumes that the top_xid is the parent, which it is not.
> > Mostly you wouldn't notice unless you were looking up the subtrans
> > status for an xid that had committed but was the child of an aborted
> > subtransaction, with the top level xid having > 64 subtransactions.
> > It's possible the confusion leads to other bugs in UnobservedXid
> > processing, but I didn't look too hard at that.
> > 
> > AFAICS we need both parent and top xids.
> 
> I wonder if its possible to derive the parent by looking at the
> highest/most newly assigned xid? Abort records would remove aborted
> subtransactions and AFAIK we currently assign a new subtransaction only
> ever from the latest current subtransaction. (This wouldn't be
> necessarily true if supported true branch-anywhere subtransactions, but
> we don't). Sounds correct, but not really sure.

Starting to sound like a do-me-later-if-ever optimisation and certainly
nothing I want to rely on in court.

I'm progressing with parent_xid added to the xlog record header, for
now.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-10 Thread Simon Riggs

On Sat, 2009-01-10 at 09:40 +, Simon Riggs wrote:

> This leads to an error when we SubTransSetParent(child_xid, top_xid);
> since this assumes that the top_xid is the parent, which it is not.
> Mostly you wouldn't notice unless you were looking up the subtrans
> status for an xid that had committed but was the child of an aborted
> subtransaction, with the top level xid having > 64 subtransactions.
> It's possible the confusion leads to other bugs in UnobservedXid
> processing, but I didn't look too hard at that.
> 
> AFAICS we need both parent and top xids.

I wonder if its possible to derive the parent by looking at the
highest/most newly assigned xid? Abort records would remove aborted
subtransactions and AFAIK we currently assign a new subtransaction only
ever from the latest current subtransaction. (This wouldn't be
necessarily true if supported true branch-anywhere subtransactions, but
we don't). Sounds correct, but not really sure.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-10 Thread Simon Riggs

On Thu, 2009-01-08 at 20:30 +0200, Heikki Linnakangas wrote:

> Since I've been whining about that for some time, I figured I have to 
> put my money where my mouth is, so here's a patch based on v6a that 
> eliminates the concept of slotids, as well as the new xl_info2 field in 
> XLogRecord. This seems much simpler to me. I haven't given it much 
> testing, but seems to work. There's a whole bunch of comments marked 
> with XXX that need resolving, though.

There's a confusion in the patch between top level xid and parent xid.
The xl field is named parentxid but actually contains top level. That
distinction is important because the code now uses the top level xid to
locate the recovery proc, formerly the role of the slotid.

This leads to an error when we SubTransSetParent(child_xid, top_xid);
since this assumes that the top_xid is the parent, which it is not.
Mostly you wouldn't notice unless you were looking up the subtrans
status for an xid that had committed but was the child of an aborted
subtransaction, with the top level xid having > 64 subtransactions. It's
possible the confusion leads to other bugs in UnobservedXid processing,
but I didn't look too hard at that.

AFAICS we need both parent and top xids. Or put another way, we need the
parent xid and other info that allows us to uniquely determine the proc
we need to update. Now the "other info..." could be top xid or it could
also be slotid, which then avoids later zig-zagging to look up the proc.

I'm wasn't looking for ways to reintroduce slotid, but it seems more
logical to keep slotid in light of the above. However, you will probably
view this as intransigence, so I will await your input.

I'm very happy that GetStandbyInfoForTransaction() and all the XLR2
flags have bitten the dust and will sleep for eternity.

For xl_rel_lock you add a field called xid and then ask
/* xid of the *parent* transaction. XXX why parent? */.
You've done this because it replaced slotid. But looking at that, I
think the 6a patch had a bug there: a subtransaction abort record would
release locks for the whole top level xact. So we need to pass both top
level xid (or slotid) and xid for each lock, then release using the
actual xid only.

You also ask: Shouldn't we call StartupSUBTRANS() and the other startup
functions like we do below, before letting anyone in?

My answer is that the current role of StartupSUBTRANS and friends is not
appropriate at that point, since they zero out those structures. I left
those routines in place thinking "startup" meant "moving to normal
running". If we did have a "startupsubtrans" at the point you note, it
would currently be empty: we don't keep track of the latest page during
recovery. Perhaps we should, but then we'd need to do the equivalent of
ExtendSubtrans etc, which it seemed easier to avoid.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-09 Thread Simon Riggs

On Fri, 2009-01-09 at 11:20 +, Simon Riggs wrote:
> On Fri, 2009-01-09 at 12:33 +0200, Heikki Linnakangas wrote:
> > Simon Riggs wrote:
> > > On Thu, 2009-01-08 at 15:50 -0500, Tom Lane wrote:
> > >> Simon Riggs  writes:
> > >>> On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
> >  When a backend dies with FATAL, it writes an abort record before 
> >  exiting.
> > 
> >  (I was under the impression it doesn't until few minutes ago myself, 
> >  when I actually read the shutdown code :-))
> > >>> Not in all cases; keep reading :-)
> > >> If it doesn't, that's a bug.  A FATAL exit is not supposed to leave the
> > >> shared state corrupted, it's only supposed to be a forcible session
> > >> termination.  Any open transaction should be rolled back.
> > > 
> > > Please look back at the earlier discussion we had on this exact point:
> > > http://archives.postgresql.org/pgsql-hackers/2008-09/msg01809.php
> > 
> > I think the running-xacts list we dump to WAL at every checkpoint is 
> > enough to handle that. Just treat the dead transaction as in-progress 
> > until the next running-xacts record. It's presumably extremely rare to 
> > have a process die with FATAL, and not write an abort record.
> 
> I agree, but I'll wait for Tom to speak further.

OK, will proceed without this. Time is pressing.

Heikki and I both agree that FATAL errors that fail to write abort
records are rare and an acceptable problem in real usage. If they do
occur, their nuisance factor is short-lived because of measures taken
within the patch.

Hot Standby does not *rely* upon there always an abort record for FATAL
errors, so we cannot reasonably say the current design would be
"unacceptably fragile" as I had once thought.

So based upon that, out comes the slotid concept and luckily much of the
cruftier aspects of the patch. Less code, probably fewer bugs. Good
thing.

I will produce a new v7 with those changes and merge the changes for v6b
also, so we can begin review again from there. 

Hi ho, hi ho...

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, RestoreBkpBlocks and cleanup locks

2009-01-09 Thread Heikki Linnakangas


Simon Riggs wrote:

On Fri, 2009-01-09 at 18:30 +0200, Heikki Linnakangas wrote:

How about we refactor things?


Was that a patch against HEAD or a patch on patch?


Against HEAD.


It would be useful to nibble away at the patch, committing all the
necessary refactorings to make the patch work. That will reduce size of
eventual commit.


Agreed. This in particular is a change I feel pretty confident to commit 
beforehand.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, RestoreBkpBlocks and cleanup locks

2009-01-09 Thread Simon Riggs


On Fri, 2009-01-09 at 18:30 +0200, Heikki Linnakangas wrote:
> How about we refactor things?

Was that a patch against HEAD or a patch on patch?

It would be useful to nibble away at the patch, committing all the
necessary refactorings to make the patch work. That will reduce size of
eventual commit.

Do you want me to start using the GIT repo to make it easier to extract
parts? You'd need to show me the setup you use first.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, RestoreBkpBlocks and cleanup locks

2009-01-09 Thread Simon Riggs

On Fri, 2009-01-09 at 18:30 +0200, Heikki Linnakangas wrote:

> The hot standby patch has some hacks to decide which full-page-images 
> can be restored holding an exclusive lock and which ones need a 
> vacuum-strength lock. It's not very pretty as is, as mentioned in 
> comments too.

Agreed!

> How about we refactor things so that redo-functions are responsible for 
> calling RestoreBkpBlocks? The redo function can then pass an argument 
> indicating what kind of lock is required. We should also change 
> XLogReadBufferExtended so that it doesn't lock the page; the caller 
> knows better what kind of a lock it needs. That makes it more analogous 
> with ReadBufferExtended too, although I think we should keep 
> XLogReadBuffer() unchanged for now.

Much better idea, thanks. I felt a new rmgr function was overkill but
couldn't think of how to do this.

> See attached patch. One shortfall of this patch is that you can pass 
> only one argument to RestoreBkpBlocks, but there can multiple backup 
> blocks in one WAL record. That's OK in current usage, though.

If we're doing this because of cleanup locks, then I'd say we don't
currently need a cleanup lock with any WAL record that uses multiple
backup blocks. So we can just document that so anybody adding such a
record in the future will be careful.

So all seems good.

Would you want to push ResolveRedoVisibilityConflicts() down into the
rmgrs as well and make reachedSafeStartPoint a global? That is only
called for cleanup records.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-09 Thread Simon Riggs


On Fri, 2009-01-09 at 14:38 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-01-09 at 13:23 +0200, Heikki Linnakangas wrote:
> >> I mean the standby should stop trying to track the in progress 
> >> transactions in recovery procs, and apply the WAL records like it does 
> >> before the consistent state is reached.
> > 
> > ...
> > 
> > So, if we don't PANIC, how should we behave?
> > 
> > Without full information on running-xacts we would be unable to take a
> > snapshot, so should:
> > * backends be forcibly disconnected?
> > * backends hang waiting for snapshot info to be re-available again in X
> > minutes worth of WAL time?
> > * backends throw an ERROR:  unable to provide snapshot at this time,
> > DETAIL: retry your statement later. 
> > ...other alternatives
> > 
> > and possibly prevent new connections.
> 
> All of those seem reasonable to me. The 2nd option seems nicest, "X 
> minutes" should probably be controlled by max_standby_delay, after which 
> you can throw an error.

Hmm, we use the recovery procs to track transactions that have
TransactionIds assigned. That means we will overflow only if we have
approach 100% write transactions at any time, or if we have more write
transactions in progress than we have max_connections on standby.

So it sounds like the overflow situation would probably be both rare
and, if it did occur, may not occur for long periods.

> If we care enough, we could also keep tracking the transactions in 
> backend-private memory of the startup process, until there's enough room 
> in proc array. That would make the outage shorter, because you wouldn't 
> have to wait until the next running-xacts record, but only until enough 
> transactions have finished that they all fit in proc array again.
> 
> But whatever is the simplest, really.

The above does sound best since it would allow us to have the snapshot
hang for a short period. But at this stage of the game, more complex.

For now though, since it looks like it would happen fairly rarely, I'd
opt for the simplest: throw an ERROR.

> > If max_connections is higher on primary then the standby will *never* be
> > available for querying. Should we have multiple ERRORs depending upon
> > whether the situation is hopefully-temporary or looks-permanent?
> > 
> > Don't assume I want the PANIC. That clearly needs to be revisited if we
> > change slotids. 
> 
> It needs to be revisited whether we change slotids or not, IMHO.
> 
> Note that with slotids, you have a problem as soon as any of the slots 
> that don't exist on standby are used, regardless of how many concurrent 
> transactions there actually is. Without slots you only have a problem if 
>   you really have more than standby's max_connections concurrent 
> transactions. That makes a big difference in practice.

Sometimes, but mostly people set max_connections higher because they
intend to use those extra connections. So no real advantage there
against the slotid approach :-)

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-09 Thread Heikki Linnakangas


Simon Riggs wrote:

On Fri, 2009-01-09 at 13:23 +0200, Heikki Linnakangas wrote:
I mean the standby should stop trying to track the in progress 
transactions in recovery procs, and apply the WAL records like it does 
before the consistent state is reached.


...

So, if we don't PANIC, how should we behave?

Without full information on running-xacts we would be unable to take a
snapshot, so should:
* backends be forcibly disconnected?
* backends hang waiting for snapshot info to be re-available again in X
minutes worth of WAL time?
* backends throw an ERROR:  unable to provide snapshot at this time,
DETAIL: retry your statement later. 
...other alternatives


and possibly prevent new connections.


All of those seem reasonable to me. The 2nd option seems nicest, "X 
minutes" should probably be controlled by max_standby_delay, after which 
you can throw an error.


If we care enough, we could also keep tracking the transactions in 
backend-private memory of the startup process, until there's enough room 
in proc array. That would make the outage shorter, because you wouldn't 
have to wait until the next running-xacts record, but only until enough 
transactions have finished that they all fit in proc array again.


But whatever is the simplest, really.


If max_connections is higher on primary then the standby will *never* be
available for querying. Should we have multiple ERRORs depending upon
whether the situation is hopefully-temporary or looks-permanent?

Don't assume I want the PANIC. That clearly needs to be revisited if we
change slotids. 


It needs to be revisited whether we change slotids or not, IMHO.

Note that with slotids, you have a problem as soon as any of the slots 
that don't exist on standby are used, regardless of how many concurrent 
transactions there actually is. Without slots you only have a problem if 
 you really have more than standby's max_connections concurrent 
transactions. That makes a big difference in practice.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-09 Thread Simon Riggs

On Fri, 2009-01-09 at 13:23 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2009-01-09 at 12:33 +0200, Heikki Linnakangas wrote:
> >> A related issue is that currently the recovery PANICs if it runs out of 
> >> recovery procs. I think that's not acceptable, regardless of whether we 
> >> use slotids or some other method to avoid it in normal operation, 
> >> because it means you can't recover at all if you set max_connections too 
> >> low in the standby (or in the primary, and you have to recover from 
> >> crash), or you run out of recovery procs because of an abort failed in 
> >> the primary like discussed on that thread. 
> > 
> >> The standby should just 
> >> fast-forward to the next running-xacts record in that case.
> > 
> > What do you mean by "fast forward"?
> 
> I mean the standby should stop trying to track the in progress 
> transactions in recovery procs, and apply the WAL records like it does 
> before the consistent state is reached.

If you say something is not acceptable you need to say what behaviour
you do find acceptable in its place. It's good to come up with new
ideas, but it's not good to ignore the problems the new ideas have. This
is a general point that applies both here and to your proposals with
synch rep. It's much harder to say how it should work in a way that
covers all the requirements and points others have made, otherwise its
just handwaving.

So, if we don't PANIC, how should we behave?

Without full information on running-xacts we would be unable to take a
snapshot, so should:
* backends be forcibly disconnected?
* backends hang waiting for snapshot info to be re-available again in X
minutes worth of WAL time?
* backends throw an ERROR:  unable to provide snapshot at this time,
DETAIL: retry your statement later. 
...other alternatives

and possibly prevent new connections.

If max_connections is higher on primary then the standby will *never* be
available for querying. Should we have multiple ERRORs depending upon
whether the situation is hopefully-temporary or looks-permanent?

Don't assume I want the PANIC. That clearly needs to be revisited if we
change slotids. I just want to make a balanced judgement based upon a
full consideration of the options.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-09 Thread Heikki Linnakangas


Simon Riggs wrote:

On Fri, 2009-01-09 at 12:33 +0200, Heikki Linnakangas wrote:
A related issue is that currently the recovery PANICs if it runs out of 
recovery procs. I think that's not acceptable, regardless of whether we 
use slotids or some other method to avoid it in normal operation, 
because it means you can't recover at all if you set max_connections too 
low in the standby (or in the primary, and you have to recover from 
crash), or you run out of recovery procs because of an abort failed in 
the primary like discussed on that thread. 


The standby should just 
fast-forward to the next running-xacts record in that case.


What do you mean by "fast forward"?


I mean the standby should stop trying to track the in progress 
transactions in recovery procs, and apply the WAL records like it does 
before the consistent state is reached.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-09 Thread Simon Riggs


On Fri, 2009-01-09 at 12:33 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Thu, 2009-01-08 at 15:50 -0500, Tom Lane wrote:
> >> Simon Riggs  writes:
> >>> On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
>  When a backend dies with FATAL, it writes an abort record before exiting.
> 
>  (I was under the impression it doesn't until few minutes ago myself, 
>  when I actually read the shutdown code :-))
> >>> Not in all cases; keep reading :-)
> >> If it doesn't, that's a bug.  A FATAL exit is not supposed to leave the
> >> shared state corrupted, it's only supposed to be a forcible session
> >> termination.  Any open transaction should be rolled back.
> > 
> > Please look back at the earlier discussion we had on this exact point:
> > http://archives.postgresql.org/pgsql-hackers/2008-09/msg01809.php
> 
> I think the running-xacts list we dump to WAL at every checkpoint is 
> enough to handle that. Just treat the dead transaction as in-progress 
> until the next running-xacts record. It's presumably extremely rare to 
> have a process die with FATAL, and not write an abort record.

I agree, but I'll wait for Tom to speak further.

> A related issue is that currently the recovery PANICs if it runs out of 
> recovery procs. I think that's not acceptable, regardless of whether we 
> use slotids or some other method to avoid it in normal operation, 
> because it means you can't recover at all if you set max_connections too 
> low in the standby (or in the primary, and you have to recover from 
> crash), or you run out of recovery procs because of an abort failed in 
> the primary like discussed on that thread. 

> The standby should just 
> fast-forward to the next running-xacts record in that case.

What do you mean by "fast forward"?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-09 Thread Heikki Linnakangas


Simon Riggs wrote:

On Thu, 2009-01-08 at 15:50 -0500, Tom Lane wrote:

Simon Riggs  writes:

On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:

When a backend dies with FATAL, it writes an abort record before exiting.

(I was under the impression it doesn't until few minutes ago myself, 
when I actually read the shutdown code :-))

Not in all cases; keep reading :-)

If it doesn't, that's a bug.  A FATAL exit is not supposed to leave the
shared state corrupted, it's only supposed to be a forcible session
termination.  Any open transaction should be rolled back.


Please look back at the earlier discussion we had on this exact point:
http://archives.postgresql.org/pgsql-hackers/2008-09/msg01809.php


I think the running-xacts list we dump to WAL at every checkpoint is 
enough to handle that. Just treat the dead transaction as in-progress 
until the next running-xacts record. It's presumably extremely rare to 
have a process die with FATAL, and not write an abort record.


A related issue is that currently the recovery PANICs if it runs out of 
recovery procs. I think that's not acceptable, regardless of whether we 
use slotids or some other method to avoid it in normal operation, 
because it means you can't recover at all if you set max_connections too 
low in the standby (or in the primary, and you have to recover from 
crash), or you run out of recovery procs because of an abort failed in 
the primary like discussed on that thread. The standby should just 
fast-forward to the next running-xacts record in that case.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-09 Thread Simon Riggs


On Thu, 2009-01-08 at 15:50 -0500, Tom Lane wrote:
> Simon Riggs  writes:
> > On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
> >> When a backend dies with FATAL, it writes an abort record before exiting.
> >> 
> >> (I was under the impression it doesn't until few minutes ago myself, 
> >> when I actually read the shutdown code :-))
> 
> > Not in all cases; keep reading :-)
> 
> If it doesn't, that's a bug.  A FATAL exit is not supposed to leave the
> shared state corrupted, it's only supposed to be a forcible session
> termination.  Any open transaction should be rolled back.

Please look back at the earlier discussion we had on this exact point:
http://archives.postgresql.org/pgsql-hackers/2008-09/msg01809.php

Heikki, perhaps now you understand my continued opposition to your ideas
for simplification: I had already thought of them and was forced to rule
them out, not through my own choice. Tom, note that I listen to what you
say and try to write code that meets those requirements.

>From here, I don't mind which way we go. I want code that is as simple
as possible to "do the job", whatever we agree that to be.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-08 Thread Tom Lane

Simon Riggs  writes:
> On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
>> When a backend dies with FATAL, it writes an abort record before exiting.
>> 
>> (I was under the impression it doesn't until few minutes ago myself, 
>> when I actually read the shutdown code :-))

> Not in all cases; keep reading :-)

If it doesn't, that's a bug.  A FATAL exit is not supposed to leave the
shared state corrupted, it's only supposed to be a forcible session
termination.  Any open transaction should be rolled back.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-08 Thread Heikki Linnakangas


Simon Riggs wrote:

On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:

When a backend dies with FATAL, it writes an abort record before exiting.

(I was under the impression it doesn't until few minutes ago myself, 
when I actually read the shutdown code :-))


Not in all cases; keep reading :-)


Want to give a clue? ShutdownPostgres is registered as an on_shmem_exit 
hook in InitPostgres, and ShutdownPostgres calls AbortOutOfAnyTransaction.


PANIC is another story, but in that case the primary will go down, and 
will write a new checkpoint soon after it starts up again.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-08 Thread Simon Riggs


On Thu, 2009-01-08 at 22:31 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > * if FATAL errors occur, yet we have long running transactions then we
> > have no way of removing those entries from the recovery procs. Since we
> > have a fixed pool of recovery transactions we won't have anywhere to
> > store that data. Snapshot sizes are fixed maximum with max_connections,
> > so eventually we would not be able to take a snapshot at all, and we'd
> > need to have a "ERROR:  unable to take valid snapshot". 
> 
> When a backend dies with FATAL, it writes an abort record before exiting.
> 
> (I was under the impression it doesn't until few minutes ago myself, 
> when I actually read the shutdown code :-))

Not in all cases; keep reading :-)

The good thing is we have a choice now as to whether we care about that,
or not.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-08 Thread Heikki Linnakangas


Simon Riggs wrote:

If you want to do things a different way you need to say what you want
to do and what effects those changes will have. 


I want to reduce the coupling between the primary and the master. The 
less they need to communicate, the better. I want to get rid of slotid, 
and as many of the other extra information carried in WAL records that I 
can. I believe that will make the patch both simpler and more robust.



Are there tradeoffs? If so what are they?


I don't think there's any big difference in user-visible behavior. 
RecordKnownAssignedTransactionId now needs to be called for every xlog 
record as opposed to just the first record where an xid appears, because 
I eliminated the hint flag in WAL to mark those records. And it needs to 
look up the recover proc by xid, instead of using the slot id. But I 
don't think that will have a significant impact on performance.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-08 Thread Heikki Linnakangas


Simon Riggs wrote:

* if FATAL errors occur, yet we have long running transactions then we
have no way of removing those entries from the recovery procs. Since we
have a fixed pool of recovery transactions we won't have anywhere to
store that data. Snapshot sizes are fixed maximum with max_connections,
so eventually we would not be able to take a snapshot at all, and we'd
need to have a "ERROR:  unable to take valid snapshot". 


When a backend dies with FATAL, it writes an abort record before exiting.

(I was under the impression it doesn't until few minutes ago myself, 
when I actually read the shutdown code :-))



* if FATAL errors occur while holding AccessExclusiveLock then we have
no way of releasing those locks until the recovery proc is stale, which
might be some time. Not sure if your patch releases those?


I don't think so, that needs to be fixed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-08 Thread Simon Riggs

On Thu, 2009-01-08 at 20:30 +0200, Heikki Linnakangas wrote:

> I've mentioned before that I don't like the slotid stuff. From an 
> architectural point of view, we should try to keep the extra 
> communication between primary and standby to a minimum, for the sake of 
> robustness. The primary shouldn't have such a big role in keeping track 
> of which xids it has already emitted in WAL records, which subxids have 
> been marked, and so on.

Removing slotid does improve the code, I agree. It was never there for
reasons other than the functions and features it provides.

Effects of removing slotid are at least these

* if FATAL errors occur, yet we have long running transactions then we
have no way of removing those entries from the recovery procs. Since we
have a fixed pool of recovery transactions we won't have anywhere to
store that data. Snapshot sizes are fixed maximum with max_connections,
so eventually we would not be able to take a snapshot at all, and we'd
need to have a "ERROR:  unable to take valid snapshot". 

* if FATAL errors occur while holding AccessExclusiveLock then we have
no way of releasing those locks until the recovery proc is stale, which
might be some time. Not sure if your patch releases those?

* xid->proc lookup is now very slow and needs to be called more
frequently; will still be slower even with the further optimisations you
mention. Possibly a minor point with right tuning.

* slotid removed from xlrec header; makes no difference with 64-bit
systems because of alignment issues.

...perhaps more, not sure yet.

All we need to do is decide if we want these things or not. If not, then
we can remove the slotid stuff.

If it wasn't clear before, this was, for me, never a discussion about a
particular way of coding things. I care little for that and would
probably go with your suggestions more often than not. It's just simply
about what we want it to do. If you want to argue in favour of
restrictions to make things simpler, that's OK and I could probably live
with them quite happily myself. I'm trying to implement the project
philosophy of It Just Works, no restrictions or caveats - and I know if
I had not then the patch would have been rejected on that basis, as has
happened many times before.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, slot ids and stuff

2009-01-08 Thread Simon Riggs


On Thu, 2009-01-08 at 20:30 +0200, Heikki Linnakangas wrote:

> I've mentioned before that I don't like the slotid stuff. From an 
> architectural point of view, we should try to keep the extra 
> communication between primary and standby to a minimum, for the sake of 
> robustness. The primary shouldn't have such a big role in keeping track 
> of which xids it has already emitted in WAL records, which subxids have 
> been marked, and so on.
> 
> Since I've been whining about that for some time, I figured I have to 
> put my money where my mouth is

Yes, you do, but I think not like this. Turning up with a patch and
saying "my way is better" but not actually saying what that way *is*
just confuses things.

If you want to do things a different way you need to say what you want
to do and what effects those changes will have. Are there tradeoffs? If
so what are they? ISTM easier to discuss them on list first than to
write a load of code and ask us all to read, understand and comment.

> , so here's a patch based on v6a that 
> eliminates the concept of slotids, as well as the new xl_info2 field in 
> XLogRecord. This seems much simpler to me. I haven't given it much 
> testing, but seems to work. There's a whole bunch of comments marked 
> with XXX that need resolving, though.
> 
> Attached is a patch agains CVS HEAD (hs-noslotid-1.patch) as well as an 
> incremental patch against your v6a (hs-noslotid-against-v6a.patch). 
> Sorry if you were just working on some big changes and I joggled your 
> elbow. I also didn't check what changes you had in v7 yet.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-31 Thread Simon Riggs

On Tue, 2008-12-30 at 18:31 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > (a) always ignore LP_DEAD flags we see when reading index during
> > recovery.
> 
> This sounds simplest, and it's nice to not clear the flags for the 
> benefit of transactions running after the recovery is done.

Agreed.

(Also: Transaction hint bits are always set correctly, because we would
only ever see a full page write with hints set after the commit/abort
record was processed. So I continue to honour transaction hint bit
reading and setting during recovery).

> You have to be careful to ignore the flags in read-only transactions 
> that started in hot standby mode, even if recovery has since ended and 
> we're in normal operation now.

Got that.

I'm setting ignore_killed_tuples = false at the start of any index scan
during recovery. And kill_prior_tuples is never set true when in
recovery. Both measures are AM-agnostic.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-30 Thread Heikki Linnakangas


Simon Riggs wrote:

(a) always ignore LP_DEAD flags we see when reading index during
recovery.


This sounds simplest, and it's nice to not clear the flags for the 
benefit of transactions running after the recovery is done.


You have to be careful to ignore the flags in read-only transactions 
that started in hot standby mode, even if recovery has since ended and 
we're in normal operation now.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-30 Thread Heikki Linnakangas


Simon Riggs wrote:

Issues (2) and (3) would go away entirely if both standby and primary
always had the same xmin value as a system-wide setting. i.e. the
standby and primary are locked together at their xmins. Perhaps that was
Heikki's intention in recent suggestions? 


No, I only suggested that as an optional optimization. We can't rely on 
it, because the queries on standby should still work correctly if the 
connection to primary is lost for some reason, or if the primary decides 
not to honor standby's xmin, perhaps to avoid the usual issues with 
long-running-transactions.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-30 Thread Simon Riggs

On Fri, 2008-12-19 at 09:22 -0500, Greg Stark wrote:

> I'm confused shouldn't read-only transactions on the slave just be  
> hacked to not set any hint bits including lp_delete?

It seems there are multiple issues involved and I saw only the first of
these initially. I want to explicitly separate these issues so we can
discuss them more easily.

1. When we replay an XLOG_BTREE_DELETE record, we may have to
wait-then-cancel-etc other sessions.

Possibly a pain, but these records are not very common now that we have
HOT, except on certain kinds of queue table.

2. Should we ignore the LP_DEAD flag on btree rows when we are using the
index during recovery? As Heikki points out, this hint bit is not WAL
logged, but can appear in the standby as a result of full page writes.
The LP_DEAD flags will have been set using a different xmin to the one
on the standby and would cause index rows to be ignored that should have
been included in a correct MVCC answer.

So we need to either

(a) always ignore LP_DEAD flags we see when reading index during
recovery.

(b) include an additional step to clean the full page writes to remove
LP_DEAD hints from the incoming pages.

(b) is feasible, but would need to be repeated each time a new full page
arrived, so a page may need to be re-cleaned many times. Sounds like a
bad plan, so we should choose (a).

3. Should we set LP_DELETE flag on btree rows when we are using the
index during recovery? Not much point if we are ignoring them.

There is no space for an additional flag, to distinguish between primary
and standby hint bits.

Issues (2) and (3) would go away entirely if both standby and primary
always had the same xmin value as a system-wide setting. i.e. the
standby and primary are locked together at their xmins. Perhaps that was
Heikki's intention in recent suggestions? 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-29 Thread Heikki Linnakangas


marcin mank wrote:

Perhaps we should listen to the people that have said they don't want
queries cancelled, even if the alternative is inconsistent answers.


I don't like that much. PostgreSQL has traditionally avoided that very 
hard. It's hard to tell what kind of inconsistencies you'd get, as it'd 
depend on what plan is created, when a vacuum happens to run on master etc.



I think an alternative to that would be "if the wal backlog is too
big, let current queries finish and let incoming queries wait till the
backlog gets smaller".


Yeah, that makes sense too.

Many approaches have been proposed, and they all have different 
tradeoffs and therefore fit different use cases. I'm not sure which ones 
are/will be included in the patch. We don't need all in 8.4, one or two 
simplest ones will do just fine, and we can extend later.


Let me summarize. Whenever a WAL record conflicts with a 
query-in-progress, we can:


1. kill the query, or
2. wait for the query to finish
3. let the query proceed, producing invalid results.

There's some combinations of those as well. You're proposal is a 
variation of 2, to avoid the problem of WAL application falling behind 
indefinitely. There's also the max_standby_delay option in the patch, to 
wait a while, and then kill the query.


There's some additional optimizations that can be made to make those 
options less painful. Instead of killing all queries that might be 
affected by a vacuum record, only kill them when they actually hit a 
block that was vacuumed (Simon's idea of latestRemovedLSN field in page 
header).


Another line of attack is to avoid getting into the situation in the 
first place, by affecting behavior on the master. If the standby has an 
online connection to the master (per the synch rep patch), it can tell 
master what the slave's OldestXmin is, and master can take that into 
account and not remove tuples still needed by the slave. That's not good 
from high availability point of view, you don't want a hung query in the 
slave to cause a long-running-transaction situation in the master, but 
for other use cases it would be fine. Or we can just add a constant # of 
transactions to OldestXmin in master, to get some breathing room in the 
server.


The bottom line is that we have enough options to make everyone happy. 
Some understanding of the issue is required to tune it properly, 
however, so documentation is important.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-25 Thread marcin mank

> Perhaps we should listen to the people that have said they don't want
> queries cancelled, even if the alternative is inconsistent answers.

I think an alternative to that would be "if the wal backlog is too
big, let current queries finish and let incoming queries wait till the
backlog gets smaller".

fell free to ignore me, as a non-hacker I`m not even supposed to be
reading this list :-]

Greetings
Marcin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Pavan Deolasee

On Wed, Dec 24, 2008 at 7:18 PM, Simon Riggs  wrote:
>
>
>
> With respect, I was hoping you might look in the patch and see if you
> agree with the way it is handled. No need to remember. The whole
> latestRemovedXid concept is designed to do help.
>

Well, that's common for all cleanup record including vacuum. But
reading your comment, it seemed as there is something special to
handle HOT prune case which I did not see. Anyways, the trouble with
HOT prune is that uples may be cleaned up very frequently and that can
lead to query cancellation at the standby. That's what I wanted to
emphasize.

>
> Queries get cancelled if data they need to see if removed and the
> max_standby_delay expires. So lag will be max_standby_delay, by
> definition.

That's per cleanup record, isn't it ?

> We've also discussed storing lastCleanedLSN for each buffer, so queries
> can cancel themselves if they need to read a buffer that has had data
> removed from it that they would have needed to see. I'll write that up
> also.
>

What if we do that at table level ? So if a query touches a table
which had cleanup activity since the query was started, it cancels
itself automatically,

Happy X'mas to all of you!

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Simon Riggs


On Wed, 2008-12-24 at 09:59 -0500, Robert Treat wrote:

> I think the uncertainty comes from peoples experience with typical 
> replication 
> use cases vs a lack of experience with this current implementation.  

Quite possibly.

Publishing user feedback on this will be very important in making this a
usable feature.

I'd be very happy if you were to direct the search for optimal
usability.

> One such example is that it is pretty common to use read-only slaves to do 
> horizontal scaling of read queries across a bunch of slaves. This is not the 
> scenario of running reporting queries on a second machine to lower load; you 
> would be running a large number of read-only, relativly short, oltp-ish 
> queries (think pg_benchs select only test i suppose), but you also have a 
> fairly regular stream of inserts/updates going on with these same tables, its 
> just you have 95/5 split of read/write (or similar). 

One thing to consider also is latency of information. Sending queries to
master or slave may return different answers if querying very recent
data.

> This is standard practice in things like mysql or using slony or what have 
> you. I suspect it's one of the first things people are going to want to do 
> with hot standby. But it's unclear how well this will work because we don't 
> have any experience with it yet, coupled with the two downsides being 
> mentioned as canceled queries and replay lag, which happen to be probably the 
> two worst downsides you would have in the above scenario. :-)
> 
> Hmm I'm not sure why I didn't think of running this test before, but  
> read/write pg_bench on a master with pg_bench select test on slave isn't that 
> bad of a scenario to match the above; it might be a little too much activity 
> on the master, but has anyone else run such a test? 
> 
-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Robert Treat

On Wednesday 24 December 2008 08:48:04 Simon Riggs wrote:
> On Wed, 2008-12-24 at 17:56 +0530, Pavan Deolasee wrote:
> > Again, I haven't seen how frequently queries may get canceled. Or if
> > the delay is set to a large value, how far behind standby may get
> > during replication, so I can't really comment. Have you done any tests
> > on a reasonable hardware and checked if moderately long read queries
> > can be run on standby without standby lagging behind a lot.
>
> Queries get cancelled if data they need to see if removed and the
> max_standby_delay expires. So lag will be max_standby_delay, by
> definition.
>
> Not sure what further tests would show. Queries that run for longer than
> max_standby delay plus mean time between cleanup records will currently
> end up being cancelled.
>

I think the uncertainty comes from peoples experience with typical replication 
use cases vs a lack of experience with this current implementation.  

One such example is that it is pretty common to use read-only slaves to do 
horizontal scaling of read queries across a bunch of slaves. This is not the 
scenario of running reporting queries on a second machine to lower load; you 
would be running a large number of read-only, relativly short, oltp-ish 
queries (think pg_benchs select only test i suppose), but you also have a 
fairly regular stream of inserts/updates going on with these same tables, its 
just you have 95/5 split of read/write (or similar). 

This is standard practice in things like mysql or using slony or what have 
you. I suspect it's one of the first things people are going to want to do 
with hot standby. But it's unclear how well this will work because we don't 
have any experience with it yet, coupled with the two downsides being 
mentioned as canceled queries and replay lag, which happen to be probably the 
two worst downsides you would have in the above scenario. :-)

Hmm I'm not sure why I didn't think of running this test before, but  
read/write pg_bench on a master with pg_bench select test on slave isn't that 
bad of a scenario to match the above; it might be a little too much activity 
on the master, but has anyone else run such a test? 

-- 
Robert Treat
Conjecture: http://www.xzilla.net
Consulting: http://www.omniti.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Simon Riggs

On Wed, 2008-12-24 at 17:56 +0530, Pavan Deolasee wrote:
> On Wed, Dec 24, 2008 at 5:26 PM, Simon Riggs  wrote:
> >
> 
> >
> > The patch does go to some trouble to handle that case, as I'm sure
> > you've seen. Are you saying that part of the patch is ineffective and
> > should be removed, or?
> >
> 
> Umm.. are you talking about the "wait" mechanism ? That's the only
> thing I remember. Otherwise, prune record is pretty much same as any
> vacuum cleanup record.

With respect, I was hoping you might look in the patch and see if you
agree with the way it is handled. No need to remember. The whole
latestRemovedXid concept is designed to do help.

> > Should/could there be a way to control frequency of prune operations? We
> > could maintain cleanupxmin as a constant minimum distance from xmax, for
> > example.
> >
> 
> Well, there can be. But tuning any such thing might be difficult and
> would have implications on the primary. I am not saying we can do
> that, but we will need additional tests to see its impact.
> 
> > Are we saying we should take further measures, as I asked upthread? If
> > it is a consensus that I take some action, then I will.
> >
> 
> Again, I haven't seen how frequently queries may get canceled. Or if
> the delay is set to a large value, how far behind standby may get
> during replication, so I can't really comment. Have you done any tests
> on a reasonable hardware and checked if moderately long read queries
> can be run on standby without standby lagging behind a lot.

Queries get cancelled if data they need to see if removed and the
max_standby_delay expires. So lag will be max_standby_delay, by
definition.

Not sure what further tests would show. Queries that run for longer than
max_standby delay plus mean time between cleanup records will currently
end up being cancelled.

> I would prefer to have a solution which can be smarter than canceling
> all queries as soon as a cleanup record comes and timeout occurs. 

Currently, it was the consensus view that queries should be cancelled,
though there are other options still on the table.

It's discussed in Design Notes on the Wiki. "Simply ignoring WAL removal
has been discussed and rejected (so far).
http://archives.postgresql.org/pgsql-hackers/2008-05/msg00753.php
Explicitly defining the tables a transaction wishes to see has also been
discussed and rejected (so far).
http://archives.postgresql.org/pgsql-hackers/2008-08/msg00268.php";

> For
> example, if the queries are being run on a completely different set of
> tables where as the updates/deletes are happening on another set of
> tables, there is no reason why those queries should be canceled. I
> think it would be very common to have large history tables which may
> receive long read-only queries, but no updates/deletes. Whereas other
> frequently updated tables which receive very few queries on the
> standby.

There is currently no way to tell which tables a query will touch during
the course of its execution. Nor is there likely to be because of
user-defined volatile functions.

I attempted to find ways to explicitly limit the set of tables over
which a query might venture, but that cam to nothing also.

We've also discussed storing lastCleanedLSN for each buffer, so queries
can cancel themselves if they need to read a buffer that has had data
removed from it that they would have needed to see. I'll write that up
also.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Pavan Deolasee

On Wed, Dec 24, 2008 at 5:26 PM, Simon Riggs  wrote:
>

>
> The patch does go to some trouble to handle that case, as I'm sure
> you've seen. Are you saying that part of the patch is ineffective and
> should be removed, or?
>

Umm.. are you talking about the "wait" mechanism ? That's the only
thing I remember. Otherwise, prune record is pretty much same as any
vacuum cleanup record.

> Should/could there be a way to control frequency of prune operations? We
> could maintain cleanupxmin as a constant minimum distance from xmax, for
> example.
>

Well, there can be. But tuning any such thing might be difficult and
would have implications on the primary. I am not saying we can do
that, but we will need additional tests to see its impact.

> Are we saying we should take further measures, as I asked upthread? If
> it is a consensus that I take some action, then I will.
>

Again, I haven't seen how frequently queries may get canceled. Or if
the delay is set to a large value, how far behind standby may get
during replication, so I can't really comment. Have you done any tests
on a reasonable hardware and checked if moderately long read queries
can be run on standby without standby lagging behind a lot.

I would prefer to have a solution which can be smarter than canceling
all queries as soon as a cleanup record comes and timeout occurs. For
example, if the queries are being run on a completely different set of
tables where as the updates/deletes are happening on another set of
tables, there is no reason why those queries should be canceled. I
think it would be very common to have large history tables which may
receive long read-only queries, but no updates/deletes. Whereas other
frequently updated tables which receive very few queries on the
standby.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Simon Riggs

On Wed, 2008-12-24 at 16:48 +0530, Pavan Deolasee wrote:
> On Wed, Dec 24, 2008 at 4:41 PM, Simon Riggs  wrote:
> >
> >
> > Greg and Heikki have highlighted in this thread some aspects of btree
> > garbage collection that will increase the chance of queries being
> > cancelled in various circumstances
> 
> Even HOT-prune may lead to frequent query cancellations and unlike
> VACUUM there is no way user can control the frequency of prune
> operations.

The patch does go to some trouble to handle that case, as I'm sure
you've seen. Are you saying that part of the patch is ineffective and
should be removed, or?

Should/could there be a way to control frequency of prune operations? We
could maintain cleanupxmin as a constant minimum distance from xmax, for
example.

Are we saying we should take further measures, as I asked upthread? If
it is a consensus that I take some action, then I will.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Pavan Deolasee

On Wed, Dec 24, 2008 at 4:41 PM, Simon Riggs  wrote:
>
>
> Greg and Heikki have highlighted in this thread some aspects of btree
> garbage collection that will increase the chance of queries being
> cancelled in various circumstances

Even HOT-prune may lead to frequent query cancellations and unlike
VACUUM there is no way user can control the frequency of prune
operations.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-24 Thread Simon Riggs

On Tue, 2008-12-23 at 23:59 -0500, Robert Treat wrote:
> On Friday 19 December 2008 19:36:42 Simon Riggs wrote:
> > Perhaps we should listen to the people that have said they don't want
> > queries cancelled, even if the alternative is inconsistent answers. That
> > is easily possible yet is not currently an option. Plus we have the
> > option I referred to up thread, which is to defer query cancel until the
> > query reads a modified data block. I'm OK with implementing either of
> > those, as non-default options. Do we need those options or are we ok?
> >
> 
> Haven't seen any feed back on this, but I think the two options of cancel 
> query for replay, and pause replay for queries, are probably enough for a 
> first go around (especially if you can get the query canceling to work only 
> when changes are made to the specific database in question)

Thanks for picking up on this. This question is the #1 usability issue
for Hot Standby, since at least May 2008. There are many potential
additions and we need to track this carefully over the next few months
to see if we have it just right. I'll take viewpoints at any time on
that; this door is never closed, though tempus fugit.

Greg and Heikki have highlighted in this thread some aspects of btree
garbage collection that will increase the chance of queries being
cancelled in various circumstances. If this is important enough to
trigger additional actions then we need to highlight that now so we have
time to take those corrective actions. 

I've listened to many different viewpoints on and off list. Everybody
takes a slightly different angle on it and I'm in favour of giving
everybody what they want with the right set of options.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-23 Thread Robert Treat

On Friday 19 December 2008 19:36:42 Simon Riggs wrote:
> Perhaps we should listen to the people that have said they don't want
> queries cancelled, even if the alternative is inconsistent answers. That
> is easily possible yet is not currently an option. Plus we have the
> option I referred to up thread, which is to defer query cancel until the
> query reads a modified data block. I'm OK with implementing either of
> those, as non-default options. Do we need those options or are we ok?
>

Haven't seen any feed back on this, but I think the two options of cancel 
query for replay, and pause replay for queries, are probably enough for a 
first go around (especially if you can get the query canceling to work only 
when changes are made to the specific database in question)

-- 
Robert Treat
Conjecture: http://www.xzilla.net
Consulting: http://www.omniti.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-23 Thread Robert Treat

On Saturday 20 December 2008 04:10:21 Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Sat, 2008-12-20 at 09:21 +0200, Heikki Linnakangas wrote:
> >> Gregory Stark wrote:
> >>> Simon Riggs  writes:
>  Increasing the waiting time increases the failover time and thus
>  decreases the value of the standby as an HA system. Others value high
>  availability higher than you and so we had agreed to provide an option
>  to allow the max waiting time to be set.
> >>>
> >>> Sure, it's a nice option to have. But I think the default should be to
> >>> pause WAL replay.
> >>
> >> I think I agree that pausing should be the default. If for no other
> >> reason, because I can't think of a good default for max_standby_delay.
> >
> > I would rather err on the side of caution. If we do as you suggest,
> > somebody will lose their database and start shouting "stupid default".
>
> Even if we stop applying the WAL, it should still be archived safely,
> right? So no data should be lost, although the standby can fall very
> much behind, and it can take a while to catch up.
>

I was thinking the condition Simon was concerned about was that on a very busy 
slave with wal delay, you could theoretically fill up the disks and destroy 
the slave. With query cancel, you might be annoyed to see the queries 
canceled, but theres no way that you would destroy the slave. (That might not 
have been what he meant though)

-- 
Robert Treat
Conjecture: http://www.xzilla.net
Consulting: http://www.omniti.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-23 Thread Simon Riggs

On Fri, 2008-12-19 at 14:23 -0600, Kevin Grittner wrote:

> > I guess making it that SQLSTATE would make it simpler to understand
> why
> > the error occurs and also how to handle it (i.e. resubmit).
>  
> Precisely.

Just confirming I will implement the SQLSTATE as requested.

I recognize my own and Greg's arguments that the match is not perfect,
but the Standard isn't clear on this and Kevin's interpretation is the
more useful behaviour for developers.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-23 Thread Simon Riggs

On Sat, 2008-12-20 at 20:09 -0300, Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
> > Gregory Stark wrote:
> >> A vacuum being replayed -- even in a different database -- could trigger 
> >> the
> >> error. Or with the btree split issue, a data load -- again even in a 
> >> different
> >> database -- would be quite likely cause your SELECT to be killed.
> >
> > Hmm, I wonder if we should/could track the "latestRemovedXid" separately  
> > for each database. There's no reason why we need to kill a read-only  
> > query in database X when a table in database Y is vacuumed.
> 
> What about shared catalogs?

Hot Standby already covers this special case. Patch uses
GetCurrentVirtualXIDs(), which treats a dbOid of 0 to match against all
databases.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-23 Thread Simon Riggs

On Sat, 2008-12-20 at 22:07 +0200, Heikki Linnakangas wrote:
> Gregory Stark wrote:
> > A vacuum being replayed -- even in a different database -- could trigger the
> > error. Or with the btree split issue, a data load -- again even in a 
> > different
> > database -- would be quite likely cause your SELECT to be killed.
> 
> Hmm, I wonder if we should/could track the "latestRemovedXid" separately 
> for each database. There's no reason why we need to kill a read-only 
> query in database X when a table in database Y is vacuumed.

Already implemented in code. see ResolveRedoVisibilityConflicts()

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-21 Thread Alvaro Herrera

Heikki Linnakangas wrote:
> Alvaro Herrera wrote:
>> Heikki Linnakangas wrote:
>>> Gregory Stark wrote:
 A vacuum being replayed -- even in a different database -- could trigger 
 the
 error. Or with the btree split issue, a data load -- again even in a 
 different
 database -- would be quite likely cause your SELECT to be killed.
>>> Hmm, I wonder if we should/could track the "latestRemovedXid" 
>>> separately  for each database. There's no reason why we need to kill 
>>> a read-only  query in database X when a table in database Y is 
>>> vacuumed.
>>
>> What about shared catalogs?
>
> True, vacuums on shared catalogs would affect read-only queries on all  
> databases.

Maybe it's possible to track latestRemovedXid for each database, and
additionally another counter that tracks vacuums on shared catalogs.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby and b-tree killed items

2008-12-21 Thread Heikki Linnakangas


Alvaro Herrera wrote:

Heikki Linnakangas wrote:

Gregory Stark wrote:

A vacuum being replayed -- even in a different database -- could trigger the
error. Or with the btree split issue, a data load -- again even in a different
database -- would be quite likely cause your SELECT to be killed.
Hmm, I wonder if we should/could track the "latestRemovedXid" separately  
for each database. There's no reason why we need to kill a read-only  
query in database X when a table in database Y is vacuumed.


What about shared catalogs?


True, vacuums on shared catalogs would affect read-only queries on all 
databases.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

< 4 5 6 7 8 9 10 >

801 - 900 of 971 matches

Mail list logo