Re: [HACKERS] Hot standby and b-tree killed items

2008-12-20 Thread Alvaro Herrera
Heikki Linnakangas wrote:
 Gregory Stark wrote:
 A vacuum being replayed -- even in a different database -- could trigger the
 error. Or with the btree split issue, a data load -- again even in a 
 different
 database -- would be quite likely cause your SELECT to be killed.

 Hmm, I wonder if we should/could track the latestRemovedXid separately  
 for each database. There's no reason why we need to kill a read-only  
 query in database X when a table in database Y is vacuumed.

What about shared catalogs?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs

On Fri, 2008-12-19 at 10:49 +0200, Heikki Linnakangas wrote:

 Whenever a B-tree index scan fetches a heap tuple that turns out to be 
 dead, the B-tree item is marked as killed by calling _bt_killitems. When 
 the page gets full, all the killed items are removed by calling 
 _bt_vacuum_one_page.
 
 That's a problem for hot standby. If any of the killed b-tree items 
 point to a tuple that is still visible to a running read-only 
 transaction, we have the same situation as with vacuum, and have to 
 either wait for the read-only transaction to finish before applying the 
 WAL record or kill the transaction.
 
 It looks like there's some cosmetic changes related to that in the 
 patch, the signature of _bt_delitems is modified, but there's no actual 
 changes that would handle that situation. I didn't see it on the TODO on 
 the hot standby wiki either. Am I missing something, or the patch?

ResolveRedoVisibilityConflicts() describes the current patch's position
on this point, which on review is wrong, I agree.

It looks like I assumed that _bt_delitems is only called during VACUUM,
which I knew it wasn't. I know I was going to split XLOG_BTREE_VACUUM
into two record types at one point, one for delete, one for vacuum. In
the end I didn't. Anyhow, its wrong.

We have infrastructure in place to make this work correctly, just need
to add latestRemovedXid field to xl_btree_vacuum. So that part is easily
solved.

Thanks for spotting it. More like that please!

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Heikki Linnakangas

Simon Riggs wrote:

We have infrastructure in place to make this work correctly, just need
to add latestRemovedXid field to xl_btree_vacuum. So that part is easily
solved.


That's tricky because there's no xmin/xmax on index tuples. You could 
conservatively use OldestXmin as latestRemovedXid, but that could stall 
the WAL redo a lot more than necessary. Or you could store 
latestRemovedXid in the page header, but that would need to be 
WAL-logged to ensure that it's valid after crash. Or you could look at 
the heap to fetch the xmin/xmax, but that would be expensive.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs

On Fri, 2008-12-19 at 12:24 +0200, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  We have infrastructure in place to make this work correctly, just need
  to add latestRemovedXid field to xl_btree_vacuum. So that part is easily
  solved.
 
 That's tricky because there's no xmin/xmax on index tuples.

Doh. 

  You could 
 conservatively use OldestXmin as latestRemovedXid, but that could stall 
 the WAL redo a lot more than necessary. Or you could store 
 latestRemovedXid in the page header, but that would need to be 
 WAL-logged to ensure that it's valid after crash. Or you could look at 
 the heap to fetch the xmin/xmax, but that would be expensive.

Agreed. Probably need to use OldestXmin then.



If I was going to add anything to the btree page header, it would be
latestRemovedLSN, only set during recovery. That way we don't have to
explicitly kill queries, we can do the a wait on OldestXmin then let
them ERROR out when they find a page that has been modified.

I have a suspicion that we may need some modification of that solution
for all data blocks, so we don't kill too many queries.

Hmmm. I wonder if we can track latestRemovedLSN for all of
shared_buffers. That was initially rejected, but if we set the
latestRemovedLSN to be the block's LSN when we read it in, that would be
fairly useful. Either way we use 8 bytes RAM per buffer.



BTW, I noticed the other day that Oracle 11g only allows you to have a
read only slave *or* allows you to continue replaying. You need to
manually switch back and forth between those modes. They can't do
*both*, as Postgres will be able to do. That's because their undo
information is stored off-block in the Undo Tablespace, so is not
available for standby queries. Nice one, Postgres.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Greg Stark
I'm confused shouldn't read-only transactions on the slave just be  
hacked to not set any hint bits including lp_delete?


--
Greg


On 19 Dec 2008, at 03:49, Heikki Linnakangas heikki.linnakan...@enterprisedb.com 
 wrote:


Whenever a B-tree index scan fetches a heap tuple that turns out to  
be dead, the B-tree item is marked as killed by calling  
_bt_killitems. When the page gets full, all the killed items are  
removed by calling _bt_vacuum_one_page.


That's a problem for hot standby. If any of the killed b-tree items  
point to a tuple that is still visible to a running read-only  
transaction, we have the same situation as with vacuum, and have to  
either wait for the read-only transaction to finish before applying  
the WAL record or kill the transaction.


It looks like there's some cosmetic changes related to that in the  
patch, the signature of _bt_delitems is modified, but there's no  
actual changes that would handle that situation. I didn't see it on  
the TODO on the hot standby wiki either. Am I missing something, or  
the patch?


--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs

On Fri, 2008-12-19 at 09:22 -0500, Greg Stark wrote:

 I'm confused shouldn't read-only transactions on the slave just be  
 hacked to not set any hint bits including lp_delete?

They could be, though I see no value in doing so. 

But that is not Heikki's point. He is discussing what happens on the
primary and the effects that must then occur on the standby. He has
rightly pointed out a (pluggable) hole in my logic.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Kevin Grittner
 Simon Riggs si...@2ndquadrant.com wrote: 
 
 If I was going to add anything to the btree page header, it would be
 latestRemovedLSN, only set during recovery. That way we don't have
to
 explicitly kill queries, we can do the a wait on OldestXmin then let
 them ERROR out when they find a page that has been modified.
 
 I have a suspicion that we may need some modification of that
solution
 for all data blocks, so we don't kill too many queries.
 
If the failure is caused by the timing of various database
transactions, and the query is likely to run successfully after a
delay and a retry, please use SQLSTATE of '40001'.  Some software
(ours, for one) will recognize this and retry the query automatically,
so that the user impact is essentially the same as blocking.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs

On Fri, 2008-12-19 at 11:54 -0600, Kevin Grittner wrote:
  Simon Riggs si...@2ndquadrant.com wrote: 
  
  If I was going to add anything to the btree page header, it would be
  latestRemovedLSN, only set during recovery. That way we don't have
 to
  explicitly kill queries, we can do the a wait on OldestXmin then let
  them ERROR out when they find a page that has been modified.
  
  I have a suspicion that we may need some modification of that
 solution
  for all data blocks, so we don't kill too many queries.
  
 If the failure is caused by the timing of various database
 transactions, and the query is likely to run successfully after a
 delay and a retry, please use SQLSTATE of '40001'.  Some software
 (ours, for one) will recognize this and retry the query automatically,
 so that the user impact is essentially the same as blocking.

I understand the need, but we won't be using SQLSTATE = 40001.

That corresponds to ERRCODE_T_R_SERIALIZATION_FAILURE, which that error
would not be.

The error message ought to be snapshot too old, which could raise a
chuckle, so I called it something else.

The point you raise is a good one and I think we should publish a list
of retryable error messages. I contemplated once proposing a special log
level for a retryable error, but not quite a good idea.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs

On Fri, 2008-12-19 at 10:52 +, Simon Riggs wrote:

   You could 
  conservatively use OldestXmin as latestRemovedXid, but that could stall 
  the WAL redo a lot more than necessary. Or you could store 
  latestRemovedXid in the page header, but that would need to be 
  WAL-logged to ensure that it's valid after crash. Or you could look at 
  the heap to fetch the xmin/xmax, but that would be expensive.
 
 Agreed. Probably need to use OldestXmin then.

Just finished coding this up, plus TODO item to pin every index page.
Will post after some further testing.

Used RecentOldestXmin.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Kevin Grittner
 Simon Riggs si...@2ndquadrant.com wrote: 
 
 I understand the need, but we won't be using SQLSTATE = 40001.
 
 That corresponds to ERRCODE_T_R_SERIALIZATION_FAILURE, which that
error
 would not be.
 
Isn't it a problem with serialization of database transactions?  You
hit it in a different way, but if it is a temporary failure due to the
timing of the transactions, I strongly feel that that is the correct
SQLSTATE to use.  Perhaps more information to provide any useful
context could be in the info or hint areas?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Gregory Stark
Simon Riggs si...@2ndquadrant.com writes:

 The error message ought to be snapshot too old, which could raise a
 chuckle, so I called it something else.

 The point you raise is a good one and I think we should publish a list
 of retryable error messages. I contemplated once proposing a special log
 level for a retryable error, but not quite a good idea.

I'm a bit concerned about the idea of killing off queries to allow WAL to
proceed. While I have nothing against that being an option I think we should
be aiming to make it not necessary for correctness and not the default. By
default I think WAL replay should stick to stalling WAL replay and only resort
to killing queries if the user specifically requests it.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs

On Fri, 2008-12-19 at 18:59 +, Gregory Stark wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 
  The error message ought to be snapshot too old, which could raise a
  chuckle, so I called it something else.
 
  The point you raise is a good one and I think we should publish a list
  of retryable error messages. I contemplated once proposing a special log
  level for a retryable error, but not quite a good idea.
 
 I'm a bit concerned about the idea of killing off queries to allow WAL to
 proceed. While I have nothing against that being an option I think we should
 be aiming to make it not necessary for correctness and not the default. By
 default I think WAL replay should stick to stalling WAL replay and only resort
 to killing queries if the user specifically requests it.

Increasing the waiting time increases the failover time and thus
decreases the value of the standby as an HA system. Others value high
availability higher than you and so we had agreed to provide an option
to allow the max waiting time to be set.

max_standby_delay is set in recovery.conf, value 0 (forever) - 2,000,000
secs, settable in milliseconds. So think of it like a deadlock detector
for recovery apply.

Also, there is a set of functions to control the way recovery proceeds,
much as you might control an MP3 player (start, stop, pause). There ares
also functions to pause at specific xids, pause at specific time, pause
at the next cleanup record. That allows you to set the max_standby_delay
lower and then freeze the server for longer to run a long query if
required. It also allows you to do PITR by trial and error rather than
one shot specify-in-advance settings. There is a function to manually
end recovery at a useful place if desired.

I hope your needs and wishes are catered for by that?

(I have a Plan B in case we need it during wider user testing, as
explained up thread.)

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Kevin Grittner
 Simon Riggs si...@2ndquadrant.com wrote: 
 
 max_standby_delay is set in recovery.conf, value 0 (forever) -
2,000,000
 secs, settable in milliseconds. So think of it like a deadlock
detector
 for recovery apply.
 
Aha!  A deadlock is a type of serialization failure.  (In fact, on
databases with lock-based concurrency control rather than MVCC, it can
be the ONLY type of serialization failure.)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs

On Fri, 2008-12-19 at 13:47 -0600, Kevin Grittner wrote:
  Simon Riggs si...@2ndquadrant.com wrote: 
  
  max_standby_delay is set in recovery.conf, value 0 (forever) -
 2,000,000
  secs, settable in milliseconds. So think of it like a deadlock
 detector
  for recovery apply.
  
 Aha!  A deadlock is a type of serialization failure.  (In fact, on
 databases with lock-based concurrency control rather than MVCC, it can
 be the ONLY type of serialization failure.)

The SQL Standard specifically names this error as thrown when it
detects the inability to guarantee the serializability of two or more
concurrent SQL-transactions. Now that really should only apply when
running with SERIALIZABLE transactions, but I grant you the standard
doesn't explicitly say that.

You give me the strange sense that you want this because of some quirk
in your software, rather than an overwhelming desire to see these two
situations described the same.

I guess making it that SQLSTATE would make it simpler to understand why
the error occurs and also how to handle it (i.e. resubmit). So there
probably is a wide argument for making developers jobs a little easier
by doing it. i.e. usability will be improved if we do that.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Gregory Stark
Simon Riggs si...@2ndquadrant.com writes:

 Increasing the waiting time increases the failover time and thus
 decreases the value of the standby as an HA system. Others value high
 availability higher than you and so we had agreed to provide an option
 to allow the max waiting time to be set.

Sure, it's a nice option to have. But I think the default should be to pause
WAL replay.

The question I had was whether your solution for btree pointers marked dead
and later dropped from the index works when the user hasn't configured a
timeout and doesn't want standby queries killed.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Gregory Stark
Kevin Grittner kevin.gritt...@wicourts.gov writes:

 Simon Riggs si...@2ndquadrant.com wrote: 
  
 max_standby_delay is set in recovery.conf, value 0 (forever) -
 2,000,000
 secs, settable in milliseconds. So think of it like a deadlock
 detector
 for recovery apply.
  
 Aha!  A deadlock is a type of serialization failure.  (In fact, on
 databases with lock-based concurrency control rather than MVCC, it can
 be the ONLY type of serialization failure.)

I think the fundamental difference is that a deadlock or serialization failure
can be predicted as a potential problem when writing the code. This is
something that can happen for any query any time, even plain old read-only
select queries.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Kevin Grittner
 Simon Riggs si...@2ndquadrant.com wrote: 
 
 The SQL Standard specifically names this error as thrown when it
 detects the inability to guarantee the serializability of two or
more
 concurrent SQL-transactions. Now that really should only apply when
 running with SERIALIZABLE transactions,
 
I disagree.  Database integrity could not be guaranteed without
detection of conflicting modification in READ COMMITTED on up, and
this is the normal means of indicating these problems.
 
 but I grant you the standard doesn't explicitly say that.
 
I think that's intentional.
 
 You give me the strange sense that you want this because of some
quirk
 in your software, rather than an overwhelming desire to see these
two
 situations described the same.
 
Well, we are very unlikely to ever use this feature, so it's not
really something I care about for us; it just struck me that there may
be others that care about categorizing errors accurately according the
the SQL standard, and that what you were describing sounded like a new
type of serialization failure in the PostgreSQL environment, and
should be classified that way.
 
The primary quirkiness of our software is that it needs to be able to
run with a number of different database products, and we do want to
take advantage of whatever information is available in a portable
format.  This is not the only standard SQLSTATE we look for and handle
appropriately for the documented meaning, but it is an important one,
as it has simplified application programming and reduced the confusing
error messages which reach our end users.
 
 I guess making it that SQLSTATE would make it simpler to understand
why
 the error occurs and also how to handle it (i.e. resubmit).
 
Precisely.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Gregory Stark
Kevin Grittner kevin.gritt...@wicourts.gov writes:

 PostgreSQL is much less prone to serialization failures, but it is
 certainly understandable if hot standby replication introduces new
 cases of it.

In this case it will be possible to get this error even if you're just running
a single SELECT query -- and that's the *only* query in the database at all.

A vacuum being replayed -- even in a different database -- could trigger the
error. Or with the btree split issue, a data load -- again even in a different
database -- would be quite likely cause your SELECT to be killed.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Kevin Grittner
 Gregory Stark st...@enterprisedb.com wrote: 
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 
 PostgreSQL is much less prone to serialization failures, but it is
 certainly understandable if hot standby replication introduces new
 cases of it.
 
 In this case it will be possible to get this error even if you're
just 
 running
 a single SELECT query -- and that's the *only* query in the database
at all.
 
 A vacuum being replayed -- even in a different database -- could
trigger the
 error. Or with the btree split issue, a data load -- again even in a
different
 database -- would be quite likely cause your SELECT to be killed.
 
OK.  Does that make serialization failure a bad description of the
problem?
 
If these steps are serialized (run one after the other), is there a
problem?  It just seems that the hot standby near-synchronous
replication creates a situation where tasks on multiple, linked
databases might need to be serialized.
 
It does seem like it will be important to provide as much information
to the user about what's causing the problem, and hints about what to
do.  PostgreSQL has nice features for that, though.
 
Since I have no vested interest here, I'm not inclined to belabor the
point.  I was really just trying to make sure the feature was as
useful as possible to others, some of whom might be looking for
standard SQLSTATE values to help the software take the right course.
If others feel the 40001 code would confuse more than enlighten, I'll
respect that.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Kevin Grittner
 Gregory Stark st...@enterprisedb.com wrote: 
 
 I think the fundamental difference is that a deadlock or
serialization 
 failure
 can be predicted as a potential problem when writing the code. This
is
 something that can happen for any query any time, even plain old
read-only
 select queries.
 
I've heard that on Oracle it is (or at least was) possible to get a
serialization failure on a single SELECT statement which was the only
user-requested activity on the system, because it could conflict with
automatic maintenance operations.
 
In Sybase and Microsoft databases it is definitely possible for a
plain old read-only SELECT statement to be a deadlock victim (reported
as a serialization failure) if some of the data it is referencing is
being updated concurrently.  In these (and many other) products, a
lock must be acquired before a row can be read.  Imagine, the SELECT
locks a row against updates, another transaction locks some other row
against any access, then the UPDATE tries to change the row locked by
the SELECT while the SELECT tries to read the row locked by the
UPDATE.
 
PostgreSQL is much less prone to serialization failures, but it is
certainly understandable if hot standby replication introduces new
cases of it.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Robert Treat
On Friday 19 December 2008 05:52:42 Simon Riggs wrote:
 BTW, I noticed the other day that Oracle 11g only allows you to have a
 read only slave *or* allows you to continue replaying. You need to
 manually switch back and forth between those modes. They can't do
 *both*, as Postgres will be able to do. That's because their undo
 information is stored off-block in the Undo Tablespace, so is not
 available for standby queries. Nice one, Postgres.


I think this is true for physical replay, but Oracle also offers the option to 
do logical replay (where transaction logs are converted into sql and run 
against the standby; i believe this is similar to what continuant is trying 
to do with thier latest offering). In that scenario you can do read and 
replay at the same time, though I think there are some conflicts possible; 
fewer than what postgres will have, since I think most of thier DDL can be 
done online. (This might require some extra modules / high end version of 
Oracle, please consult your local Oracle wizard for more details)

-- 
Robert Treat
Conjecture: http://www.xzilla.net
Consulting: http://www.omniti.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs

On Fri, 2008-12-19 at 20:54 +, Gregory Stark wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 
  PostgreSQL is much less prone to serialization failures, but it is
  certainly understandable if hot standby replication introduces new
  cases of it.
 
 In this case it will be possible to get this error even if you're just running
 a single SELECT query -- and that's the *only* query in the database at all.
 
 A vacuum being replayed -- even in a different database -- could trigger the
 error. Or with the btree split issue, a data load -- again even in a different
 database -- would be quite likely cause your SELECT to be killed.

Quite likely? You're all doomed I say!, his eyes rolling wildly. :-)

The standby is an extension of the primary and is quite literally
running the same transactions. This only query idea isn't the right
way to think about it. It's fairly easily possible to predict it will
happen and it will happen only in same database as transactions on the
primary. And as we just said, you can control whether and/or after how
long this will happen in some detail.

Industry context: In the worst case this is as bad as Oracle 11g. In
many/most cases it is much better.

Perhaps we should listen to the people that have said they don't want
queries cancelled, even if the alternative is inconsistent answers. That
is easily possible yet is not currently an option. Plus we have the
option I referred to up thread, which is to defer query cancel until the
query reads a modified data block. I'm OK with implementing either of
those, as non-default options. Do we need those options or are we ok?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Simon Riggs

On Fri, 2008-12-19 at 19:29 -0500, Robert Treat wrote:
 On Friday 19 December 2008 05:52:42 Simon Riggs wrote:
  BTW, I noticed the other day that Oracle 11g only allows you to have a
  read only slave *or* allows you to continue replaying. You need to
  manually switch back and forth between those modes. They can't do
  *both*, as Postgres will be able to do. That's because their undo
  information is stored off-block in the Undo Tablespace, so is not
  available for standby queries. Nice one, Postgres.
 
 
 I think this is true for physical replay, but Oracle also offers the option 
 to 
 do logical replay (where transaction logs are converted into sql and run 
 against the standby; i believe this is similar to what continuant is trying 
 to do with thier latest offering). In that scenario you can do read and 
 replay at the same time, though I think there are some conflicts possible; 
 fewer than what postgres will have, since I think most of thier DDL can be 
 done online. 

That is also an option I have argued that we need, BTW.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Heikki Linnakangas

Gregory Stark wrote:

Simon Riggs si...@2ndquadrant.com writes:


Increasing the waiting time increases the failover time and thus
decreases the value of the standby as an HA system. Others value high
availability higher than you and so we had agreed to provide an option
to allow the max waiting time to be set.


Sure, it's a nice option to have. But I think the default should be to pause
WAL replay.


I think I agree that pausing should be the default. If for no other 
reason, because I can't think of a good default for max_standby_delay.


It would be nice to have a setting to specify the max. amount of 
unapplied WAL before killing queries. When the primary isn't doing much, 
you might want wait longer before killing queries, and if you're falling 
behind a lot, you might want to kill queries more aggressively to catch 
up. I guess that doesn't quite fit the current architecture; you'd need 
to peek ahead to see how much unapplied WAL there is.



The question I had was whether your solution for btree pointers marked dead
and later dropped from the index works when the user hasn't configured a
timeout and doesn't want standby queries killed.


Yes, it's not any different from vacuum WAL records.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and b-tree killed items

2008-12-19 Thread Heikki Linnakangas

Heikki Linnakangas wrote:

Gregory Stark wrote:
The question I had was whether your solution for btree pointers marked 
dead

and later dropped from the index works when the user hasn't configured a
timeout and doesn't want standby queries killed.


Yes, it's not any different from vacuum WAL records.


No wait, there is a nasty corner-case. When an index tuple is marked as 
killed, no WAL record is written. Since there's now WAL record, it won't 
be killed in the slave yet. But if we take a full-page image of that 
page later for some other operation, the LP_DEAD flag is included in the 
full-page image. If the flag sneaks into the slave without an explicit 
WAL record like that, there's no latestRemovedXid for the slave to wait on.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-12-02 Thread Koichi Suzuki
Hi,

I found that no one is registered as hot standby reviewer.   I'd like
to review the patch, mainly by testing through some benchmark test.

Regards;

2008/11/2 Simon Riggs [EMAIL PROTECTED]:
 Hot Standby patch, including all major planned features.

 Allows users to connect to server in archive recovery and run queries.

 Applies cleanly, passes make check.

 There's no flaky code, kludges or other last minute rush-for-deadline
 stuff. It all works, though really for a patch this size and scope I
 expect many bugs. As a result I am describing this as WIP, though it
 is much more than a prototype. All the code has been planned out in
 advance, so there's been very little on-the-fly changes required during
 final development. I'm fully committed to making all required changes
 and fixes in reasonable times. I will continue detailed testing over the
 next few weeks to re-check everything prior to commit.

 Initially, I would ask people to spend time thinking about this
 conceptually to check that I have all the correct subsystems and cater
 for all the little side tweaks that exist in various parts of the server

 When you test this, please do it on a server built with --enable-cassert
 and keep a detailed log using trace_recovery_messages = DEBUG4 (or 2-3).

 Code has been specifically designed to be performance neutral or better
 for normal workloads, so the WAL volume, number of times we take WAL
 locks etc should be identical (on 64-bit systems). The patch includes
 further tuning of subtransaction commits, so I hope the patch may even
 be a few % win on more complex workloads with many PL/pgSQL functions
 using EXCEPTION. Enabling the bgwriter during recovery seems likely to
 be a huge gain on performance for larger shared_buffers settings, which
 should offset a considerable increase in CPU usage during recovery.

 Performance test results would be appreciated
 * for normal running - just to test that it really is neutral
 * for recovery - if query access not used is it faster than before

 Points of significant note for detailed reviewers

 * Prepared transactions not implemented yet. No problems foreseen, but
 want to wait to see if other refactorings are required.

 * Touching every block of a btree index during a replay of VACUUM is not
 yet implemented. Would like some second opinions that it is even
 required. I have code prototyped for it, but it feels too wacky even
 after Heikki and I agreed it was required. Do we still think that?

 * locking correctness around flat file refresh still not enabled

 * need some discussiona round how to handle max_connections changes
 cleanly.

-- 
--
Koichi Suzuki

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-12-02 Thread Pavan Deolasee
On Wed, Dec 3, 2008 at 11:00 AM, Koichi Suzuki [EMAIL PROTECTED] wrote:

 Hi,

 I found that no one is registered as hot standby reviewer.   I'd like
 to review the patch, mainly by testing through some benchmark test.


You can yourself edit the Wiki page, though you need to register first. But
its very straight forward.

I added myself as reviewer to Hot standby few days back since I did some
code review and testing. I intend to spend some more time after new patch is
posted.

Thanks,
Pavan

Pavan Deolasee.
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-12-02 Thread Koichi Suzuki
Thank you for your advise.   I'll edit the Wiki page.

2008/12/3 Pavan Deolasee [EMAIL PROTECTED]:


 On Wed, Dec 3, 2008 at 11:00 AM, Koichi Suzuki [EMAIL PROTECTED] wrote:

 Hi,

 I found that no one is registered as hot standby reviewer.   I'd like
 to review the patch, mainly by testing through some benchmark test.


 You can yourself edit the Wiki page, though you need to register first. But
 its very straight forward.

 I added myself as reviewer to Hot standby few days back since I did some
 code review and testing. I intend to spend some more time after new patch is
 posted.

 Thanks,
 Pavan

 Pavan Deolasee.
 EnterpriseDB http://www.enterprisedb.com




-- 
--
Koichi Suzuki

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-21 Thread Pavan Deolasee
On Thu, Nov 20, 2008 at 8:15 PM, Pavan Deolasee [EMAIL PROTECTED]wrote:



 On Thu, Nov 20, 2008 at 7:50 PM, Simon Riggs [EMAIL PROTECTED]wrote:




 (I assume you mean bgwriter, not archiver process).


 Yeah, its the bgwriter, IIRC hung while taking checkpoint.



Sorry, its the startup process thats stuck in the checkpoint. Here is the
stack trace:

(gdb) bt
#0  0x00110402 in __kernel_vsyscall ()
#1  0x0095564b in semop () from /lib/libc.so.6
#2  0x0825c703 in PGSemaphoreLock (sema=0xb7c52c7c, interruptOK=0 '\0') at
pg_sema.c:420
#3  0x0829ff5e in LWLockAcquire (lockid=WALInsertLock, mode=LW_EXCLUSIVE) at
lwlock.c:456
#4  0x080d5c7e in XLogInsert (rmid=0 '\0', info=16 '\020', rdata=0xbfda1798)
at xlog.c:746
#5  0x080e2e0f in CreateCheckPoint (flags=6) at xlog.c:6674
#6  0x080e1afd in StartupXLOG () at xlog.c:6077
#7  0x08104f2f in AuxiliaryProcessMain (argc=2, argv=0xbfda19e4) at
bootstrap.c:421
#8  0x0826d285 in StartChildProcess (type=StartupProcess) at
postmaster.c:4104
#9  0x082690d9 in PostmasterMain (argc=3, argv=0x9c89a60) at
postmaster.c:1034
#10 0x081f90ff in main (argc=3, argv=0x9c89a60) at main.c:188


ISTM that the postmaster somehow does not receive (may be because of what I
reported in my other mail) PMSIGNAL_RECOVERY_START and hence bgwriter is not
started. The startup process then itself tries to take a checkpoint and
hangs.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-21 Thread Simon Riggs

On Fri, 2008-11-21 at 17:08 +0530, Pavan Deolasee wrote:

 Sorry, its the startup process thats stuck in the checkpoint. Here is
 the stack trace:

Already fixed in new version I'm preparing for you.

Both the startup process and bgwriter can perform restartpoints, so its
not a problem whether we reached the point at which we pmsignal or not.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-20 Thread Simon Riggs

On Thu, 2008-11-20 at 11:51 +0530, Pavan Deolasee wrote:

 I wonder if we should refactor lazy_scan_heap() so that *all* the real
 work of collecting information about dead tuples happens only in
 heap_page_prune(). Frankly, there is only a rare chance that a tuple
 may become DEAD after the pruning happened on the page. We can ignore
 such tuples; they will be vacuumed/pruned in the next cycle.
 
 This would save us a second check of HeapTupleSatisfiesVacuum on the
 tuples which are just now checked in heap_page_prune(). In addition,
 the following additional WAL records are then not necessary because
 heap_page_prune() must have already logged the latestRemovedXid.

I like this idea. I've attempted to plug every gap, but perhaps the best
way here is to remove the gap completely.

In my testing, I only saw this case happen a couple of times in many
tests. Rarely executed code gives sporadic bugs, so I would be happy to
remove it and the standby support stuff that goes with it.

I would suggest that we just remove the switch statement:
switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf))
and alter the following if test since tupgone is also removed.
That will cause HEAPTUPLE_DEAD rows to be fed to heap_freeze_tuple().
Comments on that function claim that is a bad thing, but we know that
any row that was *not* removed by heap_page_prune() and is now dead must
have died very recently and so will never be frozen.

Let me know if you're happy with that change and I'll make it so.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-20 Thread Pavan Deolasee
On Thu, Nov 20, 2008 at 3:38 PM, Simon Riggs [EMAIL PROTECTED] wrote:


 I would suggest that we just remove the switch statement:
switch (HeapTupleSatisfiesVacuum(tuple.t_data, OldestXmin, buf))
 and alter the following if test since tupgone is also removed.
 That will cause HEAPTUPLE_DEAD rows to be fed to heap_freeze_tuple().
 Comments on that function claim that is a bad thing, but we know that
 any row that was *not* removed by heap_page_prune() and is now dead must
 have died very recently and so will never be frozen.

 Let me know if you're happy with that change and I'll make it so.



Yeah, I think we should be safe. We continuously hold EX lock on the buffer
since the prune operation is carried out. So the only new DEAD tuples may
arrive because some transaction aborted in between, changing
INSERT_IN_PROGRESS tuple to DEAD. But these tuples won't pass the cutoff_xid
test and should never be frozen.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-20 Thread Simon Riggs

On Thu, 2008-11-20 at 12:03 +0530, Pavan Deolasee wrote:

 On Sat, Nov 1, 2008 at 10:02 PM, Simon Riggs [EMAIL PROTECTED]
 wrote:
 Hot Standby patch, including all major planned features.
 
 
 While experimenting with the patch, I noticed that sometimes the
 archiver process indefinitely waits for WALInsertLock. I haven't spent
 much time debugging that, but the following chunk clearly seems to be
 buggy. The WALInsertLock is not released if (leavingArchiveRecovery ==
 true).

Mmmm, it seems this is correct. I had to reconstruct this section of
code after recent bitrot, so it looks I introduced a bug doing that.
What I'm surprised about is that I got a similar hang myself in testing
and in my notes I have this ticked as resolved.

The fix is trivial, though I'm sorry it was there for you to find at
all.

(I assume you mean bgwriter, not archiver process).

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-20 Thread Pavan Deolasee
On Thu, Nov 20, 2008 at 7:50 PM, Simon Riggs [EMAIL PROTECTED] wrote:




 (I assume you mean bgwriter, not archiver process).


Yeah, its the bgwriter, IIRC hung while taking checkpoint.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-20 Thread Tom Lane
Pavan Deolasee [EMAIL PROTECTED] writes:
 I wonder if we should refactor lazy_scan_heap() so that *all* the real work
 of collecting information about dead tuples happens only in
 heap_page_prune(). Frankly, there is only a rare chance that a tuple may
 become DEAD after the pruning happened on the page. We can ignore such
 tuples; they will be vacuumed/pruned in the next cycle.

 This would save us a second check of HeapTupleSatisfiesVacuum on the tuples
 which are just now checked in heap_page_prune(). In addition, the following
 additional WAL records are then not necessary because heap_page_prune() must
 have already logged the latestRemovedXid.

I don't think you can do that.  Couldn't someone else have run
heap_page_prune between vacuum's first and second visit to the page?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-20 Thread Simon Riggs

On Thu, 2008-11-20 at 10:33 -0500, Tom Lane wrote:
 Pavan Deolasee [EMAIL PROTECTED] writes:
  I wonder if we should refactor lazy_scan_heap() so that *all* the real work
  of collecting information about dead tuples happens only in
  heap_page_prune(). Frankly, there is only a rare chance that a tuple may
  become DEAD after the pruning happened on the page. We can ignore such
  tuples; they will be vacuumed/pruned in the next cycle.
 
  This would save us a second check of HeapTupleSatisfiesVacuum on the tuples
  which are just now checked in heap_page_prune(). In addition, the following
  additional WAL records are then not necessary because heap_page_prune() must
  have already logged the latestRemovedXid.
 
 I don't think you can do that.  Couldn't someone else have run
 heap_page_prune between vacuum's first and second visit to the page?

I just looked at that in more detail and decided it was more difficult
than it first appeared. So I've left it for now.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-20 Thread Pavan Deolasee
On Thu, Nov 20, 2008 at 9:03 PM, Tom Lane [EMAIL PROTECTED] wrote:


 I don't think you can do that.  Couldn't someone else have run
 heap_page_prune between vacuum's first and second visit to the page?




You mean the second visit in the first pass where we again check for
HeapTupleSatisfiesVacuum ? We hold exclusive lock continuously in the first
pass. So its not possible for someone else to call heap_page_prune.  If its
the second visit in the second heap scan, then it removes only the dead
tuples recorded in the first pass. So we should be good there too.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-19 Thread Pavan Deolasee
On Sat, Nov 1, 2008 at 10:02 PM, Simon Riggs [EMAIL PROTECTED] wrote:

 Hot Standby patch, including all major planned features.



I wonder if we should refactor lazy_scan_heap() so that *all* the real work
of collecting information about dead tuples happens only in
heap_page_prune(). Frankly, there is only a rare chance that a tuple may
become DEAD after the pruning happened on the page. We can ignore such
tuples; they will be vacuumed/pruned in the next cycle.

This would save us a second check of HeapTupleSatisfiesVacuum on the tuples
which are just now checked in heap_page_prune(). In addition, the following
additional WAL records are then not necessary because heap_page_prune() must
have already logged the latestRemovedXid.

+ /*
+  * For Hot Standby we need to know the highest transaction id that will
+  * be removed by any change. VACUUM proceeds in a number of passes so
+  * we need to consider how each pass operates. The first pass runs
+  * heap_page_prune(), which can issue XLOG_HEAP2_CLEAN records as it
+  * progresses - these will have a latestRemovedXid on each record.
+  * In many cases this removes all of the tuples to be removed.
+  * Then we look at tuples to be removed, but do not actually remove them
+  * until phase three. However, index records for those rows are removed
+  * in phase two and index blocks do not have MVCC information attached.
+  * So before we can allow removal of *any* index tuples we need to issue
+  * a WAL record indicating what the latestRemovedXid will be at the end
+  * of phase three. This then allows Hot Standby queries to block at the
+  * correct place, i.e. before phase two, rather than during phase three
+  * as we issue more XLOG_HEAP2_CLEAN records. If we need to run multiple
+  * phase two/three because of memory constraints we need to issue multiple
+  * log records also.
+  */
+ static void
+ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
+ {
+   /*
+* No need to log changes for temp tables, they do not contain
+* data visible on the standby server.
+*/
+   if (rel-rd_istemp)
+   return;
+
+   (void) log_heap_cleanup_info(rel-rd_node,
vacrelstats-latestRemovedXid);
+ }


Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Hot Standby (commit fest version - v5)

2008-11-19 Thread Pavan Deolasee
On Sat, Nov 1, 2008 at 10:02 PM, Simon Riggs [EMAIL PROTECTED] wrote:

 Hot Standby patch, including all major planned features.


While experimenting with the patch, I noticed that sometimes the archiver
process indefinitely waits for WALInsertLock. I haven't spent much time
debugging that, but the following chunk clearly seems to be buggy. The
WALInsertLock is not released if (leavingArchiveRecovery == true).

--- 6565,6592 
}
}

!   if (leavingArchiveRecovery)
!   checkPoint.redo = GetRedoLocationForArchiveCheckpoint();
!   else
{
!   /*
!* Compute new REDO record ptr = location of next XLOG record.
!*
!* NB: this is NOT necessarily where the checkpoint record itself
will be,
!* since other backends may insert more XLOG records while we're off
doing
!* the buffer flush work.  Those XLOG records are logically after
the
!* checkpoint, even though physically before it.  Got that?
!*/
!   checkPoint.redo = GetRedoLocationForCheckpoint();

!   /*
!* Now we can release WAL insert lock, allowing other xacts to
proceed
!* while we are flushing disk buffers.
!*/
!   LWLockRelease(WALInsertLock);
}

/*
 * If enabled, log checkpoint start.  We postpone this until now so as
not
 * to log anything if we decided to skip the checkpoint.
 */


Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Hot standby v5 patch assertion failure

2008-11-09 Thread Mark Kirkwood

Simon Riggs wrote:

On Mon, 2008-11-03 at 12:16 +1300, Mark Kirkwood wrote:
  

Trying out a few different scenarios I ran across this:

1/ Setup master and replica with replica using pg_standby
2/ Create a new database (bench in my case)
3/ Initialize pgbench schema size 100
4/ Run with 2 clients and 1 transactions
5/ Replica gets assertion failure



I've been unable to reproduce this error in more than 2 days of bashing.
The bash test I use is a pgbench variant designed to cause write
contention, while at the same time running reads against those same
blocks on standby, plus running parallel installcheck.

I suspect now there was a problem in ProcArrayClearUnobservedXids(), so
I clear the array each time now, whether or not we are in assert mode.
i.e. better hygiene around reused data structures. So I *haven't*
reworked my earlier code, just checked it all again.

So, new patch enclosed. This fixes everything reported so far, plus
another 2 bugs I found and fixed during re-test.

  

Patching with v5d, I can no longer reproduce this either. Excellent!

Cheers

Mark

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby v5 patch assertion failure

2008-11-03 Thread Simon Riggs

On Mon, 2008-11-03 at 06:41 +, Simon Riggs wrote:
 On Mon, 2008-11-03 at 12:16 +1300, Mark Kirkwood wrote:
 
  Trying out a few different scenarios I ran across this:
  
  CONTEXT:  xlog redo update: rel 1663/16384/16397; tid 9614/62; new 158828/59
  DEBUG:  start recovery xid = 7002 lsn = 0/6F012EE4
  CONTEXT:  xlog redo update: rel 1663/16384/16397; tid 9614/62; new 158828/59
  TRAP: FailedAssertion(!(!((UnobservedXids[index]) != ((TransactionId) 
  0))), File: procarray.c, Line: 2037)
 
 OK, thanks Mark. I'll start looking at it now.

It's nice to know the exact line something fails on. I'd instrumented
the whole of the UnobservedXids code to trap failures. I've had a couple
of errors in that already during development. But what to do about it?

I'm thinking the best way to handle this is just to simplify this part
of the code some, rather than continue tweaking it. The code attempts to
optimise the removal of UnobservedXids, but that feels now like a
premature optimisation. So I can probably drop ~100 lines of code.

I'm now adding the btree logic also, as well as updating the patch to
current head. So I'll return with an updated patch as soon as all that's
done and I've run a complete re-test.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby v5 patch assertion failure

2008-11-02 Thread Simon Riggs

On Mon, 2008-11-03 at 12:16 +1300, Mark Kirkwood wrote:

 Trying out a few different scenarios I ran across this:
 
 CONTEXT:  xlog redo update: rel 1663/16384/16397; tid 9614/62; new 158828/59
 DEBUG:  start recovery xid = 7002 lsn = 0/6F012EE4
 CONTEXT:  xlog redo update: rel 1663/16384/16397; tid 9614/62; new 158828/59
 TRAP: FailedAssertion(!(!((UnobservedXids[index]) != ((TransactionId) 
 0))), File: procarray.c, Line: 2037)

OK, thanks Mark. I'll start looking at it now.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: Caches and Locks

2008-10-31 Thread Simon Riggs

On Thu, 2008-10-30 at 10:13 +, Simon Riggs wrote:
 On Tue, 2008-10-21 at 15:06 +0100, Simon Riggs wrote:
 
  We can't augment the commit/abort messages because
  we must cater for non-transactional invalidations also, plus commit
  xlrecs are already complex enough. So we log invalidations prior to
  commit, queue them and then trigger the send at commit (if it
  happens).
 
 Augmenting the commit messages seems like the better approach. It allows
 invalidation messages to be fired as they are read off the xlrec. Still
 need the additional message type to handle nontransactional
 invalidation. There are other messages possibly more complex than this
 already.

Just a quick note to say that this approach has worked fine and I now
have both cache invalidation and locking working correctly.

Rather than submit something now in an unseemly rush I'll tidy up and
add it onto the list tomorrow after some tidy up.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: Caches and Locks

2008-10-30 Thread Simon Riggs

On Tue, 2008-10-21 at 15:06 +0100, Simon Riggs wrote:

 We can't augment the commit/abort messages because
 we must cater for non-transactional invalidations also, plus commit
 xlrecs are already complex enough. So we log invalidations prior to
 commit, queue them and then trigger the send at commit (if it
 happens).

Augmenting the commit messages seems like the better approach. It allows
invalidation messages to be fired as they are read off the xlrec. Still
need the additional message type to handle nontransactional
invalidation. There are other messages possibly more complex than this
already.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: Caches and Locks

2008-10-30 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 We can't augment the commit/abort messages because
 we must cater for non-transactional invalidations also, plus commit
 xlrecs are already complex enough. So we log invalidations prior to
 commit, queue them and then trigger the send at commit (if it
 happens).

 Augmenting the commit messages seems like the better approach. It allows
 invalidation messages to be fired as they are read off the xlrec. Still
 need the additional message type to handle nontransactional
 invalidation. There are other messages possibly more complex than this
 already.

I guess I hadn't been paying attention, but: adding syscache inval
traffic to WAL seems like a completely horrid idea, both from the
complexity and performance standpoints.  What about using the existing
syscache logic to re-derive inval information from watching the update
operations?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: Caches and Locks

2008-10-30 Thread Simon Riggs

On Thu, 2008-10-30 at 08:30 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  We can't augment the commit/abort messages because
  we must cater for non-transactional invalidations also, plus commit
  xlrecs are already complex enough. So we log invalidations prior to
  commit, queue them and then trigger the send at commit (if it
  happens).
 
  Augmenting the commit messages seems like the better approach. It allows
  invalidation messages to be fired as they are read off the xlrec. Still
  need the additional message type to handle nontransactional
  invalidation. There are other messages possibly more complex than this
  already.
 
 I guess I hadn't been paying attention, but: adding syscache inval
 traffic to WAL seems like a completely horrid idea, both from the
 complexity and performance standpoints.  

Well, it's coming out fairly simple actually. Can you explain where you
think the performance loss is? My expectation is less than a 0.1% WAL
volume overhead for typical systems. My comment this morning was to say
I've managed to augment the commit record, so we're not even sending
many additional messages.

It also makes much of the Hot Standby patch fairly simple, even if it is
large. Write something to WAL, act on it on the other side. I've paid
very close attention to minimising the effects on both the number of
lock acquisitions and total WAL volume, but having said that I expect
there to be many tuning opportunities.

 What about using the existing
 syscache logic to re-derive inval information from watching the update
 operations?

That does sound possible, but it makes some big assumptions about
transactional machinery being in place. It ain't. Subtransactions make
everything about 5 times more difficult, so it seems pretty scary to me.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: Caches and Locks

2008-10-30 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Thu, 2008-10-30 at 08:30 -0400, Tom Lane wrote:
 What about using the existing
 syscache logic to re-derive inval information from watching the update
 operations?

 That does sound possible, but it makes some big assumptions about
 transactional machinery being in place. It ain't. Subtransactions make
 everything about 5 times more difficult, so it seems pretty scary to me.

Um.  Yeah, subtransactions would be a PITA.  Never mind that then ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby utility and administrator functions

2008-10-27 Thread Robert Treat
On Monday 20 October 2008 05:25:29 Simon Riggs wrote:
 I'm looking to implement the following functions for Hot Standby, to
 allow those with administrative tools or management applications to have
 more control during recovery. Please let me know if other functions are
 required.

 What else do we need?


Is it possible to give the master/slave knowledge about each other?

-- 
Robert Treat
Conjecture: http://www.xzilla.net
Consulting: http://www.omniti.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby utility and administrator functions

2008-10-27 Thread Simon Riggs

On Mon, 2008-10-27 at 11:42 -0400, Robert Treat wrote:

 On Monday 20 October 2008 05:25:29 Simon Riggs wrote:
  I'm looking to implement the following functions for Hot Standby, to
  allow those with administrative tools or management applications to have
  more control during recovery. Please let me know if other functions are
  required.
 
  What else do we need?
 
 
 Is it possible to give the master/slave knowledge about each other?

Yes, but for what reason?

The project I'm working on is Hot Standby, not streaming replication.
That will link things together better than they are now, so I'd probably
rather not prejudge/duplicate that.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby utility and administrator functions

2008-10-27 Thread Emmanuel Cecchet

Simon Riggs wrote:

On Mon, 2008-10-27 at 11:42 -0400, Robert Treat wrote:

  

On Monday 20 October 2008 05:25:29 Simon Riggs wrote:


I'm looking to implement the following functions for Hot Standby, to
allow those with administrative tools or management applications to have
more control during recovery. Please let me know if other functions are
required.

What else do we need?

  

Is it possible to give the master/slave knowledge about each other?



Yes, but for what reason?

The project I'm working on is Hot Standby, not streaming replication.
That will link things together better than they are now, so I'd probably
rather not prejudge/duplicate that.
  
I think this could make sense in the case of a network partition (split 
brain).
If we need to perform a reconciliation after a partition we will need to 
know when a slave has switched to master and what transactions have been 
processed on each side.
Another use case is probably to implement failback once the former 
master has been restarted to just send the diff since failover happened.
Actually to complement pg_last_recovered_xact_xid(), 
pg_last_completed_xact_xid() or pg_last_recovered_xact_timestamp(), I 
would like to have something like pg_xact_xid_status(txid) that would 
return something of a xid_status type that can be completed, recovered 
or not_found with a timestamp that would only be meaningful if the 
status is recovered or completed.


Regards,
Emmanuel

--
Emmanuel Cecchet
FTO @ Frog Thinker 
Open Source Development  Consulting

--
Web: http://www.frogthinker.org
email: [EMAIL PROTECTED]
Skype: emmanuel_cecchet


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby utility and administrator functions

2008-10-27 Thread Simon Riggs

On Mon, 2008-10-27 at 13:08 -0400, Robert Treat wrote:

 Was thinking that admin tools that show hot standby information might
 also want to show the corresponding slave information (from the point
 of view of the master). 

Well, the standby might be persuaded to know something about the master,
but not the other way around. The master:standby relationship is 1:Many,
and not restricted in the way things are daisy-chained. So it's more
than just a pair of servers.

All the slaves will say they are the same if you ask them, so you can't
rely on that to identify them. So you need to specify which two servers
you're interested in comparing and how to identify them externally.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby utility and administrator functions

2008-10-27 Thread Robert Treat
On Monday 27 October 2008 12:12:18 Simon Riggs wrote:
 On Mon, 2008-10-27 at 11:42 -0400, Robert Treat wrote:
  On Monday 20 October 2008 05:25:29 Simon Riggs wrote:
   I'm looking to implement the following functions for Hot Standby, to
   allow those with administrative tools or management applications to
   have more control during recovery. Please let me know if other
   functions are required.
  
   What else do we need?
 
  Is it possible to give the master/slave knowledge about each other?

 Yes, but for what reason?


Was thinking that admin tools that show hot standby information might also 
want to show the corresponding slave information (from the point of view of 
the master).  It might also allow tools to not have to be configured for all 
servers... ie connect to one and lookup the other.  

 The project I'm working on is Hot Standby, not streaming replication.
 That will link things together better than they are now, so I'd probably
 rather not prejudge/duplicate that.

It's possible this type of information isn't appropriate for our Hot Standby 
implementation, but it is somewhat common in asynchronous and/or master/slave 
systems. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby utility and administrator functions

2008-10-23 Thread Simon Riggs

On Mon, 2008-10-20 at 10:25 +0100, Simon Riggs wrote:

 What else do we need?

 * pg_freeze_recovery() 
 * pg_unfreeze_recovery() 

Two more functions

pg_freeze_recovery_cleanup()
pg_unfreeze_recovery_cleanup()

These would allow recovery to continue normally, except for row removal
operations which would halt the progress of recovery.

It would eventually be possible to have a function that halts recovery
whenever row removal takes place for a list of tables. Not planning on
implementing that initially though.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby utility and administrator functions

2008-10-21 Thread Zeugswetter Andreas OSB sIT

  * pg_last_recovered_xact_xid()
  Will throw an ERROR if *not* executed in recovery mode.
  returns bigint
 
  * pg_last_completed_xact_xid()
  Will throw an ERROR *if* executed in recovery mode.
  returns bigint

 Should these return xid?

And shouldn't these two be folded together ?
It seems most usages of this xid(/lsn?) will be agnostic to the
recovery mode. Or if not, it seems more convenient to have a function
that returns both recovery mode and xid, no ?

Andreas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby utility and administrator functions

2008-10-21 Thread Simon Riggs

On Tue, 2008-10-21 at 09:44 +0200, Zeugswetter Andreas OSB sIT wrote:
   * pg_last_recovered_xact_xid()
   Will throw an ERROR if *not* executed in recovery mode.
   returns bigint
  
   * pg_last_completed_xact_xid()
   Will throw an ERROR *if* executed in recovery mode.
   returns bigint
 
  Should these return xid?
 
 And shouldn't these two be folded together ?
 It seems most usages of this xid(/lsn?) will be agnostic to the
 recovery mode. Or if not, it seems more convenient to have a function
 that returns both recovery mode and xid, no ?

You are right that it would be better to have a single function.
Functions that return multiple values are a pain to use and develop,
plus we can always run the other function if we are in doubt.

txid_last_completed() returns bigint (txid) seems better.

I am more than happy to add an id version as well, if anybody sees the
need for that. Just say.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby utility and administrator functions

2008-10-20 Thread Robert Haas
 * pg_last_recovered_xact_xid()
 Will throw an ERROR if *not* executed in recovery mode.
 returns bigint

 * pg_last_completed_xact_xid()
 Will throw an ERROR *if* executed in recovery mode.
 returns bigint

Should these return xid?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby utility and administrator functions

2008-10-20 Thread Simon Riggs

On Mon, 2008-10-20 at 16:22 -0400, Robert Haas wrote:
  * pg_last_recovered_xact_xid()
  Will throw an ERROR if *not* executed in recovery mode.
  returns bigint
 
  * pg_last_completed_xact_xid()
  Will throw an ERROR *if* executed in recovery mode.
  returns bigint
 
 Should these return xid?

Perhaps, but they match txid_current() which returns bigint.
http://developer.postgresql.org/pgdocs/postgres/functions-info.html

Thanks for checking.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: First integrated patch

2008-10-18 Thread Simon Riggs

On Fri, 2008-10-17 at 16:47 -0400, Merlin Moncure wrote:
 On Fri, Oct 17, 2008 at 10:38 AM, Simon Riggs [EMAIL PROTECTED] wrote:
 
  First integrated patch for Hot Standby, allowing queries to be executed
  while in recovery mode.
 
  The patch tests successfully with the enclosed files:
  * primary_setup_test.sql - run it on primary node
  * standby_allowed.sql - run on standby - should all succeed
  * standby_disallowed.sql - run on standby - should all fail
  plus other manual testing.
 
  This is still WIP - its good enough to release for comments, though I am
  not yet confident enough to claim it bug free.
 
  What this doesn't do YET:
  * cope fully with subxid cache overflows (some parts still to add)
  * cope with prepared transactions on master
  * work correctly when running queries AND replaying WAL
  * work correctly with regard to AccessExclusiveLocks, which should
  prevent access to tables
 
  These last four points are what I'm working on over the next two weeks,
  plus any other holes people point out along the way. I have worked out
  designs for most of these aspects and will discuss them on -hackers,
  though most design notes are in the Wiki. I'm still looking into
  prepared transactions.
 
  Comments appreciated.
 
 It appears to be working, at least in some fashion.  The supplied
 tests all pass.

Cool

Thanks for testing so far.

 At first glance it seems like I have to force changes to the standby
 with pg_switch_xlog().
 
 hmm.

You'll have to explain some more. Normally files don't get sent until
they are full, so yes, you would need to do a pg_switch_xlog().

This is not streaming replication. Others are working on that.

 This probably isn't right:
 postgres=# \d
 
 No relations found.
 postgres=# select count(*) from foo;
   count
 -
  100
 (1 row)
 
 I created a table, pg_switch_xlog, query several times,i dropped a
 table,  pg_switch_xlog, table is 'gone', but still returns data
 
 exit/enter session, now its gone.  Sometimes I have to exit/enter
 session to get an up to date standby.  These are just first
 impressions...

Replaying and queries don't mix yet, so that is expected. I'm working on
this in phases. This patch is phase 1 - it is not the finished patch.
Phase 2: working on correct block locking to allow concurrent DML
changes to occcur while we run queries.
Phase 3: working on correct relation locking/relcache to allow
concurrent DDL changes to occur while we run queries.
I have designs of the above and expect to complete in next two weeks.

The reason for the above behaviour is that DDL changes need to fire
relcache invalidation messages so that the query backend sees the
change. The reason the table is still there is because the files haven't
been dropped yet. So everything you have seen is expected, by me.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: First integrated patch

2008-10-18 Thread Merlin Moncure
On Sat, Oct 18, 2008 at 4:11 AM, Simon Riggs [EMAIL PROTECTED] wrote:

 On Fri, 2008-10-17 at 16:47 -0400, Merlin Moncure wrote:
 On Fri, Oct 17, 2008 at 10:38 AM, Simon Riggs [EMAIL PROTECTED] wrote:
 
  First integrated patch for Hot Standby, allowing queries to be executed
  while in recovery mode.
 
  The patch tests successfully with the enclosed files:
  * primary_setup_test.sql - run it on primary node
  * standby_allowed.sql - run on standby - should all succeed
  * standby_disallowed.sql - run on standby - should all fail
  plus other manual testing.
 
  This is still WIP - its good enough to release for comments, though I am
  not yet confident enough to claim it bug free.
 
  What this doesn't do YET:
  * cope fully with subxid cache overflows (some parts still to add)
  * cope with prepared transactions on master
  * work correctly when running queries AND replaying WAL
  * work correctly with regard to AccessExclusiveLocks, which should
  prevent access to tables
 
  These last four points are what I'm working on over the next two weeks,
  plus any other holes people point out along the way. I have worked out
  designs for most of these aspects and will discuss them on -hackers,
  though most design notes are in the Wiki. I'm still looking into
  prepared transactions.
 
  Comments appreciated.

 It appears to be working, at least in some fashion.  The supplied
 tests all pass.

 Cool

 Thanks for testing so far.

 At first glance it seems like I have to force changes to the standby
 with pg_switch_xlog().

 hmm.

 You'll have to explain some more. Normally files don't get sent until
 they are full, so yes, you would need to do a pg_switch_xlog().

 This is not streaming replication. Others are working on that.

right...this was expected.  less the missing parts, things are working
very well.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby: First integrated patch

2008-10-17 Thread Merlin Moncure
On Fri, Oct 17, 2008 at 10:38 AM, Simon Riggs [EMAIL PROTECTED] wrote:

 First integrated patch for Hot Standby, allowing queries to be executed
 while in recovery mode.

 The patch tests successfully with the enclosed files:
 * primary_setup_test.sql - run it on primary node
 * standby_allowed.sql - run on standby - should all succeed
 * standby_disallowed.sql - run on standby - should all fail
 plus other manual testing.

 This is still WIP - its good enough to release for comments, though I am
 not yet confident enough to claim it bug free.

 What this doesn't do YET:
 * cope fully with subxid cache overflows (some parts still to add)
 * cope with prepared transactions on master
 * work correctly when running queries AND replaying WAL
 * work correctly with regard to AccessExclusiveLocks, which should
 prevent access to tables

 These last four points are what I'm working on over the next two weeks,
 plus any other holes people point out along the way. I have worked out
 designs for most of these aspects and will discuss them on -hackers,
 though most design notes are in the Wiki. I'm still looking into
 prepared transactions.

 Comments appreciated.

It appears to be working, at least in some fashion.  The supplied
tests all pass.

At first glance it seems like I have to force changes to the standby
with pg_switch_xlog().

hmm.

This probably isn't right:
postgres=# \d

No relations found.
postgres=# select count(*) from foo;
  count
-
 100
(1 row)

I created a table, pg_switch_xlog, query several times,i dropped a
table,  pg_switch_xlog, table is 'gone', but still returns data

exit/enter session, now its gone.  Sometimes I have to exit/enter
session to get an up to date standby.  These are just first
impressions...

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Design

2008-09-24 Thread Simon Riggs

On Wed, 2008-09-24 at 00:30 -0400, Robert Treat wrote:

 here are some scattered thoughts i had while reading it :

Thanks for your comments.

It is very detailed, so further feedback is going to be very beneficial
in making this all work in the way we hope and expect.

 * However, some WAL redo actions will be for DDL actions. These DDL actions 
 are repeating actions that have already committed on the primary node, so 
 they must not fail on the standby node. These DDL locks take priority and 
 will automatically cancel any read-only transactions that get in their way.
 
 Some people will want the option to stack-up ddl transactions behind 
 long-running queries (one of the main use cases of a hot slave is reporting 
 stye and other long running queries)

Scheduling tools can help here. Run set of queries on Standby, then when
complete run DDL on Primary and have its changes filter through.

 * Actions not allowed on Standby are:
 DML - Insert, Update, Delete, COPY FROM, Truncate
 
 copy from suprised me a bit, since it is something i could see people wanting 
 to do... did you mean COPY TO in this case?

I checked...

COPY TO is allowed, since it extracts data. So pg_dump will run. 
COPY FROM will not be allowed since it loads data. So pg_restore will
not run.

 * Statspack functions should work OK, so tools such as pgAdminIII should 
 work. 
 pgAgent will not.
 
 I presume this means the backend kill function would work?  

Yes it will.

 Also, can you go 
 into why pgAgent would not work? (I presume it's because you can't update 
 information that needs to be changed when jobs run, if thats the case it 
 might be worth thinking about making pgAgent work across different clusters)

Yes

 * Looking for suggestions about what monitoring capability will be required.
 
 one place to start might be to think about which checks in check_nagios might 
 still apply.  Possibly also looking to systems like slony for some 
 guidence... for example 

check_pgsql will work, but its very simple.

check_postgres will also work, though many some actions could give
different or confusing results. e.g. last vacuum time will not be
maintained for example, since no vacuum occurs on the standby.

slony won't run on the standby either, so those checks won't work
either. 

 everyone will want some way to check how far the lag 
 is on a stnadby machine. 

Agreed

 * The following commands will not be accepted during recovery mode 
  GRANT, REVOKE, REASSIGN 
 
 How is user management done on a standby? can you have users that dont
 exist 
 on the primary (it would seem not). 

No you can't have different users. (In time, I see this as a good thing
because it will allow us to move queries around to different nodes for
execution so that a single database acts like a hive mind.)

Nearly everything must be set via the Primary. Users, passwords.
Changes to .conf files will be possible. There is no mechanism to auto
synchronise the .conf files between nodes.

The Standby is a Clone and not a Slave. A Slave is a separate database
that is forced to duplicate the actions of the Master. The Standby is an
exact copy, in every detail that matters.

I can see it might be desirable to have it work that way, but that's not
going to happen in the first release.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Design

2008-09-24 Thread Dave Page
On Wed, Sep 24, 2008 at 5:30 AM, Robert Treat
[EMAIL PROTECTED] wrote:

 I presume this means the backend kill function would work?  Also, can you go
 into why pgAgent would not work? (I presume it's because you can't update
 information that needs to be changed when jobs run, if thats the case it
 might be worth thinking about making pgAgent work across different clusters)

Probably - it needs to be able to update its catalogs to record job
results and ensure that only a single node runs any given job
instance.

We could probably update the code to optionally accept a connection
string instead of a database name, which would allow us to operate
against servers other than the one with the catalogs installed.

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Design

2008-09-24 Thread Simon Riggs

On Wed, 2008-09-24 at 08:28 +0100, Dave Page wrote:
 On Wed, Sep 24, 2008 at 5:30 AM, Robert Treat
 [EMAIL PROTECTED] wrote:
 
  I presume this means the backend kill function would work?  Also, can you go
  into why pgAgent would not work? (I presume it's because you can't update
  information that needs to be changed when jobs run, if thats the case it
  might be worth thinking about making pgAgent work across different clusters)
 
 Probably - it needs to be able to update its catalogs to record job
 results and ensure that only a single node runs any given job
 instance.
 
 We could probably update the code to optionally accept a connection
 string instead of a database name, which would allow us to operate
 against servers other than the one with the catalogs installed.

It would be useful to be able to schedule jobs against multiple nodes.

Not much need for maintenance actions on a Standby node, but I think if
I say there aren't any needed at all that would likely be wrong.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Design

2008-09-24 Thread Robert Treat
On Wednesday 24 September 2008 03:27:44 Simon Riggs wrote:
 On Wed, 2008-09-24 at 00:30 -0400, Robert Treat wrote:
  here are some scattered thoughts i had while reading it :

 Thanks for your comments.

 It is very detailed, so further feedback is going to be very beneficial
 in making this all work in the way we hope and expect.

  * However, some WAL redo actions will be for DDL actions. These DDL
  actions are repeating actions that have already committed on the primary
  node, so they must not fail on the standby node. These DDL locks take
  priority and will automatically cancel any read-only transactions that
  get in their way.
 
  Some people will want the option to stack-up ddl transactions behind
  long-running queries (one of the main use cases of a hot slave is
  reporting stye and other long running queries)

 Scheduling tools can help here. Run set of queries on Standby, then when
 complete run DDL on Primary and have its changes filter through.


true... i am just reminded of Oracle 8's log replay, where you had to stop 
replay to run any queries... given that was a usefull feature, I suspect 
we'll hear complaints about it not going that way. I think one could argue 
that we might be able to provide such an option in the future, if not in the 
first release. 

  * Actions not allowed on Standby are:
  DML - Insert, Update, Delete, COPY FROM, Truncate
 
  copy from suprised me a bit, since it is something i could see people
  wanting to do... did you mean COPY TO in this case?

 I checked...

 COPY TO is allowed, since it extracts data. So pg_dump will run.
 COPY FROM will not be allowed since it loads data. So pg_restore will
 not run.


ah, of course... the late hour had me thinking backwards.  pg_dump on the 
clone will be a big plus. 

snip
  * Looking for suggestions about what monitoring capability will be
  required.
 
  one place to start might be to think about which checks in check_nagios
  might still apply.  Possibly also looking to systems like slony for some
  guidence... for example

 check_pgsql will work, but its very simple.

 check_postgres will also work, though many some actions could give
 different or confusing results. e.g. last vacuum time will not be
 maintained for example, since no vacuum occurs on the standby.


yep.

 slony won't run on the standby either, so those checks won't work
 either.

  everyone will want some way to check how far the lag
  is on a stnadby machine.

 Agreed


right... my thought with slony was, what do people monitor on thier slony 
slaves?  (there is actually a nagios script for that iirc)

snip
 The Standby is a Clone and not a Slave. A Slave is a separate database
 that is forced to duplicate the actions of the Master. The Standby is an
 exact copy, in every detail that matters.


This is an interesting clarification. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Design

2008-09-24 Thread Simon Riggs

On Wed, 2008-09-24 at 12:35 -0400, Robert Treat wrote:
 On Wednesday 24 September 2008 03:27:44 Simon Riggs wrote:
  On Wed, 2008-09-24 at 00:30 -0400, Robert Treat wrote:
   * However, some WAL redo actions will be for DDL actions. These DDL
   actions are repeating actions that have already committed on the primary
   node, so they must not fail on the standby node. These DDL locks take
   priority and will automatically cancel any read-only transactions that
   get in their way.
  
   Some people will want the option to stack-up ddl transactions behind
   long-running queries (one of the main use cases of a hot slave is
   reporting stye and other long running queries)
 
  Scheduling tools can help here. Run set of queries on Standby, then when
  complete run DDL on Primary and have its changes filter through.
 
 
 true... i am just reminded of Oracle 8's log replay, where you had to stop 
 replay to run any queries... given that was a usefull feature, I suspect 
 we'll hear complaints about it not going that way. I think one could argue 
 that we might be able to provide such an option in the future, if not in the 
 first release. 

It probably sounds worse than it is. If you drop a table on the Primary,
then somebody running a query against it on the Standby is in for a
shock. But on the other hand, why are you dropping a table that people
still consider worth reading? DROP TABLE should be carefully researched
before use, so I don't think its a big problem. 

We could make it wait for a while before cancelling, as an option, if
you think its important?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby Design

2008-09-23 Thread Robert Treat
On Tuesday 23 September 2008 14:15:34 Simon Riggs wrote:
 Hot Standby design has been growing on the PostgreSQL Wiki for some
 weeks now.

 I've updated the design to reflect all feedback received so far on
 various topics.

 http://wiki.postgresql.org/wiki/Hot_Standby

 It's not hugely detailed in every area, but it gives a flavour of the
 topics and issues related to them.

 Comments or questions welcome here, or I will discuss specific areas in
 more detail as I tackle those topics.


very nice work. very in depth. unfortunatly, this means it is long and the 
hour is late... so here are some scattered thoughts i had while reading it :

* However, some WAL redo actions will be for DDL actions. These DDL actions 
are repeating actions that have already committed on the primary node, so 
they must not fail on the standby node. These DDL locks take priority and 
will automatically cancel any read-only transactions that get in their way.

Some people will want the option to stack-up ddl transactions behind 
long-running queries (one of the main use cases of a hot slave is reporting 
stye and other long running queries)


* Actions not allowed on Standby are:
DML - Insert, Update, Delete, COPY FROM, Truncate

copy from suprised me a bit, since it is something i could see people wanting 
to do... did you mean COPY TO in this case?

* Statspack functions should work OK, so tools such as pgAdminIII should work. 
pgAgent will not.

I presume this means the backend kill function would work?  Also, can you go 
into why pgAgent would not work? (I presume it's because you can't update 
information that needs to be changed when jobs run, if thats the case it 
might be worth thinking about making pgAgent work across different clusters)

* Looking for suggestions about what monitoring capability will be required.

one place to start might be to think about which checks in check_nagios might 
still apply.  Possibly also looking to systems like slony for some 
guidence... for example everyone will want some way to check how far the lag 
is on a stnadby machine. 

* The following commands will not be accepted during recovery mode 
 GRANT, REVOKE, REASSIGN 

How is user management done on a standby? can you have users that dont exist 
on the primary (it would seem not). 

... more to come i'm sure, but fading out... thanks again for the work so far 
Simon. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hot standby system

2006-07-22 Thread Hannu Krosing
Ühel kenal päeval, R, 2006-07-21 kell 20:00, kirjutas Chris Campbell:
 The documentation [1] says this about On-line backup and point-in- 
 time recovery:
 
  If we continuously feed the series of WAL files to another machine  
  that has been loaded with the same base backup file, we have a hot  
  standby system: at any point we can bring up the second machine  
  and it will have a nearly-current copy of the database.
 
 Is this possible today in a stable and robust way? If so, can we  
 document the procedure? If not, should we alter the documentation so  
 it's not misleading? I've had several people ask me where to enable  
 the hot standby feature, not realizing that PostgreSQL only has  
 some of the raw materials that could be used to architect such a thing.

We have written a management script that manages the setup und failover
procedures. It is a part of the SkyTools package we announced at the
postgresql conference in Toronto.

We failed to put it up on PgFoundry during the code sprint (it took a
whole day to enable the project there :( ) and have not had enough time
to do it after coming back. But it will be on PgFoundry eventually.

Marko has the details if you need it faster

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] hot standby system

2006-07-21 Thread Joshua D. Drake


Is this possible today in a stable and robust way? If so, can we 
document the procedure? If not, should we alter the documentation so 
it's not misleading? I've had several people ask me where to enable the 
hot standby feature, not realizing that PostgreSQL only has some of 
the raw materials that could be used to architect such a thing.


Well it works fine depending on how you set it up :) Please feel free to 
submit a patch to the docs.


Sincerely,

Joshua D. Drake




Thanks!

- Chris

[1] http://www.postgresql.org/docs/8.1/interactive/backup-online.html


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org




--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


<    5   6   7   8   9   10