Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Stefan Kaltenbrunner

On 12/30/2010 10:27 PM, Simon Riggs wrote:

On Thu, 2010-12-30 at 22:08 +0100, Stefan Kaltenbrunner wrote:

On 12/30/2010 10:01 PM, Simon Riggs wrote:

On Thu, 2010-12-30 at 15:51 -0500, Robert Treat wrote:


Still, one thing that has me concerned is that in the case of two
slaves, you don't know which one is the more up-to-date if you
need to fail over. It'd be nice if you could just guarantee that they
both are...


Regrettably, nobody can know that, without checking.


how exactly would you check? This seems like something that needs to
be doable at both the SQL and the CLI level, and also to be very well
documented (which I cannot see in your proposal).


This is a proposal for sync rep, not multi-node failover. I'm definitely
not going to widen the scope of this project.

Functions already exist to check the thing you're asking.


well, your proposal includes a lot of material on how to avoid data loss
and achieve high availability - so I think it is a requirement for us to
tell the DBA both the procedures for setting it up (which is what is in
the docs - but only 50% of the story) and what to do in the case of a
disaster (which is the other part of the problem).
Put another way: sync rep is not very useful if there is no easy and
reliable way to get that information. If that functionality is already
available, even better, but I'm not aware of what is there and what is
not, so I expect others to have the same problem.




Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Stefan Kaltenbrunner

On 12/30/2010 10:23 PM, Simon Riggs wrote:

On Thu, 2010-12-30 at 21:42 +0100, Stefan Kaltenbrunner wrote:


Synchronous replication offers the ability to guarantee that all changes
made by a transaction have been transferred to at least one remote
standby server. This is an extension to the standard level of durability
offered by a transaction commit. This is referred to as semi-synchronous
replication.

When synchronous replication is requested, the commit of a write
transaction will wait until confirmation that the commit record has been
transferred successfully to at least one standby server. Waiting for
confirmation increases the user's confidence that the changes will not
be lost in the event of server crashes but it also necessarily increases
the response time for the requesting transaction. The minimum wait time
is the roundtrip time from primary to standby.


hmm, this is one of the main problems I see with the "master is
sometimes aware of the standby" (as in the feedback mode) concept this
proposal has. If it waits for only one of the standbys, there is some
issue with the terminology. As a DBA I would expect the master to only
return once ALL of the declared sync replication nodes have replied OK.


Well, as a DBA, I expect it to work with just one. That's how MySQL and
Oracle work, at least. If ALL standbys must reply, it takes longer, makes
the code harder, and how do you robustly determine what "all" is, etc.?
Plus it's been discussed already.


Maybe it has been discussed, but I still don't see why it makes any
sense. If I declare a standby a sync standby, I want it to actually be
sync - not maybe sync. Consider the case of one master and two identical
sync standbys - one sync standby in the same datacenter, the other in a
backup location, say 15 km away.
Given that there is a small constant latency to the second box (even with
fast networks), the end effect is that the second standby will NEVER
be sync (because the local one will always be faster), and you end up
with an async slave that cannot be used per your business rules.





What I'm really missing with that proposal is how people expect that
solution to be managed -


What aspect do you wish to monitor? I'm happy to consider your
suggestions.


given that there is only sometimes a feedback
channel into the master, you can't do the monitoring.


Not sure what you mean. Please explain more.


well, hot_standby_feedback (not sure I like the name, but I can't think
of anything better either) provides feedback to the master (like sync
rep), and the master actually acts on that feedback. So in effect we
have stuff affecting the master that we need to be able to monitor
clearly.
But to make that information useful it would help to see which standby
provided what kind of feedback, so we are back to having the master be
aware of which standbys are connected and which standbys are supposed to
be there - which is the issue sync rep got stalled on before...





Even if you could (which we really need!), there is nothing in the
proposal yet that helps to determine which the most recent standby
(in the case of more than one sync standby) might be.


Functions to determine that already exist.


  - but it would require real standby
registration, or at least a standby management capability on the master,
not a halfway-done one - so do we really need hot_standby_feedback as
part of the initial sync-rep patch?


It is a Hot Standby feature, but so tightly integrated with this code
that it isn't possible for me to submit it as two separate patches.


well, you are the developer of that feature, but if it is already a knob
that can be turned on and off, it seems not too hard to submit as a
separate feature...




Stefan



Re: [HACKERS] Snapshot synchronization, again...

2010-12-31 Thread Stefan Kaltenbrunner

On 12/30/2010 10:45 PM, Heikki Linnakangas wrote:

On 30.12.2010 16:49, Florian Pflug wrote:

On Dec30, 2010, at 13:31 , Joachim Wieland wrote:

We return snapshot information as a chunk of data to the client. At
the same time however, we set a checksum in shared memory to protect
against modification of the snapshot. A publishing backend can revoke
its snapshot by deleting the checksum and a backend that is asked to
install a snapshot can verify that the snapshot is correct and current
by calculating the checksum and comparing it with the one in shared
memory.


We'd still have to stream these checksums to the standbys though,
or would they be exempt from the checksum checks?

I still wonder whether these checks are worth the complexity. I
believe we'd only allow snapshot modifications for read-only queries
anyway, so what point is there in preventing clients from setting
broken snapshots?


Hmm, our definition of read-only is a bit fuzzy. While a transaction
doesn't modify the database itself, it could still send NOTIFYs or call
a PL function to do all sorts of things outside the database. Imagine
that you're paranoid about data integrity, and have a security definer
function that runs cross checks on the data. If it finds any
anomalies, it wakes up the operator or forces shutdown or similar.


are people actually doing that in reality? I'm also having a hard time
picturing a realistic example of what that data integrity check
function would actually be able to check under the default isolation
mode with concurrent activity...




Now a malicious user could set a snapshot that passes the basic validity
checks, ie. xmin = GlobalXmin, but contains a combination of
still-in-progress transactions that never existed in reality. If he then
calls the paranoia function, it would see an inconsistent state of
committed tuples and get upset.


sure, but I would expect setting a snapshot to require either
superuser or some sort of WITH SNAPSHOT grant thingy - and in general
there are much more trivial and far less obscure scenarios in which a
normal user can cause the admin to get paged :)




Maybe that's a bit far-stretched, but it's not entirely clear that
running with an inconsistent snapshot is harmless.


well, there has been some discussion around the SSI stuff that we might
want to reconsider our definition of read-only; maybe that would be the
right way to approach the problem?



Stefan



Re: [HACKERS] Old git repo

2010-12-31 Thread Magnus Hagander
On Fri, Dec 31, 2010 at 03:04, Tom Lane t...@sss.pgh.pa.us wrote:
 Jeff Davis pg...@j-davis.com writes:
 Personally, my utility for the old repo is not much (if it was anything
 important, I wouldn't have relied on the unofficial repo). But we should
 probably give a little bit of warning for folks that might want to
 rebase or translate some old notes.

 Well, I guess the question is how much warning.  I suggested O(1 week)
 but Robert seems to want O(1 year).  As long as there's some agreed
 deadline, I'm not very picky about what it is.

I don't need to get rid of it *now*, but I agree a deadline is good.

How about we either say when 9.1 is released, or we say September
2011, because that's a year after we made the switch?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Heikki Linnakangas

On 31.12.2010 09:50, Hannu Krosing wrote:

On 30.12.2010 22:27, Robert Haas wrote:

On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs si...@2ndquadrant.com
wrote:

synchronous_replication (boolean)
Specifies whether transaction commit will wait for WAL records
to be replicated before the command returns a success
indication to the client.

The word "replicated" here could be taken to mean different things,
most obviously:

- slave has received the WAL
- slave has fsync'd the WAL
- slave has applied the WAL

Perhaps the level of replication guarantee should be decided on the
slave side, by having a configuration parameter there:

report_as_replicated = received|written_to_disk|fsynced|applied

for different types of hosts may have wildly different guarantees and
performance characteristics for these. One could envision a WAL-archive
type standby which is there for data persistence only and will never
apply WAL.


Agreed, it feels natural to specify when a piece of WAL is acknowledged 
in the standby.


Regarding the rest of the proposal, I would still prefer the UI 
discussed here:


http://archives.postgresql.org/message-id/4cae030a.2060...@enterprisedb.com

It ought to be the same amount of work to implement, and provides the 
same feature set, but makes administration a bit easier by being able to 
name the standbys. Also, I dislike the idea of having the standby 
specify that it's a synchronous standby that the master has to wait for. 
Behavior on the master should be configured on the master.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Simon Riggs
On Fri, 2010-12-31 at 09:27 +0100, Stefan Kaltenbrunner wrote:

 Maybe it has been discussed, but I still don't see why it makes any
 sense. If I declare a standby a sync standby, I want it to actually be
 sync - not maybe sync. Consider the case of one master and two identical
 sync standbys - one sync standby in the same datacenter, the other in a
 backup location, say 15 km away.
 Given that there is a small constant latency to the second box (even with
 fast networks), the end effect is that the second standby will NEVER
 be sync (because the local one will always be faster), and you end up
 with an async slave that cannot be used per your business rules.

Your picture above is a common misconception. I will add something to
the docs to explain this.

1. sync is a guarantee about how we respond to the client when we
commit. Waiting for more than one response slows things down,
makes the cluster more fragile, complicates the code, and doesn't
appreciably improve the guarantee.

2. sync does not guarantee that the updates to the standbys are in any
way coordinated. You can run a query on one standby and get one answer
and at the exact same time run the same query on another standby and get
a different answer (slightly ahead/behind). That also means that if the
master crashes one of the servers will be ahead or behind. You can use
pg_last_xlog_receive_location() to check which one that is.
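
For example (run on each standby; the one reporting the highest location
is the most up to date):

```sql
-- On each standby: how far has WAL been received and replayed?
SELECT pg_last_xlog_receive_location() AS received,
       pg_last_xlog_replay_location()  AS replayed;

-- For comparison, on the master:
SELECT pg_current_xlog_location();
```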

When people say they want *all* servers to respond, it's usually because
they want (2), but that is literally impossible in a distributed system.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Stefan Kaltenbrunner

On 12/31/2010 11:06 AM, Heikki Linnakangas wrote:

On 31.12.2010 09:50, Hannu Krosing wrote:

On 30.12.2010 22:27, Robert Haas wrote:

On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs si...@2ndquadrant.com
wrote:

synchronous_replication (boolean)
Specifies whether transaction commit will wait for WAL records
to be replicated before the command returns a success
indication to the client.

The word replicated here could be taken to mean different things,
most obviously:

- slave has received the WAL
- slave has fsync'd the WAL
- slave has applied the WAL

Perhaps the level of replication guarantee should be decided on the
slave side, by having a configuration parameter there:

report_as_replicated = received|written_to_disk|fsynced|applied

for different types of hosts may have wildly different guarantees and
performance characteristics for these. One could envision a WAL-archive
type standby which is there for data persistence only and will never
apply WAL.


Agreed, it feels natural to specify when a piece of WAL is acknowledged
in the standby.

Regarding the rest of the proposal, I would still prefer the UI
discussed here:

http://archives.postgresql.org/message-id/4cae030a.2060...@enterprisedb.com

It ought to be the same amount of work to implement, and provides the
same feature set, but makes administration a bit easier by being able to
name the standbys. Also, I dislike the idea of having the standby
specify that it's a synchronous standby that the master has to wait for.
Behavior on the master should be configured on the master.


well, that proposal is much closer to what I want as an admin - except
that it would be nice to configure it through actual DDL.

My wish would be more like:

* standby provides a unique name identifier
* standby has a flag stating the maximum (or minimum?)
report_as_replicated level it can support
* standby connects to the master async by default and the master
registers the standby automatically
* on the master I can do the following with every standby that is
visible to the master or has been in the past:
	* enable/disable and add/remove permanently (if not added permanently,
the registration is transient) - enabled if not set explicitly
	* sync_rep_enabled (boolean) (so you can still do per-transaction or
per-database or whatever sync rep) - disabled if not set explicitly
	* sync_reply_required (boolean) (per sync standby flag to require a
reply before the master returns - if there is only one sync standby this
is the default behaviour; if there are more, the admin can choose)
	* wait_forever or timeout per standby
	* maybe a way to set report_as_replicated from the master (if
feasible), up to the max of what the standby can do


so you would get the proposed "semi sync rep" mode by simply setting
more than one standby as sync_rep_enabled with sync_reply_required set
to false for all of them (i.e. any one of them can reply and the master
returns) - if you want better than that, just require a reply from one
specific standby, or from more than one.


this would also help us provide a simple view with a nice and handy
status report on the slaves (which ones are there, which ones should be
there, how far along they are in terms of applying WAL, and what status
they have).


Imho an interface like this would:

a) be convenient, because it would not require any additional setup for
async rep except providing a name on the standby by default
b) enable the master to specify the business rules clearly
c) still support the simple "one sync reply is enough" semi-sync
replication case people like to have
d) also enable the admin to get more than ONE sync standby that is
really sync - not maybe sync
e) make hot_standby_feedback (if enabled) look only at the permanently
enabled slaves, so only a DBA-approved standby would be able to cause
table bloat on the master
f) provide the necessary meta information for the handy, quick & nice
replication status overview reporting feature people want and need
g) let us (optionally) keep track of the oldest WAL still required by
any permanently enabled async node and retain it, so we could get rid
of the hard-to-size wal_keep_segments and maintain that automatically.


the only disadvantage I can see would be that you would have to manually
remove a non-functional slave from the master (and only one that you set
some explicit configuration for!) if you decide you don't need it any
more.
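
(Purely as an illustration - none of this syntax exists, it is just what
the knobs above might look like if spelled as DDL:)

```sql
-- Hypothetical DDL only, sketching the wish list above:
ALTER STANDBY dc1_standby ENABLE;                  -- transient registration
ALTER STANDBY dc1_standby ADD PERMANENTLY;         -- survives disconnect
ALTER STANDBY dc1_standby SET sync_rep_enabled = true;
ALTER STANDBY dr_standby  SET sync_reply_required = true;  -- master must wait for this one
ALTER STANDBY dr_standby  SET timeout = '30s';     -- instead of wait_forever
```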




Stefan



Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Simon Riggs
On Fri, 2010-12-31 at 12:06 +0200, Heikki Linnakangas wrote:
 On 31.12.2010 09:50, Hannu Krosing wrote:
  On 30.12.2010 22:27, Robert Haas wrote:
  On Thu, Dec 30, 2010 at 2:04 PM, Simon Riggs si...@2ndquadrant.com
  wrote:
  synchronous_replication (boolean)
  Specifies whether transaction commit will wait for WAL records
  to be replicated before the command returns a success
  indication to the client.
  The word replicated here could be taken to mean different things,
  most obviously:
 
  - slave has received the WAL
  - slave has fsync'd the WAL
  - slave has applied the WAL
  Perhaps the level of replication guarantee should be decided on the
  slave side, by
  having a configuration parameter there
 
  report_as_replicated = received|written_to_disk|fsynced|applied
 
  for different types of hosts may have wildly different guarantees and
  performance
  parameters for these. One could envision a WAL-archive type standby
  which is
  there for data persistence only will and never apply WAL.
 
 Agreed, it feels natural to specify when a piece of WAL is acknowledged 
 in the standby.

That can also be done, it's not a problem.

Many people asked for just on or off.

Currently, "on" means the slave has fsync'd the WAL.

 Also, I dislike the idea of having the standby 
 specify that it's a synchronous standby that the master has to wait for. 
 Behavior on the master should be configured on the master.

The parameter on the standby affects the behaviour of the standby. The
standby is saying "don't pick me, I'm not a good target".

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Simon Riggs
On Fri, 2010-12-31 at 12:06 +0200, Heikki Linnakangas wrote:

 Regarding the rest of the proposal, I would still prefer the UI 
 discussed here:
 
 http://archives.postgresql.org/message-id/4cae030a.2060...@enterprisedb.com
 
 It ought to be the same amount of work to implement, and provides the 
 same feature set, but makes administration a bit easier by being able to 
 name the standbys. Also, I dislike the idea of having the standby 
 specify that it's a synchronous standby that the master has to wait for. 
 Behavior on the master should be configured on the master.

Good point; I've added the people on the copy list from that post. This
question is the key one, so please respond after careful thought on my
points below.

There are ways to blend together the two approaches, discussed later,
though first we need to look at the reasons behind my proposals.

I see significant real-world issues with configuring replication using
multiple named servers, as described in the link above:

1. Syncing to multiple standbys does not guarantee that the updates to
the standbys are in any way coordinated. You can run a query on one
standby and get one answer and at the exact same time run the same query
on another standby and get a different answer (slightly ahead/behind).
That also means that if the master crashes, one of the servers can still
be ahead or behind, even though you asked them to be the same. So you
don't actually get what you think you're getting.

2. Availability of the cluster just went down. If *any* of the
important nodes goes down, then everything just freezes. (I accept
that you want that, and have provided that as an option).

3. Administrative complexity just jumped a huge amount. 

(a) If you add or remove servers from the config you need to respecify
all the parameters, which are specific to the exact set of servers.
There is no way to test that you have configured the parameters
correctly without a testbed that exactly mirrors production, with the
same names etc., or without trying it directly in production. So
availability takes another potential hit because of user error.

(b) After failover, the list of synchronous_standbys needs to be
re-specified, yet what is the correct list of servers? The only way to
make that config work is with complex middleware that automatically
generates new config files. I don't think that is the same amount of
work to implement; it's an order of magnitude harder overall.

4. As a result of the administrative complexity, testing the full range
of function will take significantly longer and that is likely to have a
direct impact on the robustness of PostgreSQL 9.1.

5. Requesting sync from more than one server performs poorly, since you
must wait for additional servers. If there are sporadic or systemic
network performance issues you will be badly hit by them. Monitoring
that just got harder also. First-response-wins is more robust in the
case of volatile resources since it implies responsiveness to changing
conditions.

6. You just lost the ability to control performance on the master, with
a userset. Performance is a huge issue with sync rep. If you can't
control it, you'll simply turn it off. Having a feature that we daren't
ever use because it performs poorly helps nobody. This is not a tick-box
in our marketing checklist, I want it to be genuinely real-world usable.

I understand very well that Oracle provides that level of configuration,
though I think it is undesirable in 90% of real world use cases. I also
understand how sexy that level of configuration *sounds*, but I
genuinely believe trying to deliver that would be a mistake for
PostgreSQL. IMHO we should take the same road here as we do in other
things: simplicity encouraged, complexity allowed. So I don't have any
objection to supporting that functionality in the future, but I believe
it is not something we should be encouraging (ever), nor is it something
we need for this release.

I suppose we might regard the feature set I am proposing as being the
same as making synchronous_standbys a USERSET parameter, and allowing
just two options: 
none - allowing the user to specify async if they wish it
* - allowing people to specify that syncing to *any* standby is
acceptable

We can blend the two approaches together, if we wish, by having two
parameters (plus server naming)
  synchronous_replication = on | off (USERSET)
  synchronous_standbys = '...'
If synchronous_standbys is not set and synchronous_replication = on then
we sync to any standby. If  synchronous_replication = off then we use
async replication, whatever synchronous_standbys is set to.
If synchronous_standbys is set, then we use sync rep to all listed
servers.
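
In postgresql.conf terms the blend would look something like this (a
sketch using only the parameter names proposed above; none of it is
implemented as written):

```ini
# USERSET: can be turned off per session/transaction
synchronous_replication = on

# unset: any standby may acknowledge the commit
#synchronous_standbys = ''

# later (9.2 proposal): wait for all of these named standbys
#synchronous_standbys = 'dc1, dr'
```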

My proposal amounts to "let's add synchronous_standbys as a parameter in
9.2". If you really think that we need that functionality in this
release, let's get the basic stuff added now and then fold those ideas
in on top afterwards. If we do that, I will help. However, my only
insistence is that we 

Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Simon Riggs
On Thu, 2010-12-30 at 20:26 -0700, Joshua Tolley wrote:
 2) initiate fsync on the primary first
 - In this case, the slave is always slightly behind. If your
  primary falls over, you don't give commit messages to the clients,
  but if it recovers, it might have committed data, and slaves will
  still be able to catch up.
  
  The thing is that currently, even without replication, #2 can
 happen.
 
 For what little it's worth, I vote for this option, because it's a
 problem that can already happen (as opposed to adding an entirely new
 type of problem to the mix).

This proposal provides #2, so your wish is met.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Old git repo

2010-12-31 Thread Robert Haas
On Fri, Dec 31, 2010 at 4:58 AM, Magnus Hagander mag...@hagander.net wrote:
 On Fri, Dec 31, 2010 at 03:04, Tom Lane t...@sss.pgh.pa.us wrote:
 Jeff Davis pg...@j-davis.com writes:
 Personally, my utility for the old repo is not much (if it was anything
 important, I wouldn't have relied on the unofficial repo). But we should
 probably give a little bit of warning for folks that might want to
 rebase or translate some old notes.

 Well, I guess the question is how much warning.  I suggested O(1 week)
 but Robert seems to want O(1 year).  As long as there's some agreed
 deadline, I'm not very picky about what it is.

 I don't need to get rid of it *now*, but I agree a deadline is good.

 How about we either say when 9.1 is released, or we say september
 2011 because that's a year after we made the switch?

Either of those would be fine with me.

Thanks!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Aidan Van Dyk
On Fri, Dec 31, 2010 at 5:26 AM, Simon Riggs si...@2ndquadrant.com wrote:

 Your picture above is a common misconception. I will add something to
 the docs to explain this.

 2. sync does not guarantee that the updates to the standbys are in any
 way coordinated. You can run a query on one standby and get one answer
 and at the exact same time run the same query on another standby and get
 a different answer (slightly ahead/behind). That also means that if the
 master crashes one of the servers will be ahead or behind. You can use
 pg_last_xlog_receive_location() to check which one that is.

 When people say they want *all* servers to respond, its usually because
 they want (2), but that is literally impossible in a distributed system.

Just to try and be clear again: in the "sync" that Stefan and I are
talking about, we really don't care that the slave could be a hot
standby answering queries.  In fact, mine wouldn't be. Mine would
likely be pg_streamrecv or something.  I'm just looking for a
guarantee that I've got a copy of the data safely in the next rack,
and in a separate building, before I tell the client I've moved his
money.

I want synchronous replication of the *data*, not a system where
I can distribute queries.  I'm looking for disaster mitigation, not
load mitigation.  A replacement for clustered/replicated
devices/filesystems under pg_xlog.

Having the next-rack slave be hot in terms of applying WAL and ready
to take over instantly would be a bonus, as long as I can guarantee
it's current (i.e. has all data the primary's COMMIT has acknowledged).

So, that's what I want, and that's what your docs suggest is
currently impossible: first-past-the-post means that I can only ever
reliably configure one sync slave and be sure it will have all
acknowledged commits.  I can likely get *close* to that by making
my slowest slave the only sync slave, and monitoring the
heck out of my "asynchronous but I want it to be synchronous" slave, but
I'd rather trust the PG community to build robust synchronization
than trust myself to build monitoring robust enough to catch that my
async slave is further behind than the slower synchronous one.


That said, I think the expectation is that if I were building a
query-able hot standby cluster in sync rep mode, once I get a commit
confirmation, I should be able to then initiate a new transaction on
any member of that sync rep cluster and see the data I just committed.

Yes, I know I could see *newer* data.  And I know that the primary
could already have newer data. Yes, we have this problem even on a
single PG cluster on a single machine.  But the point is that if
you've committed, any new transaction sees *at least* that data or
newer.  But no chance of older.

But personally, I'm not interested in that ;-)
-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Robert Haas
On Fri, Dec 31, 2010 at 6:48 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I suppose we might regard the feature set I am proposing as being the
 same as making synchronous_standbys a USERSET parameter, and allowing
 just two options:
 none - allowing the user to specify async if they wish it
 * - allowing people to specify that syncing to *any* standby is
 acceptable

 We can blend the two approaches together, if we wish, by having two
 parameters (plus server naming)
  synchronous_replication = on | off (USERSET)
  synchronous_standbys = '...'
 If synchronous_standbys is not set and synchronous_replication = on then
 we sync to any standby. If  synchronous_replication = off then we use
 async replication, whatever synchronous_standbys is set to.
 If synchronous_standbys is set, then we use sync rep to all listed
 servers.

 My proposal amounts to lets add synchronous_standbys as a parameter in
 9.2.

FWIW, this plan of attack would be OK with me.  I had taken your
previous comments to imply that you were opposed to EVER having a
parameter like this on the master, but I could see deferring it.  In
my ideal world, we'd not have the synchronous_replication_service
parameter on the standbys at all - all standbys would be candidates,
unless someone decides to name them and set synchronous_standbys on
the master.  But maybe having a simple on/off on the standby is
reasonable.

*thinks a little bit*

Someone may have proposed this before, but one way of getting standby
naming for free would be to make the standby names the same as the
roles used to log in, rather than adding a separate parameter.  We
could just recommend that people use a separate, descriptive
role name for each standby.  Then the synchronous_standbys parameter -
when added - would mean "a standby from one of these roles".
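
For instance (a sketch; the role names are made up, and the privileges a
replication login needs are whatever the final patch requires):

```sql
-- One descriptive login role per standby, so the role name
-- doubles as the standby name:
CREATE ROLE standby_dc1 LOGIN;
CREATE ROLE standby_dr  LOGIN;

-- then, once the parameter exists on the master:
--   synchronous_standbys = 'standby_dc1, standby_dr'
```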

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Heikki Linnakangas

On 31.12.2010 13:48, Simon Riggs wrote:

On Fri, 2010-12-31 at 12:06 +0200, Heikki Linnakangas wrote:


Regarding the rest of the proposal, I would still prefer the UI
discussed here:

http://archives.postgresql.org/message-id/4cae030a.2060...@enterprisedb.com

It ought to be the same amount of work to implement, and provides the
same feature set, but makes administration a bit easier by being able to
name the standbys. Also, I dislike the idea of having the standby
specify that it's a synchronous standby that the master has to wait for.
Behavior on the master should be configured on the master.


Good point; I've added the people on the copy list from that post. This
question is they key, so please respond after careful thought on my
points below.

There are ways to blend together the two approaches, discussed later,
though first we need to look at the reasons behind my proposals.

I see significant real-world issues with configuring replication using
multiple named servers, as described in the link above:


All of these points only apply to specifying *multiple* named servers in 
the synchronous_standbys='...' list. That's certainly a more complicated 
scenario, and the configuration is more complicated as a result. With 
your proposal, it's not possible in the first place.


Multiple synchronous standbys probably isn't needed by most people, so 
I'm fine with leaving that out for now, keeping the design the same 
otherwise. I included it in the proposal because it easily falls out of 
the design. So, if you're worried about the complexities of multiple 
synchronous standbys, let's keep the UI exactly the same as what I 
described in the link above, but only allow one name in the 
synchronous_standbys setting, instead of a list.



3. Administrative complexity just jumped a huge amount.

(a) If you add or remove servers to the config you need to respecify all
the parameters, which need to be specific to the exact set of servers.


Hmm, this could be alleviated by allowing the master to have a name too. 
All the configs could then be identical, except for the unique name for 
each server. For example, for a configuration with three servers that 
are all synchronous with each other, each server would have 
synchronous_standbys='server1, server2, server3' in the config file. 
The master would simply ignore the entry for itself.
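Under that idea, a single file could be shipped to all three servers. Here node_name is a purely hypothetical parameter, used only to illustrate the point:

```
# Identical on server1, server2 and server3 (hypothetical syntax);
# each node ignores its own entry in the standby list.
node_name            = 'server1'    # the only per-server difference
synchronous_standbys = 'server1, server2, server3'
```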



(b) After failover, the list of synchronous_standbys needs to be
re-specified, yet what is the correct list of servers? The only way to
make that config work is with complex middleware that automatically
generates new config files.


It depends on what you want. I think you're envisioning that the 
original server is taken out of the system and not waited for, meaning 
that you accept a lower level of persistence after failover. Yes, then 
you need to change the config. Or more likely you prepare the config 
file in the standby that way to begin with.



I don't think that is the same amount of
work to implement; it's an order of magnitude harder overall.


I meant it's the same amount of work to implement the feature in 
PostgreSQL. No doubt that maintaining such a setup in production is more 
complicated.



5. Requesting sync from more than one server performs poorly, since you
must wait for additional servers. If there are sporadic or systemic
network performance issues you will be badly hit by them. Monitoring
that just got harder also. First-response-wins is more robust in the
case of volatile resources since it implies responsiveness to changing
conditions.
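A toy sketch of that latency argument (illustrative numbers, not measurements): waiting for every listed standby makes commit latency the maximum of the acknowledgement times, while first-response-wins makes it the minimum.

```python
def commit_latency(ack_times_ms, wait_for_all):
    """Time until the master can acknowledge the commit, given each
    standby's acknowledgement latency in milliseconds."""
    return max(ack_times_ms) if wait_for_all else min(ack_times_ms)

# Two healthy standbys plus one on a sporadically slow link:
acks = [2.1, 2.4, 180.0]

assert commit_latency(acks, wait_for_all=True) == 180.0  # hostage to worst link
assert commit_latency(acks, wait_for_all=False) == 2.1   # first past the post
```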

6. You just lost the ability to control performance on the master, with
a userset. Performance is a huge issue with sync rep. If you can't
control it, you'll simply turn it off. Having a feature that we daren't
ever use because it performs poorly helps nobody. This is not a tick-box
in our marketing checklist, I want it to be genuinely real-world usable.


You could make synchronous_standbys a user-settable GUC, just like your 
proposed boolean switch. You could then control on a per-transaction 
basis which servers you want to wait to respond. Although perhaps it 
would be more user-friendly to just have an additional boolean GUC, 
similar to synchronous_commit=on/off. Or maybe synchronous_commit is 
enough to control that.



I suppose we might regard the feature set I am proposing as being the
same as making synchronous_standbys a USERSET parameter, and allowing
just two options:
none - allowing the user to specify async if they wish it
* - allowing people to specify that syncing to *any* standby is
acceptable

We can blend the two approaches together, if we wish, by having two
parameters (plus server naming)
   synchronous_replication = on | off (USERSET)
   synchronous_standbys = '...'
If synchronous_standbys is not set and synchronous_replication = on then
we sync to any standby. If  synchronous_replication = off then we use
async replication, whatever synchronous_standbys is set to.
If synchronous_standbys is set, then we use sync rep to all listed
servers.

Re: [HACKERS] Old git repo

2010-12-31 Thread Magnus Hagander
On Fri, Dec 31, 2010 at 13:10, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Dec 31, 2010 at 4:58 AM, Magnus Hagander mag...@hagander.net wrote:
 On Fri, Dec 31, 2010 at 03:04, Tom Lane t...@sss.pgh.pa.us wrote:
 Jeff Davis pg...@j-davis.com writes:
 Personally, my utility for the old repo is not much (if it was anything
 important, I wouldn't have relied on the unofficial repo). But we should
 probably give a little bit of warning for folks that might want to
 rebase or translate some old notes.

 Well, I guess the question is how much warning.  I suggested O(1 week)
 but Robert seems to want O(1 year).  As long as there's some agreed
 deadline, I'm not very picky about what it is.

 I don't need to get rid of it *now*, but I agree a deadline is good.

 How about we either say when 9.1 is released, or we say september
 2011 because that's a year after we made the switch?

 Either of those would be fine with me.

Let's just decide it's when 9.1 is released.

And I put it on you to remind me when the time comes ;)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Heikki Linnakangas

On 31.12.2010 14:40, Robert Haas wrote:

Someone may have proposed this before, but one way of getting standby
naming for free would be to make the standby names the same as the
roles used to log in, rather than adding a separate parameter.  We
could just recommend to people that they use a separate, descriptive
role name for each standby.  Then the synchronous_standbys parameter -
when added - would mean a standby from one of these roles.


Seems a bit weird. It's not a lot of effort to give each standby a name. 
But if you want something automatic, how about gethostname() ?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] contrib/snapshot

2010-12-31 Thread Joel Jacobson
Happy new year fellow pgsql-hackers!

This is the first alpha release of a new hopefully quite interesting little
tool, named snapshot.

Feedback welcomed.

-- 
Best regards,

Joel Jacobson
Glue Finance



URL

https://github.com/gluefinance/snapshot



DESCRIPTION

Take a snapshot or rollback all your stored procedures in your PostgreSQL
database.



RATIONALE

Before reading any further, ask yourselves the following questions.

1.  Have you ever,
a)  modified stored procedures in your production database and
b)  thought it went OK because all your tests passed and
c)  later on realized something is wrong and
d)  not been able to find or fix the bug immediately,
leaving you no other option than to revert?
If so, go to step 2.
If not, go to step 4.

2.  During the minutes/hours while your malfunctioning patch made a mess
in the production database, was there any user activity causing
important writes to the database?
If so, go to step 3.
If not, go to step 4.

3.  Did you enjoy the revert experience in step 1?
If so, go to step 4.
If not, go to step 5.

4. Are any of the following statements TRUE?
a) your application is not very database centric.
b) your users won't stop using your service if you lose their data.
c) your application is read-only.
d) your application does not have a lot of user traffic.
If so, lucky you!
If not, you probably have a good solution to my problem already,
I would highly appreciate it if you would share it with me;
please contact me at j...@gluefinance.com.

5.  This proposed solution might be interesting for you.
I would highly appreciate your feedback on how to improve it,
please contact me at j...@gluefinance.com.



INTRODUCTION

snapshot can take a snapshot of all your database functions and of the
objects depending on them, such as constraints and views that use functions.

snapshot can rollback to a previous snapshot without modifying any of your
data or tables. It will only execute the minimum set of drop/create commands
to carry out the rollback.

snapshot depends on the pgcrypto contrib package.



TERMINOLOGY

object type     objects of the same type are created and dropped the same
                way, i.e. they use the same functions to build proper
                create and drop SQL-commands.

object          is of an object type and has a SHA1 hash of its content,
                consisting of two SQL-commands: one to create and another
                to drop the object.

revision        has a timestamp of when it was created and a list of objects.

snapshot        has a timestamp of when it was taken and has a revision.

active snapshot the last snapshot taken.

take snapshot   create a new revision of all objects currently live in the
                database, then create a new snapshot if the revision
                differs from the active snapshot.

rollback        restores a previously taken snapshot.
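The object hashing above can be sketched in a few lines. This only illustrates identifying an object version by a SHA1 over its create/drop commands; the tool itself computes hashes in SQL via pgcrypto, and its exact input format may differ:

```python
import hashlib

def object_hash(create_sql, drop_sql):
    # Hash the object's content: its create and drop commands.
    content = create_sql + "\n" + drop_sql
    return hashlib.sha1(content.encode("utf-8")).hexdigest()

h = object_hash(
    "CREATE FUNCTION myfunc() RETURNS VOID AS $$ $$ LANGUAGE sql;",
    "DROP FUNCTION myfunc();",
)
assert len(h) == 40  # SHA1 hex digest
# An unchanged object hashes identically, so retaking a snapshot of an
# unchanged database produces no new revision.
assert h == object_hash(
    "CREATE FUNCTION myfunc() RETURNS VOID AS $$ $$ LANGUAGE sql;",
    "DROP FUNCTION myfunc();",
)
```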



SYNOPSIS

-- 1. Take a snapshot.

postgres=# SELECT * FROM snapshot();
 _snapshotid | _revisionid
-------------+-------------
           1 |           1
(1 row)


-- 2. Take a snapshot.

postgres=# SELECT * FROM snapshot();
 _snapshotid | _revisionid
-------------+-------------
           1 |           1
(1 row)


-- 3. We notice nothing changed between step 1 and 2.


-- 4. Modify your functions.

postgres=# CREATE FUNCTION myfunc() RETURNS VOID AS $$ $$ LANGUAGE sql;
CREATE FUNCTION
glue=# \df myfunc
                          List of functions
 Schema |  Name  | Result data type | Argument data types |  Type
--------+--------+------------------+---------------------+--------
 public | myfunc | void             |                     | normal
(1 row)


-- 5. Take a snapshot.

postgres=# SELECT * FROM snapshot();
 _snapshotid | _revisionid
-------------+-------------
           2 |           2
(1 row)


-- 6. Rollback to snapshot 1.

postgres=# SELECT * FROM snapshot(1);
 _snapshotid | _revisionid
-------------+-------------
           3 |           1
(1 row)


-- 7. We notice the function we created in step 4 has been dropped.

postgres=# \df myfunc
                       List of functions
 Schema | Name | Result data type | Argument data types | Type
--------+------+------------------+---------------------+------
(0 rows)


-- 8. Rollback to snapshot 2.

postgres=# SELECT * FROM snapshot(2);
 _snapshotid | _revisionid
-------------+-------------
           4 |           2
(1 row)


-- 9. We notice the function we created in step 4 has been recreated.

postgres=# \df myfunc
                          List of functions
 Schema |  Name  | Result data type | Argument data types |  Type
--------+--------+------------------+---------------------+--------
 public | myfunc | void             |                     | normal
(1 row)

Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-31 Thread Alvaro Herrera
Excerpts from Robert Haas's message of vie dic 31 02:07:18 -0300 2010:

 I think that's true in some cases but not all.  The system-generated
 attribute names thing actually applies in several cases, and I think
 it's pretty cut-and-dried.  When you get into something like which
 kinds of relations support triggers, that's a lot more arbitrary.

I think part of the problem with the phrase system-generated attribute
names is: how are users supposed to figure out what that means, and
what relation types it applies to?  It seems entirely non-obvious.

 I think for now what I
 had better do is try to get this SQL/MED patch finished up by
 soldiering through this mess rather than trying to fix it.  I think
 it's going to be kind of ugly, but if we haven't got another plan then
 we're just going to have to live with the ugliness.

Perhaps it would make sense to fix the cases for which there is a
consensus, and leave the rest alone for now.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Snapshot synchronization, again...

2010-12-31 Thread Alvaro Herrera
Excerpts from Joachim Wieland's message of vie dic 31 00:15:57 -0300 2010:
 On Thu, Dec 30, 2010 at 9:40 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Disadvantage of b: It doesn't allow a snapshot to be installed on a
  different server. It requires a serializable open transaction to hold
  the snapshot.
 
  Why does it require a serializable transaction?  You could simply
  register the snapshot in any transaction.  (Of course, the net effect
  would be pretty similar to a serializable transaction).
 
 I am not assuming that the publishing transaction blocks until its
 snapshot is being picked up. A read committed transaction would get a
 new snapshot for every other query, so the published snapshot is no
 longer represented by an actual backend until it is being picked up by
 one. Since nobody is holding off xmin/GlobalXmin, eventually vacuum
 would remove tuples that the published-but-not-yet-picked-up snapshot
 should still be able to see, no?

A backend can have any number of snapshots registered, and those don't
allow GlobalXmin to advance.  Consider an open cursor, for example.
Even if the rest of the transaction is read committed, the snapshot
registered by the open cursor still holds back GlobalXmin.  My
(handwavy) idea is that whenever the transaction calls
pg_publish_snapshot(), said snapshot is registered, which makes it safe
to use even if the transaction continues to operate and obtain newer
snapshots.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] estimating # of distinct values

2010-12-31 Thread Alvaro Herrera
Excerpts from Tom Lane's message of jue dic 30 23:02:04 -0300 2010:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  I was thinking that we could have two different ANALYZE modes, one
  full and one incremental; autovacuum could be modified to use one or
  the other depending on how many changes there are (of course, the user
  could request one or the other, too; not sure what should be the default
  behavior).
 
 How is an incremental ANALYZE going to work at all?  It has no way to
 find out the recent changes in the table, for *either* inserts or
 deletes.  Unless you want to seqscan the whole table looking for tuples
 with xmin later than something-or-other ... which more or less defeats
 the purpose.

Yeah, I was thinking that this incremental ANALYZE would be the stream
in the stream-based estimator but evidently it doesn't work that way.
The stream that needs to be passed to the estimator consists of new
tuples as they are being inserted into the table, so this would need to
be done by the inserter process ... or it'd need to transmit the CTIDs
for someone else to stream them ... not an easy thing, in itself.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-31 Thread Robert Haas
On Fri, Dec 31, 2010 at 8:10 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 I think for now what I
 had better do is try to get this SQL/MED patch finished up by
 soldiering through this mess rather than trying to fix it.  I think
 it's going to be kind of ugly, but if we haven't got another plan then
 we're just going to have to live with the ugliness.

 Perhaps it would make sense to fix the cases for which there is a
 consensus, and leave the rest alone for now.

Sure.  Which cases do we have consensus on?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Robert Haas
On Fri, Dec 31, 2010 at 7:57 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 31.12.2010 14:40, Robert Haas wrote:

 Someone may have proposed this before, but one way of getting standby
 naming for free would be to make the standby names the same as the
 roles used to log in, rather than adding a separate parameter.  We
 could just recommend to people that they use a separate, descriptive
 role name for each standby.  Then the synchronous_standbys parameter -
 when added - would mean a standby from one of these roles.

 Seems a bit weird. It's not a lot of effort to give each standby a name. But
 if you want something automatic, how about gethostname() ?

Uh, wow, no way.  That would be making a critical aspect of system
reliability depend on something way, way outside of our configuration.

What's weird about using the role name?  That's our standard way of
distinguishing between two or more users.  Why invent something new?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-31 Thread Peter Eisentraut
On tor, 2010-12-30 at 11:03 -0500, Robert Haas wrote:
 No, quite the opposite.  With the other approach, you needed:
 
 constraints cannot be used on views
 constraints cannot be used on composite types
 constraints cannot be used on TOAST tables
 constraints cannot be used on indexes
 constraints cannot be used on foreign tables
 
 With this, you just need:
 
 constraints can only be used on tables

At the beginning of this thread you said that the error messages should
focus on what you tried to do, not what you could do instead.

Also, in this particular case, the user could very well assume that a
TOAST table or a foreign table is a table.





Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Stefan Kaltenbrunner

On 12/31/2010 02:39 PM, Robert Haas wrote:

On Fri, Dec 31, 2010 at 7:57 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

On 31.12.2010 14:40, Robert Haas wrote:


Someone may have proposed this before, but one way of getting standby
naming for free would be to make the standby names the same as the
roles used to log in, rather than adding a separate parameter.  We
could just recommend to people that they use a separate, descriptive
role name for each standby.  Then the synchronous_standbys parameter -
when added - would mean a standby from one of these roles.


Seems a bit weird. It's not a lot of effort to give each standby a name. But
if you want something automatic, how about gethostname() ?


Uh, wow, no way.  That would be making a critical aspect of system
reliability depend on something way, way outside of our configuration.


+1



What's weird about using the role name?  That's our standard way of
distinguishing between two or more users.  Why invent something new?


Well, a user is not a host/server to me - given there is no real benefit 
from using distinct roles for each standby yet, I don't see why we should 
complicate the replication setup procedure by requiring one role per standby.
So I'm all for giving each standby a name, but please make it an explicit 
one and not something that is only vaguely related to the actual standby 
host.



Stefan



Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-31 Thread Peter Eisentraut
On tor, 2010-12-30 at 11:49 -0500, Tom Lane wrote:
 ISTM there are four things we might potentially want to state in the
 error message: the feature/operation you tried to apply, the name of
 the object you tried to apply it to, the type of that object, and the
 set of object types that the feature/operation will actually work for.

I think the latter should be completely omitted unless it's
exceptionally important.

You can construct pretty silly things down this line:

ERROR:  permission denied for relation x
ERROR:  relation x does not exist

vs.

ERROR:  you only have permission on relation a, b, c
ERROR:  only the following relations exist: a, b, c





Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Robert Haas
On Fri, Dec 31, 2010 at 8:48 AM, Stefan Kaltenbrunner
ste...@kaltenbrunner.cc wrote:
 What's weird about using the role name?  That's our standard way of
 distinguishing between two or more users.  Why invent something new?

 Well, a user is not a host/server for me - given there is no real benefit from
 using distinct roles for each standby yet I don't see why we should
 complicate the replication setup procedure by requiring 1 role per standby.
 So I'm all for giving each standby a name but please make it an explicit one
 and not something that is only vaguely related to the actual standby host.

OK.  If that's the consensus, can someone post a patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Streaming replication as a separate permissions

2010-12-31 Thread Magnus Hagander
On Thu, Dec 30, 2010 at 15:54, Peter Eisentraut pete...@gmx.net wrote:
 On ons, 2010-12-29 at 11:09 +0100, Magnus Hagander wrote:
 I've applied this version (with some minor typo-fixes).

 This page is now somewhat invalidated:

 http://developer.postgresql.org/pgdocs/postgres/role-attributes.html

Hmm. Somehow I missed that page completely when looking through the
docs. I'll go update that.


 First, it doesn't mention the replication privilege, and second it
 continues to claim that superuser status bypasses all permission checks.

Well, that was *already* wrong.

superuser doesn't bypass NOLOGIN.

That doesn't mean it shouldn't be fixed, but that's independent of the
replication role.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Snapshot synchronization, again...

2010-12-31 Thread Joachim Wieland
On Fri, Dec 31, 2010 at 8:28 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 A backend can have any number of snapshots registered, and those don't
 allow GlobalXmin to advance.  Consider an open cursor, for example.
 Even if the rest of the transaction is read committed, the snapshot
 registered by the open cursor still holds back GlobalXmin.  My
 (handwavy) idea is that whenever the transaction calls
 pg_publish_snapshot(), said snapshot is registered, which makes it safe
 to use even if the transaction continues to operate and obtain newer
 snapshots.

Cool, even better that this is taken care of already :-)

Thanks,
Joachim



Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Simon Riggs
On Fri, 2010-12-31 at 07:33 -0500, Aidan Van Dyk wrote:
 On Fri, Dec 31, 2010 at 5:26 AM, Simon Riggs si...@2ndquadrant.com wrote:
 
  Your picture above is a common misconception. I will add something to
  the docs to explain this.
 
  2. sync does not guarantee that the updates to the standbys are in any
  way coordinated. You can run a query on one standby and get one answer
  and at the exact same time run the same query on another standby and get
  a different answer (slightly ahead/behind). That also means that if the
  master crashes one of the servers will be ahead or behind. You can use
  pg_last_xlog_receive_location() to check which one that is.
 
  When people say they want *all* servers to respond, its usually because
  they want (2), but that is literally impossible in a distributed system.
 
 Just to try and be clear again, in sync that Stefan and I are
 talking about, we really don't care that the slave could be a hot
 standby answering queries.  In fact, mine wouldn't be. Mine would
 likely be pg_streamrecv or something.   I'm just looking for a
 guarantee that I've got a copy of the data safely in the next rack,
 and a separate building before I tell the client I've moved his money.
 
 I want a synchronous replication of the *data*, and not a system where
 I can distribute queries.  I'm looking for disaster mitigation, not
 load mitigation.  A replacement for clustered/replicated
 devices/filesystems under pg_xlog.
 
 Having the next rack slave be hot in terms of applying WAL and ready
 to take over instantly would be a bonus, as long as I can guarantee
 it's current (i.e has all data a primary's COMMIT has acknowledged).

 So, that's what I want, and that's what your docs suggest is
 impossible currently; 1st past post means that I can only ever
 reliably configure 1 sync slave and be sure it will have all
 acknowledged commits.  I can likely get *close* to that by putting
 only my slowest slave as the only sync slave, and monitoring the
 heck out of my asynchronous but I want to be synchronous slave, but
 I'd rather trust the PG community to build robust synchronization
 than myself to build robust enough monitoring to catch that my slave
 is farther behind than the slower synchronous one.

 That said, I think the expectation is that if I were building a
 query-able hot standby cluster in sync rep mode, once I get a commit
 confirmation, I should be able to then initiate a new transaction on
 any member of that sync rep cluster and see the data I just committed.

 Yes, I know I could see *newer* data.  And I know that the primary
 could already have newer data. Yes, we have the problem even on a
 single pg cluster on a single machine.  But the point is that if
 you've committed, any new transactions see *at least* that data or
 newer.  But no chance of older.
 
 But personally, I'm not interested in that ;-)

I understand your requirements, listed above.

There are good technical reasons why trying to achieve *all* of the
above lets slip the other unstated requirements of availability,
complexity, performance, etc.

Inventing parameter combinations merely hides the fact that these things
aren't all simultaneously achievable. In light of that, I have been
espousing a simple approach to the typical case, and for the first
release. I can see that people may assume my words have various other
reasons behind them, but that's not the case. If I could give it all to
you, I would.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] contrib/snapshot

2010-12-31 Thread David E. Wheeler
On Dec 31, 2010, at 5:00 AM, Joel Jacobson wrote:

 Happy new year fellow pgsql-hackers!
 
 This is the first alpha release of a new hopefully quite interesting little
 tool, named snapshot.
 
 Feedback welcomed.

This looks awesome, Joel! One question: Why the dependence on pgcrypto? If 
it's just for SHA1 support, and you're just using it to create hashes of 
function bodies, I suspect that you could also use the core MD5() function, yes?

Looks really cool.

Best,

David




Re: [HACKERS] contrib/snapshot

2010-12-31 Thread Joel Jacobson
2010/12/31 David E. Wheeler da...@kineticode.com

 This looks awesome, Joel! One question: Why the dependence on pgcrypto? If
 it's just for SHA1 support, and you're just using it to create hashes of
 function bodies, I suspect that you could also use the core MD5() function,
 yes?


Thanks for the fast reply. My guests hate me because I had to escape from the
dinner party, but I simply couldn't wait a whole year to fix this bug.
Committed.

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] contrib/snapshot

2010-12-31 Thread David E. Wheeler
On Dec 31, 2010, at 10:15 AM, Joel Jacobson wrote:

 2010/12/31 David E. Wheeler da...@kineticode.com
 This looks awesome, Joel! One question: Why the dependence on pgcrypto? If 
 it's just for SHA1 support, and you're just using it to create hashes of 
 function bodies, I suspect that you could also use the core MD5() function, 
 yes?
 
 Thanks for the fast reply. My guests hate me because I had to escape from the 
 dinner party, but I simply couldn't wait a whole year to fix this bug.

lol! Go enjoy the party. There was no rush -- and no bug! :-)

Best,

David





Re: [HACKERS] contrib/snapshot

2010-12-31 Thread Simon Riggs
On Fri, 2010-12-31 at 14:00 +0100, Joel Jacobson wrote:

 This is the first alpha release of a new hopefully quite interesting
 little tool, named snapshot.

Please call it something other than snapshot. There's already about 3
tools called something similar and a couple of different meanings of the
term in the world of Postgres.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Simon Riggs
On Fri, 2010-12-31 at 14:40 +0200, Heikki Linnakangas wrote:
 On 31.12.2010 13:48, Simon Riggs wrote:
 
  I see significant real-world issues with configuring replication using
  multiple named servers, as described in the link above:
 
 All of these points only apply to specifying *multiple* named servers in 
 the synchronous_standbys='...' list.

Unfortunately, some of the points apply to using named servers at all,
even if there is only one.

  That's certainly a more complicated 
 scenario, and the configuration is more complicated as a result. 
 With your proposal, it's not possible in the first place.
 
 Multiple synchronous standbys probably isn't needed by most people, so 
 I'm fine with leaving that out for now, keeping the design the same 
 otherwise. I included it in the proposal because it easily falls out of 
 the design. So, if you're worried about the complexities of multiple 
 synchronous standbys, let's keep the UI exactly the same as what I 
 described in the link above, but only allow one name in the 
 synchronous_standbys setting, instead of a list.

The best usage recommendation is still to have 2+ standbys, *any* of
which can be used to provide sync rep. That gives the best performance
and best availability, and is the easiest to configure, of any setup I
know of. That best usage is not achievable with uniquely named servers;
using non-unique names defeats the point of having names in the first
place.

I accept that the best usage is a general case and there may be
circumstances that make the difficulties of named servers worth the
trouble.

So replicating to multiple synchronous standbys is definitely needed in
this release. *Confirming* replication to multiple named sync standbys
is the thing we don't need in this release.

  I suppose we might regard the feature set I am proposing as being the
  same as making synchronous_standbys a USERSET parameter, and allowing
  just two options:
  none - allowing the user to specify async if they wish it
  * - allowing people to specify that syncing to *any* standby is
  acceptable
 
  We can blend the two approaches together, if we wish, by having two
  parameters (plus server naming)
 synchronous_replication = on | off (USERSET)
 synchronous_standbys = '...'
  If synchronous_standbys is not set and synchronous_replication = on then
  we sync to any standby. If  synchronous_replication = off then we use
  async replication, whatever synchronous_standbys is set to.
  If synchronous_standbys is set, then we use sync rep to all listed
  servers.
 
 Sounds good.

 I still don't like the synchronous_standbys='' and 
 synchronous_replication=on combination, though. 
 IMHO that still amounts 
 to letting the standby control the behavior on master, and it makes it 
 impossible to temporarily add an asynchronous standby to the mix. I 
 could live with it, you wouldn't be forced to use it that way after all, 
 but I would still prefer to throw an error on that combination. Or at 
 least document the pitfalls and recommend always naming the standbys.

We need a parameter set that makes the best practice easiest to
specify, yet keeps more complicated configurations possible. So I'm happy
to add a synchronous_standbys parameter, as long as it is possible to
specify any standby (for which I would use *), which would be the default.
Initially that would be restricted to just one name.
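
A minimal sketch of how that parameter pair might look in postgresql.conf
(names and semantics as proposed in this thread; nothing here is a
committed interface):

```ini
# postgresql.conf on the master -- hypothetical, per this proposal
synchronous_replication = on     # USERSET: off makes the session async
synchronous_standbys = '*'       # default: any connected standby may ack;
                                 # initially limited to at most one name
```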

Will pass the server name as an option after IDENTIFY SYSTEM name.

Anyway, lets continue the discussion next year.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] contrib/snapshot

2010-12-31 Thread Joel Jacobson
2010/12/31 Simon Riggs si...@2ndquadrant.com

 Please call it something other than snapshot. There are already about
 three tools with similar names, and a couple of different meanings of the
 term in the world of Postgres.


Thanks, good point.
Renamed to fsnapshot.
Committed.

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] contrib/snapshot

2010-12-31 Thread pasman pasmański
Hi. It would be useful to add a column with the timestamp of the
revision, and a comment. Can you do it? Not today, though, so that your
friends don't kill you ..

-- 
Sent from my mobile device


pasman



Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Hannu Krosing

On 31.12.2010 13:40, Heikki Linnakangas wrote:


Sounds good.

I still don't like the synchronous_standbys='' and 
synchronous_replication=on combination, though. IMHO that still 
amounts to letting the standby control the behavior on master, and it 
makes it impossible to temporarily add an asynchronous standby to the mix.
A sync standby _will_ have the ability to control the master anyway, by 
simply being there or not.


What is currently proposed is like having dual power lines / dual UPSs 
and working happily on when one of them fails.
Requiring both of them to be present defeats the original purpose of 
doubling them.


So following Simon's design of 2 standbys, with only one required to ACK 
the commit, you get 2X the reliability of a single standby.


In a design where you have 2 standbys and both are required to ACK the 
commit, you get only 1/2 the reliability of a single standby.


Having a list of 10 standbys and requiring ACK from all, you get only 
10% of the reliability.
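
Those 2X / half / 10% figures can be sanity-checked with a toy
availability model (a rough sketch under an independence assumption, not a
claim about real hardware): if each standby is up with probability p, then
"any standby may ack" fails only when all are down, while "all must ack"
fails as soon as any one is down.

```python
def any_of(p: float, n: int) -> float:
    """Availability when an ACK from any one of n independent standbys suffices."""
    return 1 - (1 - p) ** n

def all_of(p: float, n: int) -> float:
    """Availability when ACKs from all n independent standbys are required."""
    return p ** n

p = 0.999  # illustrative: each standby reachable 99.9% of the time
print(f"any of 2:  {any_of(p, 2):.6f}")   # unavailability shrinks to (1-p)^2
print(f"all of 2:  {all_of(p, 2):.6f}")   # unavailability roughly doubles
print(f"all of 10: {all_of(p, 10):.6f}")  # unavailability roughly 10x
```

Requiring all of N multiplies the failure probability by roughly N, which
is the "10% of the reliability" point for ten mandatory standbys.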


I agree that there can be scenarios where you may need 10 sync copies 
before committing on the master - usually for non-technical reasons like 
some accounting law - but these are far rarer than the requirement to 
have reasonable performance and 99.999% system uptime when using only 
99.99% reliable hardware. And in such cases, where you need multiple 
copies, you will need some out-of-database technology (like signed 
timestamps) to make the data non-falsifiable as well, so you can't solve 
this with just configuring sync rep.


I could live with it, you wouldn't be forced to use it that way after 
all, but I would still prefer to throw an error on that combination. 
Or at least document the pitfalls and recommend always naming the 
standbys.



My proposal amounts to: let's add synchronous_standbys as a parameter in
9.2. If you really think that we need that functionality in this
release, let's get the basic stuff added now and then fold in those ideas
on top afterwards. If we do that, I will help. However, my only
insistence is that we explain the above points very clearly in the docs
to specifically dissuade people from using those features for typical
cases.


Huh, wait, if you leave out synchronous_standbys, that's a completely 
different UI again. I think we've finally reached agreement on how 
this should be configured, let's stick to that, please.


(I would be fine with limiting synchronous_standbys to just one server 
in this release though.)



If you wondered why I ignored your post previously, it's because I
understood that Fujii's post of 15 Oct, one week later, effectively
accepted my approach, albeit with two additional parameters. That is the
UI that I had been following.
http://archives.postgresql.org/pgsql-hackers/2010-10/msg01009.php


That thread makes no mention of how to specify which standbys are 
synchronous and which are not.
The simplest way would be to have separate database users for sync and 
async standbys?


That would allow any standby with the right credentials to act as a sync 
standby, while those that are not eligible are rejected even if they try 
to offer synchronous replication.


It's about specifying the timeout and whether to wait for a 
disconnected standby. Yeah, Fujii-san's proposal seems reasonable for 
configuring that.




Hannu Krosing




Re: [HACKERS] TODO item for pg_ctl and server detection

2010-12-31 Thread Bruce Momjian
Bruce Momjian wrote:
 Bruce Momjian wrote:
  Yes, that was my calculus too.  I realized that we create session ids by
  merging the process id and backend start time, so I went ahead and added
  the postmaster start time epoch to the postmaster.pid file.  While there
  is no way to pass back the postmaster start time from PQping, I added
  code to pg_ctl to make sure the time in the postmaster.pid file is not
  _before_ pg_ctl started running.  We only check PQping() after we have
  started the postmaster ourselves, so it fits our needs.
 
 Tom suggested that there might be clock skew between pg_ctl and the
 postmaster, so I added a 2-second slop in checking the postmaster start
 time.  Tom also wanted the connection information to be output all at
 once, but that causes a problem with detecting pre-9.1 servers so I
 avoided it.

Patch applied, and TODO item removed, because the patch mostly detects
whether a stale postmaster created the postmaster.pid file.  The TODO was:

Allow pg_ctl to work properly with configuration files located outside
the PGDATA directory

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pl/python refactoring

2010-12-31 Thread Peter Eisentraut
On Thu, 2010-12-23 at 14:41 +0100, Jan Urbański wrote:
 It does some architectural changes to PL/Python that make it easier to
 implement other features, like for instance a validator function. The
 full list of changes in the patch is:

I would review this and the following patches, but I'd really prefer it
if you could split this particular patch into about 11 single-purpose
patches.  I think most of the changes here are not interrelated.





Re: [HACKERS] Sync Rep Design

2010-12-31 Thread Josh Berkus
On 12/31/10 4:40 AM, Robert Haas wrote:
 Someone may have proposed this before, but one way of getting standby
 naming for free would be to make the standby names the same as the
 roles used to log in, rather than adding a separate parameter.  We
 could just recommend to people that they use a separate, descriptive
 role name for each standby.  Then the synchronous_standbys parameter -
 when added - would mean a standby from one of these roles.

I like this idea; it has an elegant simplicity about it, and right now I
can't think of any real faults.  It would have the added benefit that
each standby group would show up by its group name in ps and in
pg_stat_activity.
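
The roles-as-names idea might look like this in practice (a sketch:
CREATE ROLE ... REPLICATION is the 9.1-era mechanism, while the
synchronous_standbys tie-in follows this thread's proposed parameter and
is hypothetical):

```sql
-- One descriptive replication role per standby (or per standby group):
CREATE ROLE standby_dc1 REPLICATION LOGIN PASSWORD 'secret1';
CREATE ROLE standby_dc2 REPLICATION LOGIN PASSWORD 'secret2';

-- Hypothetical tie-in: only sessions authenticated as this role would be
-- counted as synchronous standbys (not a committed interface):
-- synchronous_standbys = 'standby_dc1'
```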

However, I agree strongly with Simon that we really want the simplest
possible synch rep implementation for 9.1, given that we're currently 15
days away from the *last* commitfest.  Nobody, at this point, has really
even tested any of the sync rep patches for reliability or performance.

Here's how I could see us developing:

Simplest (9.1):
-- synch/async capability set on the standby in recovery.conf
-- synch/async transaction status (async, recv, fsync, apply) declared
as a userset by the executing session (from Simon's patch)
-- only one ack for sync, regardless of the number of standbys

This would allow users to have a single sync standby plus a pool of
async standbys, which is what I think 90% of users who care about sync
standby want.
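
In that simplest model, per-transaction control might look like this (a
sketch using the userset GUC and the level names listed above; the
parameter name, values, and table are illustrative, not a committed
interface):

```sql
-- Durability level chosen per session or transaction:
SET synchronous_replication = 'apply';

BEGIN;
INSERT INTO accounts (id, balance) VALUES (1, 100);
COMMIT;  -- blocks until one standby acks at the 'apply' level
```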

More Complex (9.2):
-- all of the above, *plus* the ability to have standbys with ROLEs and
require acks from one server in each ROLE.

Web-Scale (9.3):
-- all of the above, plus group commit and the ability to specify the
number of servers in each ROLE that must ack.

... but the most important thing is to make the feature for 9.1 simple,
simple, simple.  We know we won't get it right on the first try.

--Josh Berkus



Re: [HACKERS] estimating # of distinct values

2010-12-31 Thread Jim Nasby
On Dec 31, 2010, at 7:34 AM, Alvaro Herrera wrote:
 Excerpts from Tom Lane's message of Thu Dec 30 23:02:04 -0300 2010:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 I was thinking that we could have two different ANALYZE modes, one
 full and one incremental; autovacuum could be modified to use one or
 the other depending on how many changes there are (of course, the user
 could request one or the other, too; not sure what should be the default
 behavior).
 
 How is an incremental ANALYZE going to work at all?  It has no way to
 find out the recent changes in the table, for *either* inserts or
 deletes.  Unless you want to seqscan the whole table looking for tuples
 with xmin later than something-or-other ... which more or less defeats
 the purpose.
 
 Yeah, I was thinking that this incremental ANALYZE would be the stream
 in the stream-based estimator but evidently it doesn't work that way.
 The stream that needs to be passed to the estimator consists of new
 tuples as they are being inserted into the table, so this would need to
 be done by the inserter process ... or it'd need to transmit the CTIDs
 for someone else to stream them ... not an easy thing, in itself.

Perhaps listen/notify could be used for this, now that it allows passing a 
payload.

BTW, if we reduce the frequency at which full scans of large tables are needed 
then presumably the cost of the scans could be largely ignored. If we don't 
need to scan frequently then we shouldn't care very much about how long a scan 
takes, which means we could throttle it heavily. Presumably even a heavily used 
system can spare 500kB/s of IO to perform background scanning...
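
A back-of-envelope check on that throttle figure (simple arithmetic, not
a benchmark; the 500 kB/s rate is the number suggested above):

```python
def scan_seconds(table_bytes: int, rate_bytes_per_sec: float) -> float:
    """Wall-clock time to read a table sequentially at a fixed throttle rate."""
    return table_bytes / rate_bytes_per_sec

RATE = 500 * 1024  # 500 kB/s background I/O budget
GIB = 1024 ** 3

for size_gib in (1, 10, 100):
    hours = scan_seconds(size_gib * GIB, RATE) / 3600
    print(f"{size_gib:>4} GiB table: {hours:6.1f} hours")
```

At that rate a 100 GiB table takes close to two and a half days to scan,
so the approach only pays off if full scans really are rare.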
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] and it's not a bunny rabbit, either

2010-12-31 Thread Robert Haas
On Fri, Dec 31, 2010 at 8:48 AM, Peter Eisentraut pete...@gmx.net wrote:
 On Thu, 2010-12-30 at 11:03 -0500, Robert Haas wrote:
 No, quite the opposite.  With the other approach, you needed:

 constraints cannot be used on views
 constraints cannot be used on composite types
 constraints cannot be used on TOAST tables
 constraints cannot be used on indexes
 constraints cannot be used on foreign tables

 With this, you just need:

 constraints can only be used on tables

 At the beginning of this thread you said that the error messages should
 focus on what you tried to do, not what you could do instead.

Yeah, and I still believe that.  I'm having difficulty coming up with
a workable approach, though.  It would be simple enough if we could
write:

/* translator: first %s is a feature, second %s is a relation type */
%s cannot be used on %s

...but I think this is likely to cause some translation headaches.

 Also, in this particular case, the user could very well assume that a
 TOAST table or a foreign table is a table.

There's a limited amount we can do about confused users, but it is
true that the negative phrasing is better for that case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
