Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Heikki Linnakangas

On 02.01.2011 00:40, Josh Berkus wrote:

On 1/1/11 5:59 AM, Stefan Kaltenbrunner wrote:

well you keep saying that but to be honest I cannot really even see a
usecase for me - where only a random one of a set of servers is sync
at any time and I don't really know which one.
My usecases would all involve 2 sync standbys and 1 or more async ones,
but the second sync one would be in a different datacenter and I NEED to
protect against a datacenter failure, which your proposal says I cannot
do :(


As far as I know, *nobody* has written the bookkeeping code to actually
track which standbys have ack'd.  We need to get single-ack synch
standby merged, tested and working before we add anything as complicated
as "each standby on this list must ack".  That means that it's extremely
unlikely for 9.1 at this point.


The bookkeeping will presumably consist of an XLogRecPtr in shared 
memory for each standby, tracking how far the standby has acknowledged. 
At commit, you scan the standby slots in shared memory and check that 
the required standbys have acknowledged your commit record. The 
bookkeeping required is the same whether or not we support a list of 
standbys that must ack or just one.



Frankly, if Simon hadn't already submitted code, I'd be pushing for
single-standby-only for 9.1, instead of "any one".


Yes, we are awfully late, but let's not panic.

BTW, there's a bunch of replication related stuff that we should work to 
close, that are IMHO more important than synchronous replication. Like 
making the standby follow timeline changes, to make failovers smoother, 
and the facility to stream a base-backup over the wire. I wish someone 
worked on those...



Hmm, access control... We haven't yet discussed what privileges a
standby needs to become synchronous. Perhaps it needs to be a separate
privilege that can be granted, in addition to the replication privilege?


No, I don't think so.  An additional priv would just complicate life for
DBAs without providing any real benefit.  You'd be guarding against the
very narrow hypothetical case where there's a server admin with limited
privs on the master, with authorization to create async standbys but
not the authorization to create a synch standby.  How likely is that to
*ever* happen?


Very likely. A synchronous standby can bring the master to a halt, while 
an asynchronous one is rather harmless. If I were a DBA, and the data 
wasn't very sensitive, I would liberally hand out async privileges to my 
colleagues to set up reporting standbys, test servers etc. But I would 
*not* give them synchronous privileges, because sooner or later one 
would go "hmm, I wonder what happens if I make this synchronous", or 
haphazardly copy the config file from a synchronous standby. That would 
either bring down the master, or act as a fake standby, acknowledging 
commits before they're flushed to the real synchronous standby. Either 
one would be bad.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI SLRU low-level functions first cut

2011-01-02 Thread Heikki Linnakangas

On 01.01.2011 23:21, Kevin Grittner wrote:

I've got low-level routines coded for interfacing predicate.c to SLRU
to handle old committed transactions, so that SSI can deal with
situations where a large number of transactions are run during the
lifetime of a single serializable transaction.  I'm not actually
*using* these new functions yet, but that's what I do next.  I would
love it if someone could review this commit and let me know whether
it looks generally sane.

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=00a0bc6c47c8173e82e5927d9b75fe570280860f


Nothing checking for the hi-bit flag AFAICS. I guess the code that uses 
that would check it. But wouldn't it be simpler to mark the unused 
slots with zero commitseqno, instead of messing with the hi-bit in valid 
values?


It's probably not necessary to explicitly truncate the slru at startup. 
We don't do that for pg_subtrans, which also doesn't survive restarts. 
The next checkpoint will truncate it.


It would possibly be simpler to not reset headXid and tailXid to 
InvalidTransactionId when the window is empty, but represent that as 
tailXid == headXid + 1.


OldSerXidGetMinConflictCommitSeqNo() calls LWLockRelease twice.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Stefan Kaltenbrunner

On 01/02/2011 09:35 AM, Heikki Linnakangas wrote:

On 02.01.2011 00:40, Josh Berkus wrote:

On 1/1/11 5:59 AM, Stefan Kaltenbrunner wrote:

well you keep saying that but to be honest I cannot really even see a
usecase for me - where only a random one of a set of servers is sync
at any time and I don't really know which one.
My usecases would all involve 2 sync standbys and 1 or more async ones,
but the second sync one would be in a different datacenter and I NEED to
protect against a datacenter failure, which your proposal says I cannot
do :(


As far as I know, *nobody* has written the bookkeeping code to actually
track which standbys have ack'd. We need to get single-ack synch
standby merged, tested and working before we add anything as complicated
as "each standby on this list must ack". That means that it's extremely
unlikely for 9.1 at this point.


The bookkeeping will presumably consist of an XLogRecPtr in shared
memory for each standby, tracking how far the standby has acknowledged.
At commit, you scan the standby slots in shared memory and check that
the required standbys have acknowledged your commit record. The
bookkeeping required is the same whether or not we support a list of
standbys that must ack or just one.


Frankly, if Simon hadn't already submitted code, I'd be pushing for
single-standby-only for 9.1, instead of "any one".


Yes, we are awfully late, but let's not panic.

BTW, there's a bunch of replication related stuff that we should work to
close, that are IMHO more important than synchronous replication. Like
making the standby follow timeline changes, to make failovers smoother,
and the facility to stream a base-backup over the wire. I wish someone
worked on those...


yeah I agree that those two are much more of a problem for the general 
user base. Whatever people think about our current system - it is very 
easy to configure (in terms of knobs to toggle) but extremely hard to get 
set up and deal with during failovers (and I know nobody who got it 
right the first few times or has not fucked up one thing in the process).
Syncrep is important but I would argue that getting those two fixed 
is even more so ;)




Stefan



Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Simon Riggs
On Sun, 2011-01-02 at 10:35 +0200, Heikki Linnakangas wrote:

 BTW, there's a bunch of replication related stuff that we should work
 to close, that are IMHO more important than synchronous replication.
 Like making the standby follow timeline changes, to make failovers
 smoother, and the facility to stream a base-backup over the wire. I
 wish someone worked on those... 

Hopefully, you'll be allowed to work on those, if they are more
important?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] [COMMITTERS] pgsql: Basic foreign table support.

2011-01-02 Thread Magnus Hagander
Typo, I think:

-   (errmsg("skipping \"%s\" --- cannot vacuum indexes,
views, or special system tables",
+   (errmsg("skipping \"%s\" --- cannot only non-tables or
special system tables",

//Magnus


On Sun, Jan 2, 2011 at 05:48, Robert Haas rh...@postgresql.org wrote:
 Basic foreign table support.

 Foreign tables are a core component of SQL/MED.  This commit does
 not provide a working SQL/MED infrastructure, because foreign tables
 cannot yet be queried.  Support for foreign table scans will need to
 be added in a future patch.  However, this patch creates the necessary
 system catalog structure, syntax support, and support for ancillary
 operations such as COMMENT and SECURITY LABEL.

 Shigeru Hanada, heavily revised by Robert Haas

 Branch
 --
 master

 Details
 ---
 http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=0d692a0dc9f0e532c67c577187fe5d7d323cb95b

 Modified Files
 --


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] management of large patches

2011-01-02 Thread Magnus Hagander
On Sun, Jan 2, 2011 at 06:32, Robert Haas robertmh...@gmail.com wrote:
 We're coming to the end of the 9.1 development cycle, and I think that
 there is a serious danger of insufficient bandwidth to handle the
 large patches we have outstanding.  For my part, I am hoping to find
 the bandwidth for two, MAYBE three major commits between now and the
 end of 9.1CF4, but I am not positive that I will be able to find even
 that much time, and the number of major patches vying for attention is
 considerably greater than that.  Quick estimate:

 - SQL/MED - probably needs ~3 large commits: foreign table scan, file
 FDW, postgresql FDW, plus whatever else gets submitted in the next two
 weeks
 - MERGE
 - checkpoint improvements
 - SE-Linux integration
 - extensions - may need 2 or more commits
 - true serializability - not entirely sure of the status of this
 - writeable CTEs (Tom has indicated he will look at this)
 - PL/python patches (Peter has indicated he will look at this)
 - snapshot taking inconsistencies (Tom has indicated he will look at this)
 - per-column collation (Peter)
 - synchronous replication (Simon, and, given the level of interest in
 and complexity of this feature, probably others as well)

 I guess my basic question is - is it realistic to think that we're
 going to get all of the above done in the next 45 days?  Is there
 anything we can do to make the process more efficient?  If a few more
 large patches drop into the queue in the next two weeks, will we have
 bandwidth for those as well?  If we don't think we can get everything
 done in the time available, what's the best way to handle that?  I

Well, we've always (well, since we had cf's) said that large patches
shouldn't be submitted for the last CF, they should be submitted for
one of the first. So if something *new* gets dumped on us for the last
one, giving priority to the existing ones in the queue seems like the
only fair option.

As for priority between those that *were* submitted earlier, and have
been reworked (which is how the system is supposed to work), it's a
lot harder. And TBH, I think we're going to have a problem getting all
those done. But the question is - are all of them ready enough, or are a
couple going to need the "returned with feedback" status *regardless*
of whether this is the last CF or not?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Libpq PGRES_COPY_BOTH - version compatibility

2011-01-02 Thread Magnus Hagander
On Wed, Dec 29, 2010 at 19:49, Robert Haas robertmh...@gmail.com wrote:
 On Dec 29, 2010, at 10:14 AM, Magnus Hagander mag...@hagander.net wrote:
 We can be held responsible for the packaging decisions if they use
 *our* make install commands, imho.

 Yep.

So, as I see it there are two ways of doing it - install a
catversion.h file and include it from libpq-fe.h, or modify
libpq-fe.h itself. I still think modifying libpq-fe.h is the better of
these choices - but either of them would work. But is the catversion
value really the best interface for the user? This is about libpq
functionality level, which really has nothing to do with the backend
catalog, does it?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Simon Riggs
On Sat, 2011-01-01 at 22:11 -0500, Aidan Van Dyk wrote:
 On Sat, Jan 1, 2011 at 6:08 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Sat, 2011-01-01 at 14:40 -0800, Josh Berkus wrote:
 
  Standby in general deals with the A,D,R triangle (Availability,
  Durability, Response time).  The "any one" configuration is the A,R
  configuration, and the only reason to go out with it for 9.1 is
  because it's simpler to implement than the D,R configuration ("all
  standbys must ack").
 
  Nicely put. Not the only reason though...
 
  As I showed earlier, the AR gives you 99.999% availability and the DR
  gives you 94% availability, considering a 3 server config. If you add
  more servers, the availability of the DR option gets much worse, very
  quickly.
 
  The performance of AR is much better also, and stays same or better as
  cluster size increases. DR choice makes performance degrade as cluster
  size increases, since it works at the speed of the slowest node.
 
 I'm all for getting first-past-post in for 9.1.  Otherwise I fear
 we'll get nothing.
 
 Stephen and I will only be able to use 1 sync slave, the DR-site
 one.  

No, the AR and DR options are identical with just one sync standby.

You've been requesting the DR option with 2 standbys, which is what
gives you 94% availability.

 That's fine.  I can live with it, and make my local slave be
 async.  Or replicate the FS/block under WAL.  I can monitor the 
 out of it, and unless it goes down, it should easily be able to keep
 up with the remote sync one behind a slower WAN link.
 
 And I think both Stephen and I understand your availability math.
 We're not arguing that the 1st past post gives both better query
 availability and cluster-scale performance.
 
 But when the primary datacenter servers are dust in the crater (or
 boats in the flood, or ash in the fire), I either keep my job, or I
 don't.  And that depends on whether there is a chance I (my database
 system) confirmed a transaction that I can't recover.

I'm not impressed. You neglect to mention that Oracle and MySQL would
put you in exactly the same position.

You also neglect to say that if the local standby goes down, you were
advocating a design that would take the whole application down. If you
actually did what you have been suggesting, and the cluster went down as
it inevitably would do, once your colleagues realise that you knowingly
configured the cluster to have only 94% availability, you won't have a
job anymore, you'll be escorted off the premises while shouting "but
while it was down, it lost no data". When that never happens, thank me.

There are people that need more durability than availability, but not
many. If the database handles high value transactions, they very
probably want it to keep on processing high value transactions.

You'll have the choice of how to configure it, because of me listening
to other people's views and selecting only the ideas that make sense.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Simon Riggs
On Sun, 2011-01-02 at 10:35 +0200, Heikki Linnakangas wrote:
 
  Frankly, if Simon hadn't already submitted code, I'd be pushing for
  single-standby-only for 9.1, instead of "any one".
 
 Yes, we are awfully late, but let's not panic.

Yes, we're about a year late. Getting a simple feature like this into
the code could have been done in 9.0.

We must stop returning to overcomplex features, especially if they
aren't backed up with solid analysis of things like server availability
and query visibility. 

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




[HACKERS] Support for negative index values in array fetching

2011-01-02 Thread Valtonen, Hannu

Hi,

I ran into the problem of getting the last n elements out of an array 
and while some workarounds do exist:
(http://stackoverflow.com/questions/2949881/getting-the-last-element-of-a-postgres-array-declaratively) 
I was still annoyed that I couldn't just ask for the last n values in an 
array Python/Perl style.


Here's a patch to add support for negative index values in fetching 
elements from an array.


i.e.

postgres=# CREATE TABLE blah (a int[]);
CREATE TABLE
Time: 11.357 ms
postgres=# INSERT INTO blah (a) VALUES (ARRAY[1,2,3,4,5,6,7,8,9,10]);
INSERT 0 1
Time: 1.282 ms
postgres=# SELECT a[-1] FROM blah;
 a
----
 10
(1 row)

Time: 0.450 ms
postgres=# SELECT a[-5:10] FROM blah;
  a
--
 {6,7,8,9,10}
(1 row)

Time: 0.949 ms

While testing this I BTW ran into funny behaviour in setting array 
slices, as in:


postgres=# update blah set a[-5] = 12;
UPDATE 1
Time: 1.500 ms
postgres=# select * from blah;
 a
-------------------------------------------------------------
 [-5:10]={12,NULL,NULL,NULL,NULL,NULL,1,2,3,4,5,6,7,8,9,10}
(1 row)

Time: 0.431 ms

And since this negative array expansion behaviour totally surprised me, 
I haven't changed that in this patch at all.


--
Hannu Valtonen


diff --git a/doc/src/sgml/array.sgml b/doc/src/sgml/array.sgml
index bb4657e..dc7b6f4 100644
--- a/doc/src/sgml/array.sgml
+++ b/doc/src/sgml/array.sgml
@@ -224,7 +224,9 @@ SELECT name FROM sal_emp WHERE pay_by_quarter[1] &lt;&gt; pay_by_quarter[2];
   By default <productname>PostgreSQL</productname> uses a
   one-based numbering convention for arrays, that is,
   an array of <replaceable>n</replaceable> elements starts with <literal>array[1]</literal> and
-  ends with <literal>array[<replaceable>n</replaceable>]</literal>.
+  ends with <literal>array[<replaceable>n</replaceable>]</literal>. Negative
+  array subscript numbers indicate that the position of the element is calculated from
+  the end of the array, with -1 indicating the last element in the array.
  </para>
 
  <para>
@@ -242,6 +244,34 @@ SELECT pay_by_quarter[3] FROM sal_emp;
  </para>
 
  <para>
+  This query retrieves the last quarter pay of all employees:
+
+<programlisting>
+SELECT pay_by_quarter[-1] FROM sal_emp;
+
+ pay_by_quarter
+----------------
+  1
+  25000
+(2 rows)
+</programlisting>
+</para>
+
+<para>
+  This query retrieves the pay of all employees for the last three quarters:
+<programlisting>
+SELECT pay_by_quarter[-3:4] FROM sal_emp;
+
+   pay_by_quarter
+---------------------
+ {1,1,1}
+ {25000,25000,25000}
+(2 rows)
+
+</programlisting>
+ </para>
+
+ <para>
   We can also access arbitrary rectangular slices of an array, or
   subarrays.  An array slice is denoted by writing
   <literal><replaceable>lower-bound</replaceable>:<replaceable>upper-bound</replaceable></literal>
diff --git a/src/backend/utils/adt/arrayfuncs.c 
b/src/backend/utils/adt/arrayfuncs.c
index fb4cbce..9d6c3f1 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -1786,6 +1786,8 @@ array_ref(ArrayType *array,
 	}
 	for (i = 0; i < ndim; i++)
 	{
+		if (indx[i] < 0) /* A negative index number indicates a position calculated from the end of the array */
+			indx[i] = dim[i] + indx[i] + lb[i];
 		if (indx[i] < lb[i] || indx[i] >= (dim[i] + lb[i]))
 		{
 			*isNull = true;
@@ -1914,6 +1916,10 @@ array_get_slice(ArrayType *array,
 
 	for (i = 0; i < nSubscripts; i++)
 	{
+		if (lowerIndx[i] < 0) /* A negative index number indicates a position calculated from the end of the array */
+			lowerIndx[i] = dim[i] + lowerIndx[i] + lb[i];
+		if (upperIndx[i] < 0) /* A negative index number indicates a position calculated from the end of the array */
+			upperIndx[i] = dim[i] + upperIndx[i] + lb[i];
 		if (lowerIndx[i] < lb[i])
 			lowerIndx[i] = lb[i];
 		if (upperIndx[i] >= (dim[i] + lb[i]))



Re: [HACKERS] Extension upgrade, patch v0: debug help needed

2011-01-02 Thread Dimitri Fontaine
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 The problem occurs on ALTER OPERATOR FAMILY ... SET EXTENSION, that's
 what bisecting citext.upgrade.sql tells me.

The code in question was copy/pasted from the SET SCHEMA code path in
gram.y and then other related files.  So I tested a clean HEAD checkout
with the following steps:

  make -C contrib/citext install
  psql -f .../head/share/contrib/citext.sql
  psql
  dim=# do $$ begin execute 'alter operator class public.citext_ops using btree 
set schema utils'; end; $$;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Will try to debug that as soon as possible, but spare time here tends to
be sparse so I preferred sending this mail first.  Preliminary
investigation leads me to think the cause is using n->objarg rather than
n->addname to hold the access_method.  Working on a fix.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Hannu Krosing

On 2.1.2011 5:36, Robert Haas wrote:

On Sat, Jan 1, 2011 at 6:54 AM, Simon Riggs si...@2ndquadrant.com wrote:

Yes, working out the math is a good idea. Things are much clearer if we
do that.

Let's assume we have 98% availability on any single server.

1. Having one primary and 2 standbys, either of which can acknowledge,
and we never lock up if both standbys fail, then we will have 99.9992%
server availability. (So PostgreSQL hits 5 Nines, with data
guarantees). (Maximised availability)

I don't agree with this math.  If the master and one standby fail
simultaneously, the other standby is useless, because it may or may
not be caught up with the master.  You know that the last transaction
acknowledged as committed by the master is on at least one of the two
standbys, but you don't know which one, and so you can't safely
promote the surviving standby.
(If you are working in an environment where promoting the surviving
standby when it's possibly not caught up is OK, then you don't need
sync rep in the first place: you can just run async rep and get much
better performance.)
So the availability is 98% (you are up when the master is up) + 98%^2
* 2% (you are up when both slaves are up and the master is down) =
99.92%.  If you had only a single standby, then you could be certain
that any commit acknowledged by the master was on that standby.  Thus
your availability would be 98% (up when master is up) + 98% * 2% (you
are up when the master is down and the slave is up) = 99.96%.

OTOH, in the case where you need _all_ the slaves to confirm, any failing
slave brings the master down, so adding a slave reduces availability by an
extra 2%.

The solution to achieving good durability AND availability is requiring
"N past the post" instead of "1 past the post".

In this case you can get to 99.9992% availability with master + 3 sync
slaves, 2 of which have to ACK.


---
Hannu Krosing
Performance and Infinite Scalability Consultant
http://www.2ndQuadrant.com/books/






Re: [HACKERS] Extension upgrade, patch v0: debug help needed

2011-01-02 Thread Dimitri Fontaine
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
   make -C contrib/citext install
   psql -f .../head/share/contrib/citext.sql
   psql
   dim=# do $$ begin execute 'alter operator class public.citext_ops using 
 btree set schema utils'; end; $$;
 server closed the connection unexpectedly
   This probably means the server terminated abnormally
   before or while processing the request.
 The connection to the server was lost. Attempting reset: Failed.

The fix was ok, but I had to test with the right environment to be able
to appreciate that :)

Please find it attached.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

*** a/src/backend/commands/alter.c
--- b/src/backend/commands/alter.c
***
*** 198,208  ExecAlterObjectSchemaStmt(AlterObjectSchemaStmt *stmt)
  			break;
  
  		case OBJECT_OPCLASS:
! 			AlterOpClassNamespace(stmt->object, stmt->objarg, stmt->newschema);
  			break;
  
  		case OBJECT_OPFAMILY:
! 			AlterOpFamilyNamespace(stmt->object, stmt->objarg, stmt->newschema);
  			break;
  
  		case OBJECT_SEQUENCE:
--- 198,208 
  			break;
  
  		case OBJECT_OPCLASS:
! 			AlterOpClassNamespace(stmt->object, stmt->addname, stmt->newschema);
  			break;
  
  		case OBJECT_OPFAMILY:
! 			AlterOpFamilyNamespace(stmt->object, stmt->addname, stmt->newschema);
  			break;
  
  		case OBJECT_SEQUENCE:
*** a/src/backend/commands/opclasscmds.c
--- b/src/backend/commands/opclasscmds.c
***
*** 1993,2008  AlterOpClassOwner_internal(Relation rel, HeapTuple tup, Oid newOwnerId)
   * ALTER OPERATOR CLASS any_name USING access_method SET SCHEMA name
   */
  void
! AlterOpClassNamespace(List *name, List *argam, const char *newschema)
  {
  	Oid			amOid;
- 	char   *access_method = linitial(argam);
  	Relation	rel;
  	Oid			oid;
  	Oid			nspOid;
  
- 	Assert(list_length(argam) == 1);
- 
  	amOid = get_am_oid(access_method, false);
  
  	rel = heap_open(OperatorClassRelationId, RowExclusiveLock);
--- 1993,2005 
   * ALTER OPERATOR CLASS any_name USING access_method SET SCHEMA name
   */
  void
! AlterOpClassNamespace(List *name, char *access_method, const char *newschema)
  {
  	Oid			amOid;
  	Relation	rel;
  	Oid			oid;
  	Oid			nspOid;
  
  	amOid = get_am_oid(access_method, false);
  
  	rel = heap_open(OperatorClassRelationId, RowExclusiveLock);
***
*** 2185,2199  get_am_oid(const char *amname, bool missing_ok)
   * ALTER OPERATOR FAMILY any_name USING access_method SET SCHEMA name
   */
  void
! AlterOpFamilyNamespace(List *name, List *argam, const char *newschema)
  {
  	Oid			amOid;
- 	char   *access_method = linitial(argam);
  	Relation	rel;
  	Oid			nspOid;
  	Oid			oid;
  
- 	Assert(list_length(argam) == 1);
  	amOid = get_am_oid(access_method, false);
  
  	rel = heap_open(OperatorFamilyRelationId, RowExclusiveLock);
--- 2182,2194 
   * ALTER OPERATOR FAMILY any_name USING access_method SET SCHEMA name
   */
  void
! AlterOpFamilyNamespace(List *name, char *access_method, const char *newschema)
  {
  	Oid			amOid;
  	Relation	rel;
  	Oid			nspOid;
  	Oid			oid;
  
  	amOid = get_am_oid(access_method, false);
  
  	rel = heap_open(OperatorFamilyRelationId, RowExclusiveLock);
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
***
*** 6225,6231  AlterObjectSchemaStmt:
  	AlterObjectSchemaStmt *n = makeNode(AlterObjectSchemaStmt);
  	n->objectType = OBJECT_OPCLASS;
  	n->object = $4;
! 	n->objarg = list_make1($6);
  	n->newschema = $9;
  	$$ = (Node *)n;
  }
--- 6225,6231 
  	AlterObjectSchemaStmt *n = makeNode(AlterObjectSchemaStmt);
  	n->objectType = OBJECT_OPCLASS;
  	n->object = $4;
! 	n->addname = $6;
  	n->newschema = $9;
  	$$ = (Node *)n;
  }
***
*** 6234,6240  AlterObjectSchemaStmt:
  	AlterObjectSchemaStmt *n = makeNode(AlterObjectSchemaStmt);
  	n->objectType = OBJECT_OPFAMILY;
  	n->object = $4;
! 	n->objarg = list_make1($6);
  	n->newschema = $9;
  	$$ = (Node *)n;
  }
--- 6234,6240 
  	AlterObjectSchemaStmt *n = makeNode(AlterObjectSchemaStmt);
  	n->objectType = OBJECT_OPFAMILY;
  	n->object = $4;
! 	n->addname = $6;
  	n->newschema = $9;
  	$$ = (Node *)n;
  }
*** a/src/include/commands/defrem.h
--- b/src/include/commands/defrem.h
***
*** 101,111  extern void RenameOpClass(List *name, const char *access_method, const char *new
  extern void RenameOpFamily(List *name, const char *access_method, const char *newname);
  extern void AlterOpClassOwner(List *name, const char *access_method, Oid newOwnerId);
  extern void AlterOpClassOwner_oid(Oid opclassOid, Oid newOwnerId);
! extern void AlterOpClassNamespace(List *name, List *argam, const char *newschema);
  extern void AlterOpFamilyOwner(List *name, const char *access_method, Oid newOwnerId);
  extern void AlterOpFamilyOwner_oid(Oid opfamilyOid, Oid newOwnerId);
  extern 

Re: [HACKERS] Support for negative index values in array fetching

2011-01-02 Thread Florian Pflug
On Jan2, 2011, at 11:45 , Valtonen, Hannu wrote:
 I ran into the problem of getting the last n elements out of an array and 
 while some workarounds do exist:
 (http://stackoverflow.com/questions/2949881/getting-the-last-element-of-a-postgres-array-declaratively)
  I was still annoyed that I couldn't just ask for the last n values in an 
 array Python/Perl style.
 
 Here's a patch to add support for negative index values in fetching elements 
 from an array.

That won't work. In SQL, array indices don't necessarily start with 0 (or 1, or
*any* single value). Instead, you can get each dimension's lower and upper
bounds for index values with array_lower() and array_upper().

Here's an example

fgp=> do $$
  declare a text[];
  begin
a[-1] := 'foo';
a[0] := 'bar';
raise notice 'a[-1] == %', a[-1];
  end
$$ language 'plpgsql' ;

This will raise the notice 'a[-1] == foo'!

The only way around that would be to introduce magic constants "lower" and
"upper" that can be used within index expressions and evaluate to the indexed
dimension's lower and upper bound. You'd then use

  my_array[upper], my_array[upper-1], ...

to refer to the last, second-to-last, ... element in the array. Actually doing
this could get pretty messy, though - not sure if it's really worth the effort...

best regards,
Florian Pflug




Re: [HACKERS] [COMMITTERS] pgsql: Basic foreign table support.

2011-01-02 Thread Robert Haas
On Sun, Jan 2, 2011 at 4:24 AM, Magnus Hagander mag...@hagander.net wrote:
 Typo, I think:

 -               (errmsg("skipping \"%s\" --- cannot vacuum indexes,
 views, or special system tables",
 +               (errmsg("skipping \"%s\" --- cannot only non-tables or
 special system tables",

Oops, fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Libpq PGRES_COPY_BOTH - version compatibility

2011-01-02 Thread Robert Haas
On Sun, Jan 2, 2011 at 4:36 AM, Magnus Hagander mag...@hagander.net wrote:
 On Wed, Dec 29, 2010 at 19:49, Robert Haas robertmh...@gmail.com wrote:
 On Dec 29, 2010, at 10:14 AM, Magnus Hagander mag...@hagander.net wrote:
 We can be held responsible for the packaging decisions if they use
 *our* make install commands, imho.

 Yep.

 So, as I see it there are two ways of doing it - install a
 catversion.h file and include it from libpq-fe.h, or modify the
 libpq-fe.h. I still think modifying libpq-fe.h is the better of these
 choices - but either of them would work. But is the catversion value
 really the best interface for the user? This is about libpq
 functionality level, which really has nothing to do with the backend
 catalog, does it?

It doesn't seem to me that a change of this type requires a catversion bump.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] management of large patches

2011-01-02 Thread Robert Haas
On Sun, Jan 2, 2011 at 4:29 AM, Magnus Hagander mag...@hagander.net wrote:
 As for priority between those that *were* submitted earlier, and have
 been reworked (which is how the system is supposed to work), it's a
 lot harder. And TBH, I think we're going to have a problem getting all
 those done. But the question is - are all ready enough, or are a
 couple going to need the returned with feedback status *regardless*
 of if this is the last CF or not?

Well, that all depends on how much work people are willing to put into
reviewing and committing them, which I think is what we need to
determine.  None of those patches are going to be as simple as patch
-p1 < $F && git commit -a && git push.  Having done a couple of these
now, I'd say that doing final review and commit of a patch of this
scope takes me ~20 hours of work, but it obviously varies a lot based
on how good the patch is to begin with and how much review has already
been done.  So I guess the question is - who is willing to step up to
the plate, either as reviewer or as final reviewer/committer?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Base Backup Streaming (was: [HACKERS] Sync Rep Design)

2011-01-02 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 BTW, there's a bunch of replication related stuff that we should work to
 close, that are IMHO more important than synchronous replication. Like
 making the standby follow timeline changes, to make failovers smoother, and
 the facility to stream a base-backup over the wire. I wish someone worked on
 those...

So, we've been talking about base backup streaming at conferences and we
have a working prototype.  We even have a needed piece of it in core
now, that's the pg_read_binary_file() function.  What we still miss is
an overall design and some integration effort.  Let's design first.

I propose the following new pg_ctl command to initiate the cloning:

 pg_ctl clone [-D datadir] [-s on|off] [-t filename]  primary_conninfo

As far as users are concerned, that would be the only novelty.  Once that
command is finished (successfully) they would edit postgresql.conf and
start the service as usual.  A basic recovery.conf file is created with
the given options, standby_mode is driven by -s and defaults to off, and
trigger_file defaults to being omitted and is given by -t.  Of course
the primary_conninfo given on the command line is what ends up into the
recovery.conf file.
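A rough sketch of the file that command would emit, given those switches (the function name and defaults here are only an illustration of the proposal, not actual code):

```python
def make_recovery_conf(primary_conninfo, standby_mode=False, trigger_file=None):
    # -s drives standby_mode (default off); -t gives trigger_file and is
    # omitted from the file when not supplied; primary_conninfo comes
    # straight from the command line.
    lines = ["standby_mode = '%s'" % ("on" if standby_mode else "off"),
             "primary_conninfo = '%s'" % primary_conninfo]
    if trigger_file is not None:
        lines.append("trigger_file = '%s'" % trigger_file)
    return "\n".join(lines) + "\n"

print(make_recovery_conf("host=primary.example.com port=5432", standby_mode=True))
```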

That alone would allow for making base backups for recovery purposes and
for standby preparing.

To support this new tool, the simplest would be to just copy what
I've been doing in the prototype, that is run a query to get the primary
file listing (per tablespace, not done in the prototype) then get their
bytea content over the wire.  That means there's no further backend
support code to write.

  https://github.com/dimitri/pg_basebackup
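The client-side restore loop in that prototype boils down to something like the following sketch (Python rather than the prototype's actual code; the rows stand in for the listing and bytea-content queries):

```python
import os
import tempfile

def write_base_backup(rows, target_dir):
    # 'rows' stands in for the result of asking the primary for its file
    # listing and then each file's bytea content (what pg_read_binary_file()
    # returns); here it is just an iterable of (relative_path, bytes).
    for relpath, content in rows:
        dest = os.path.join(target_dir, relpath)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "wb") as f:
            f.write(content)

# Toy run against a mocked "primary":
rows = [("global/pg_control", b"\x00" * 16),
        ("base/1/1259", b"catalog data")]
target = tempfile.mkdtemp()
write_base_backup(rows, target)
```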

We could prefer to have a backend function prepare a tar archive and
stream it using the COPY protocol, with some compression support, but
that's more complex to code now and to parallelize down the road.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] management of large patches

2011-01-02 Thread KaiGai Kohei

(2011/01/02 14:32), Robert Haas wrote:

We're coming the end of the 9.1 development cycle, and I think that
there is a serious danger of insufficient bandwidth to handle the
large patches we have outstanding.  For my part, I am hoping to find
the bandwidth to two, MAYBE three major commits between now and the
end of 9.1CF4, but I am not positive that I will be able to find even
that much time, and the number of major patches vying for attention is
considerably greater than that.  Quick estimate:


  :

- SE-Linux integration


How feasible is it to commit this 3K-line patch in the last 45 days?

At least, the idea of a security provider enables us to maintain a set
of hooks and the logic to make access control decisions independently.
I'm able to provide a set of sources for this module at
git.postgresql.org, so we can always obtain a working module from there.
The worst scenario for us is that nothing progresses in spite of the
large man-power spent on review and discussion.

It may be more productive to keep the features to be committed in the
last CF as small as possible, such as hooks to support a subset of DDL
permissions or a pg_regress enhancement to run the regression tests.

Thanks,
--
KaiGai Kohei kai...@kaigai.gr.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #5662: Incomplete view

2011-01-02 Thread Peter Eisentraut
On mån, 2010-12-06 at 14:47 +0200, Peter Eisentraut wrote:
 On sön, 2010-09-19 at 14:28 -0400, Tom Lane wrote:
  Or maybe we could implement that function, call it like this
  
 CAST((pg_sequence_parameters(c.oid)).max_value AS
  cardinal_number) AS maximum_value,
  
  and plan on optimizing the view when we get LATERAL.
 
 Here is an implementation of that.

Committed.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Simon Riggs
On Sat, 2011-01-01 at 23:36 -0500, Robert Haas wrote:
 On Sat, Jan 1, 2011 at 6:54 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Yes, working out the math is a good idea. Things are much clearer if we
  do that.
 
  Let's assume we have 98% availability on any single server.
 
  1. Having one primary and 2 standbys, either of which can acknowledge,
  and we never lock up if both standbys fail, then we will have 99.9992%
  server availability. (So PostgreSQL hits 5 Nines, with data
  guarantees). (Maximised availability)
 
 I don't agree with this math. ...(snip by Simon)... 99.96%.

OK, so that is at least 99.96%. Cool.

The key point here is not (1), but option (4).

The approach advocated by Heikki and yourself gives us 94% availability.
IMHO that is ridiculous, and I will not accept that as the *only* way
forwards, for that reason, no matter who advocates it or how long they
keep arguing. I do accept that some wish that as an option.

If we are to have a sensible technical debate with an eventual end, you
must answer the points placed in front of you, not just sidestep and try
to point out problems somewhere else. All analysis must be applied to
all options, not just those options advocated by someone else. I've been
asking for a failure mode analysis for months and it never comes in
full.

I'm more than happy to discuss your additional points once we are clear
on the 94% because it is pivotal to everything I've been proposing.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI SLRU low-level functions first cut

2011-01-02 Thread Kevin Grittner
Heikki Linnakangas  wrote:
 
 Nothing checking for the hi-bit flag AFAICS. I guess the code that
 uses that would check it.
 
Right.  After getting this layer done, I went off to watch the
Badgers in the Rose Bowl, leaving that coding for today.  ;-)
 
 But wouldn't it be simpler to mark the unused slots with zero
 commitseqno, instead of messing with the hi-bit in valid values?
 
This is the earliest commitSeqNo among the rw-conflicts *out* of the
transaction we're looking up.  I'm using zero to mean that there
was no conflict.  Perhaps instead of setting the high bit I could
just use a special value (like all bits set) instead of zero to mean
no conflict.  In any event, it's clear that all zero should mean
"not found" and I need some other way to indicate "no conflict".
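In other words, something along these lines, with one value reserved per meaning (the constants are made up here for illustration):

```python
# Sketch of the slot encoding being weighed: all-zero means the xid was
# not found, a reserved all-bits-set value means "found, but no
# conflict", and anything else is a real commitSeqNo.
NOT_FOUND   = 0x0000000000000000
NO_CONFLICT = 0xFFFFFFFFFFFFFFFF   # reserved; never a valid commitSeqNo

def decode_slot(slot):
    if slot == NOT_FOUND:
        return ("not found", None)
    if slot == NO_CONFLICT:
        return ("no conflict", None)
    return ("conflict", slot)
```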
 
 It's probably not necessary to explicitly truncate the slru at
 startup. We don't do that for pg_subtrans, which also doesn't
 survive restarts. The next checkpoint will truncate it.
 
Good point.  That slims things down by 22 lines and eliminates a
distracting special case.
 
 It would possibly be simpler to not reset headXid and tailXid to
 InvalidTransactionId when the window is empty, but represent that
 as tailXid == headXid + 1.
 
I'll take a look.  I went 'round a few times on how best to handle the
empty window, which was complicated a little bit by wanting to keep
track of the tail even when the window was currently empty.  Because
xids won't be submitted in strictly sequential order, I might need
to go back a ways in the sequence to update something, so I need to
keep track of existing segment files even when there are currently no
xids to track; and I wanted the searches to have a fast path out for
such cases.
 
 OldSerXidGetMinConflictCommitSeqNo() calls LWLockRelease twice.
 
That's because the function calls SimpleLruReadPage_ReadOnly:

Control lock must NOT be held at entry, but will be held at exit.
 
That strikes me as an odd API, but it is what it is.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to know killed by pg_terminate_backend

2011-01-02 Thread Tatsuo Ishii
  Seems reasonable. Does the victim backend currently know why it has been
  killed?
 
  I don't think so.
 
  One idea is postmaster sets a flag in the shared memory area
  indicating it received SIGTERM before forwarding the signal to
  backends.
 
  The backend checks the flag and if it's not set, it knows that the signal
  has been sent by pg_terminate_backend(), not postmaster.
 
 Or it could also be sent by some other user process, like the user
 running kill from the shell.
 
 No problem (at least for pgpool-II).
 
 If the flag is not set, postgres returns the same code as the one
 killed by pg_terminate_backend(). The point is, backend is killed by
 postmaster or not. Because if backend was killed by postmaster,
 pgpool-II should not expect the PostgreSQL server is usable since
 postmaster decided to shutdown.

Here is the patch to implement the feature.

1) pg_terminate_backend() sends SIGUSR1 signal rather than SIGTERM to
   the target backend.
2) The infrastructure used for message passing is
   storage/ipc/procsignal.c The new message type for ProcSignalReason
   is PROCSIG_TERMNINATE_BACKEND_INTERRUPT
3) I assign new error code 57P04 which is returned from the backend
   killed by pg_terminate_backend().

#define ERRCODE_TERMINATE_BACKEND   MAKE_SQLSTATE('5','7','P','0','4')
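That MAKE_SQLSTATE spells out SQLSTATE "57P04".  The decision a pooler would then make could look like this sketch (57P01 is the existing ERRCODE_ADMIN_SHUTDOWN; the function is only an illustration of the pgpool-II logic described above):

```python
ERRCODE_ADMIN_SHUTDOWN    = "57P01"   # existing: postmaster shutdown
ERRCODE_TERMINATE_BACKEND = "57P04"   # new code proposed in this patch

def server_still_usable(sqlstate):
    # Only a deliberate pg_terminate_backend() of one backend leaves the
    # rest of the server usable; on ADMIN_SHUTDOWN (or any other fatal
    # error) the pooler should assume the whole server is going away.
    return sqlstate == ERRCODE_TERMINATE_BACKEND
```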

Comments are welcome.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
*** a/src/backend/storage/ipc/procsignal.c
--- b/src/backend/storage/ipc/procsignal.c
***
*** 279,284  procsignal_sigusr1_handler(SIGNAL_ARGS)
--- 279,287 
  	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
  		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
  
+ 	if (CheckProcSignal(PROCSIG_TERMNINATE_BACKEND_INTERRUPT))
+ 		HandleTerminateBackendInterrupt();
+ 
  	latch_sigusr1_handler();
  
  	errno = save_errno;
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
***
*** 184,189  static bool RecoveryConflictPending = false;
--- 184,195 
  static bool RecoveryConflictRetryable = true;
  static ProcSignalReason RecoveryConflictReason;
  
+ /*
+  * True if backend is being killed by pg_terminate_backend().
+  * Set by HandleTerminateBackendInterrupt() upon received SIGUSR1.
+  */
+ static bool TerminateBackendRequest = false;
+ 
  /* 
   *		decls for routines only used in this file
   * 
***
*** 2875,2880  RecoveryConflictInterrupt(ProcSignalReason reason)
--- 2881,2924 
  }
  
  /*
+  * HandleTerminateBackendInterrupt: out-of-line portion of terminate backend
+  * handling following receipt of SIGUSR1. Designed to be similar to die().
+  * Called only by a normal user backend.
+  */
+ void
+ HandleTerminateBackendInterrupt(void)
+ {
+ 	int			save_errno = errno;
+ 
+ 	/* Don't joggle the elbow of proc_exit */
+ 	if (!proc_exit_inprogress)
+ 	{
+ 		InterruptPending = true;
+ 		ProcDiePending = true;
+ 		TerminateBackendRequest = true;
+ 
+ 		/*
+ 		 * If it's safe to interrupt, and we're waiting for input or a lock,
+ 		 * service the interrupt immediately
+ 		 */
+ 		if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
+ 			CritSectionCount == 0)
+ 		{
+ 			/* bump holdoff count to make ProcessInterrupts() a no-op */
+ 			/* until we are done getting ready for it */
+ 			InterruptHoldoffCount++;
+ 			LockWaitCancel();	/* prevent CheckDeadLock from running */
+ 			DisableNotifyInterrupt();
+ 			DisableCatchupInterrupt();
+ 			InterruptHoldoffCount--;
+ 			ProcessInterrupts();
+ 		}
+ 	}
+ 
+ 	errno = save_errno;
+ }
+ 
+ /*
   * ProcessInterrupts: out-of-line portion of CHECK_FOR_INTERRUPTS() macro
   *
   * If an interrupt condition is pending, and it's safe to service it,
***
*** 2912,2917  ProcessInterrupts(void)
--- 2956,2966 
  	(errcode(ERRCODE_ADMIN_SHUTDOWN),
  			  errmsg("terminating connection due to conflict with recovery"),
  	 errdetail_recovery_conflict()));
+ 		else if (TerminateBackendRequest)
+ 			ereport(FATAL,
+ 	(errcode(ERRCODE_TERMINATE_BACKEND),
+ 	 errmsg("terminating connection due to pg_terminate_backend")));
+ 
  		else
  			ereport(FATAL,
  	(errcode(ERRCODE_ADMIN_SHUTDOWN),
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***
*** 114,120  pg_cancel_backend(PG_FUNCTION_ARGS)
  Datum
  pg_terminate_backend(PG_FUNCTION_ARGS)
  {
! 	PG_RETURN_BOOL(pg_signal_backend(PG_GETARG_INT32(0), SIGTERM));
  }
  
  Datum
--- 114,122 
  Datum
  pg_terminate_backend(PG_FUNCTION_ARGS)
  {
! 	PG_RETURN_BOOL(
! 		SendProcSignal(PG_GETARG_INT32(0), PROCSIG_TERMNINATE_BACKEND_INTERRUPT,
! 	   InvalidBackendId) == 0);
  }
  
  Datum
*** a/src/include/storage/procsignal.h
--- b/src/include/storage/procsignal.h
***
*** 40,45  typedef 


Re: [HACKERS] Libpq PGRES_COPY_BOTH - version compatibility

2011-01-02 Thread Peter Eisentraut
On tis, 2010-12-28 at 13:13 +0100, Magnus Hagander wrote:
 My pg_streamrecv no longer works with 9.1, because it returns
 PGRES_COPY_BOTH instead of PGRES_COPY_OUT when initiating a copy.
 That's fine.
 
 So I'd like to make it work on both. Specifically, I would like it to
 check for PGRES_COPY_BOTH if the server is 9.1 and PGRES_COPY_OUT if
 it's 9.0. Which can be done by checking the server version.

ISTM that the correct fix is to increment the protocol version number to
3.1 and send PGRES_COPY_OUT if the client requests version 3.0.  That's
what the version numbers are for, no?
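For comparison, the client-side check Magnus describes is simple enough (a sketch using PQserverVersion()-style integers, where 9.0.x is 900xx and 9.1.x is 901xx):

```python
def expected_copy_status(server_version_num):
    # 9.1 and later answer a replication COPY with PGRES_COPY_BOTH; 9.0
    # answered with PGRES_COPY_OUT.  Returning the status *name* here,
    # since the point is which constant the client should test against.
    if server_version_num >= 90100:
        return "PGRES_COPY_BOTH"
    return "PGRES_COPY_OUT"
```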



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Kevin Grittner
Simon Riggs  wrote:
 On Sat, 2011-01-01 at 23:36 -0500, Robert Haas wrote:
 On Sat, Jan 1, 2011 at 6:54 AM, Simon Riggs
 wrote:
 Yes, working out the math is a good idea. Things are much clearer
 if we do that.

 Let's assume we have 98% availability on any single server.

 1. Having one primary and 2 standbys, either of which can
 acknowledge, and we never lock up if both standbys fail, then we
 will have 99.9992% server availability. (So PostgreSQL hits 5
 Nines, with data guarantees). (Maximised availability)

 I don't agree with this math. ...(snip by Simon)... 99.96%.
 
 OK, so that is at least 99.96%. Cool.
 
I think you're talking about different metrics, and you're both
right.  With two servers configured in sync rep your chance of having
an available (running) server is 99.9992%.  The chance that you know
that you have one that is totally up to date, with no lost
transactions is 99.9208%.  The chance that you *actually* have
up-to-date data would be higher, but you'd have no way to be sure. 
The 99.96% number is your certainty that you have a running server
with up-to-date data if only one machine is sync rep.
 
It's a matter of whether your shop needs five nines of availability
or the highest probability of not losing data.  You get to choose.
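For what it's worth, those figures can be reproduced directly, assuming independent failures and 98% per-server availability (a back-of-the-envelope sketch, not a rigorous model):

```python
p, q = 0.98, 0.02   # per-server availability / unavailability, assumed independent

# 1 primary + 2 standbys, any surviving server keeps you running:
availability = 1 - q**3                  # 0.999992, i.e. "five nines"

# With exactly one sync standby, data is known-safe unless the primary
# and that standby are both down:
one_sync = 1 - q**2                      # 0.9996

# With first-past-the-post over two standbys, certainty is lost when the
# primary is down and at least one standby is also down (you cannot tell
# which standby acknowledged):
known_up_to_date = 1 - q * (1 - p**2)    # 0.999208

# Waiting for *both* standbys: commits proceed only with all three up:
strict = p**3                            # 0.941192, the "94%" figure
```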
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Simon Riggs
On Sun, 2011-01-02 at 08:08 -0600, Kevin Grittner wrote:

 I think you're talking about different metrics, and you're both
 right.  With two servers configured in sync rep your chance of having
 an available (running) server is 99.9992%.  The chance that you know
 that you have one that is totally up to date, with no lost
 transactions is 99.9208%.  The chance that you *actually* have
 up-to-date data would be higher, but you'd have no way to be sure. 
 The 99.96% number is your certainty that you have a running server
 with up-to-date data if only one machine is sync rep.
  
 It's a matter of whether your shop needs five nines of availability
 or the highest probability of not losing data.  You get to choose.

Thanks for those calculations.

Do you agree that requiring response from 2 sync standbys, or locking
up, gives us 94% server availability, but 99.9992% data durability? And
that adding additional async servers would not increase the server
availability of that cluster?

Now let's look at what happens when we first start a standby: we do the
base backup, configure the standby, it connects and then wham we
cannot process any new transactions until the standby has caught up,
which could well be hours on a big database. So if we don't have a
processing mode that allows work to continue, how will we ever enable
synchronous replication on a 24/7 database? How will we ever allow
standbys to catch up if they drop out for a while?

We should factor that into the availability calcs as well.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Base Backup Streaming

2011-01-02 Thread Heikki Linnakangas

On 02.01.2011 14:47, Dimitri Fontaine wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

BTW, there's a bunch of replication related stuff that we should work to
close, that are IMHO more important than synchronous replication. Like
making the standby follow timeline changes, to make failovers smoother, and
the facility to stream a base-backup over the wire. I wish someone worked on
those...


So, we've been talking about base backup streaming at conferences and we
have a working prototype.  We even have a needed piece of it in core
now, that's the pg_read_binary_file() function.  What we still miss is
an overall design and some integration effort.  Let's design first.


We even have a rudimentary patch to add the required backend support:

http://archives.postgresql.org/message-id/4c80d9b8.2020...@enterprisedb.com

That just needs to be polished into shape, and documentation.


I propose the following new pg_ctl command to initiate the cloning:

  pg_ctl clone [-D datadir] [-s on|off] [-t filename]  primary_conninfo

As far as users are concerned, that would be the only novelty.  Once that
command is finished (successfully) they would edit postgresql.conf and
start the service as usual.  A basic recovery.conf file is created with
the given options, standby_mode is driven by -s and defaults to off, and
trigger_file defaults to being omitted and is given by -t.  Of course
the primary_conninfo given on the command line is what ends up into the
recovery.conf file.

That alone would allow for making base backups for recovery purposes and
for standby preparing.


+1. Or maybe it would be better to make it a separate binary, rather than
part of pg_ctl.



To support this new tool, the simplest would be to just copy what
I've been doing in the prototype, that is run a query to get the primary
file listing (per tablespace, not done in the prototype) then get their
bytea content over the wire.  That means there's no further backend
support code to write.


It would be so much nicer to have something more integrated, like the 
patch I linked above. Running queries requires connecting to a real 
database, which means that the user needs to have privileges to do that 
and you need to know the name of a valid database. Ideally this would 
all work through a replication connection. I think we should go with 
that from day one.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to know killed by pg_terminate_backend

2011-01-02 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 Comments are welcome.

This is a bad idea.  It makes an already-poorly-tested code path
significantly more fragile, in return for nothing of value.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Heikki Linnakangas

On 02.01.2011 15:41, Simon Riggs wrote:

On Sat, 2011-01-01 at 23:36 -0500, Robert Haas wrote:

On Sat, Jan 1, 2011 at 6:54 AM, Simon Riggssi...@2ndquadrant.com  wrote:

Yes, working out the math is a good idea. Things are much clearer if we
do that.

Let's assume we have 98% availability on any single server.

1. Having one primary and 2 standbys, either of which can acknowledge,
and we never lock up if both standbys fail, then we will have 99.9992%
server availability. (So PostgreSQL hits 5 Nines, with data
guarantees). (Maximised availability)


I don't agree with this math. ...(snip by Simon)... 99.96%.


OK, so that is at least 99.96%. Cool.

The key point here is not (1), but option (4).

The approach advocated by Heikki and yourself gives us 94% availability.
IMHO that is ridiculous, and I will not accept that as the *only* way
forwards, for that reason, whoever advocates it or for how long they
keep arguing. I do accept that some wish that as an option.


No-one is suggesting that to be the only option.

The wait-for-all-to-ack looks a lot less ridiculous if you also 
configure a timeout and don't wait for disconnected standbys. I'm not 
sure what the point of such a timeout in general is, but people have 
requested that. Also, setting synchronous_standbys=room1, room2 
doesn't necessarily mean that you have just two standby servers, room1 
and room2 might both represent a group of servers.


I believe we all agree that there are different use cases that require
different setups. Both first-past-the-post and wait-for-all-to-ack 
have their uses. There's no point in arguing over which is better.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Kevin Grittner
Simon Riggs  wrote:
 
 Do you agree that requiring response from 2 sync standbys, or
 locking up, gives us 94% server availability, but 99.9992% data
 durability?
 
I'm not sure how to answer that.  The calculations so far have been
based around up-time and the probabilities that you have a machine up
at any moment and whether you can have confidence that if you do, you
have all committed transactions represented.  There's been an implied
assumption that the down time is unplanned, but not much else.  The
above question seems to me to get into too many implied assumptions
to feel safe throwing out a number without pinning those down a whole
lot better.  If, for example, that 2% downtime always means the
machine irretrievably went up in smoke, hitting unavailable means
things are unrecoverable.  That's probably not the best assumption
(at least outside of a combat zone), but what is?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Simon Riggs
On Sun, 2011-01-02 at 11:11 -0600, Kevin Grittner wrote:
 Simon Riggs  wrote:
  
  Do you agree that requiring response from 2 sync standbys, or
  locking up, gives us 94% server availability, but 99.9992% data
  durability?
  
 I'm not sure how to answer that.  The calculations so far have been
 based around up-time and the probabilities that you have a machine up
 at any moment and whether you can have confidence that if you do, you
 have all committed transactions represented.  There's been an implied
 assumption that the down time is unplanned, but not much else.  The
 above question seems to me to get into too many implied assumptions
 to feel safe throwing out a number without pinning those down a whole
 lot better.  If, for example, that 2% downtime always means the
 machine irretrievably went up in smoke, hitting unavailable means
 things are unrecoverable.  That's probably not the best assumption
 (at least outside of a combat zone), but what is?

Not really relevant. There's no room at all for downtime of any kind in
a situation where all servers must be available.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Base Backup Streaming

2011-01-02 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 http://archives.postgresql.org/message-id/4c80d9b8.2020...@enterprisedb.com

 That just needs to be polished into shape, and documentation.

Wow, cool!  I don't know how but I've missed it.

 +1. Or maybe it would be better make it a separate binary, rather than part
 of pg_ctl.

Well the thinking was that nowadays we support initdb from pg_ctl, and
this is another kind of initdb, really.

 I linked above. Running queries requires connecting to a real database,
 which means that the user needs to have privileges to do that and you need
 to know the name of a valid database. Ideally this would all work through a
 replication connection. I think we should go with that from day one.

I didn't think about the connecting to a real database part of it,
versus using a dedicated REPLICATION connection/protocol, and to be
honest, I feared it was too much work.  Seeing that you already did it,
though, +1.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] management of large patches

2011-01-02 Thread Kevin Grittner
Robert Haas  wrote:
 
 - true serializability - not entirely sure of the status of this
 
I try to keep the status section of the Wiki page up-to-date.  I have
just reviewed it and tweaked it for the latest events:
 
http://wiki.postgresql.org/wiki/Serializable#Current_Status
 
There are a number of pending R&D issues:
 
http://wiki.postgresql.org/wiki/Serializable#R.26D_Issues
 
Most of these can be deferred.  The ones which really need at least
some attention before release relate to how to deal with serializable
transactions on replication targets and whether we've been properly
careful about using coding style which is safe for machines with weak
memory ordering.  I've done my best to follow discussions on that
topic and do the right thing, but someone with a deeper understanding
of the issues should probably take a look.
 
Someone has joined the effort starting this weekend -- a consultant
who has done a lot of technical writing (John Okite) will be working
on doc changes related to the patch.  (I assume that would best be
submitted as a separate patch.)
 
If you want a shorter version of the patch status: we expect to have an
updated patch before the CF, including docs and incorporating
feedback from previous CFs and Heikki's comments on interim work.
 
-Kevin



Re: [HACKERS] Sync Rep Design

2011-01-02 Thread MARK CALLAGHAN
On Thu, Dec 30, 2010 at 9:02 PM, Robert Haas robertmh...@gmail.com wrote:
 reads MySQL documentation

 I see now that you've tried to design this feature in a way that is
 similar to MySQL's offering, which does have some value.  But it
 appears to me that the documentation you've written here is
 substantially similar to the MySQL 5.5 reference documentation.  That
 could get us into a world of legal trouble - that documentation is not
 even open source, let alone BSD.

 http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html

The docs originate from work done by my former team at Google. The
content license on this is CC 3.0 BY-SA, so I don't think that should
be a concern.
http://code.google.com/p/google-mysql-tools/wiki/SemiSyncReplication
http://code.google.com/p/google-mysql-tools/wiki/SemiSyncReplicationDesign

(From http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html)
the MySQL docs don't mention that other transactions can view the
committed data on the master between steps 1 and 2. Is that possible
in this case?

As described in the MySQL docs, semi-sync has another benefit for
some deployments. It rate limits busy clients to prevent them from
creating replication lag between the primary and standby servers. I
also provided the text for that
(http://bugs.mysql.com/bug.php?id=57911) if you are concerned about
copying.

-- 
Mark Callaghan
mdcal...@gmail.com



Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Jeff Janes
On Sat, Jan 1, 2011 at 8:35 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sat, 2011-01-01 at 05:13 -0800, Jeff Janes wrote:
 On 12/31/10, Simon Riggs si...@2ndquadrant.com wrote:

  2. sync does not guarantee that the updates to the standbys are in any
  way coordinated. You can run a query on one standby and get one answer
  and at the exact same time run the same query on another standby and get
  a different answer (slightly ahead/behind). That also means that if the
  master crashes one of the servers will be ahead or behind. You can use
  pg_last_xlog_receive_location() to check which one that is.

 If at least one of the standbys is in the same smoking crater as the
 primary, then pg_last_xlog_receive_location on it is unlikely to
 respond.

 The guarantee goes away precisely when it is needed.

 Fairly obviously, I would not be advocating anything that forced you to
 use a server in the same smoking crater.

You are forced to use the standby which is further ahead, otherwise
you might lose transactions which have been reported to have been
committed.

The mere existence of a commit-releasing stand-by in the same data
center as the primary means that a remote standby is not very useful
for data preservation after campus-wide disasters.  It is probably
behind (due to higher latency) and even if it is not behind, there is
no way to *know* that is not behind if the on-site standby cannot be
contacted.
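Comparing standbys' pg_last_xlog_receive_location() values, as suggested upthread, means comparing WAL locations of the form "hi/lo" — two hexadecimal numbers. A minimal sketch (the standby names and locations here are invented):

```python
# pg_last_xlog_receive_location() returns a WAL location such as
# '16/B374D848': the high and low 32 bits as hex, separated by '/'.
def xlog_to_int(loc: str) -> int:
    hi, lo = loc.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

# Hypothetical survivors after losing the primary; the one with the
# highest location must be promoted, or acknowledged commits may be lost.
standbys = {"dc1-standby": "16/B374D848", "dc2-standby": "16/C0021A10"}
ahead = max(standbys, key=lambda name: xlog_to_int(standbys[name]))
print(ahead)  # dc2-standby
```

Of course, this only works if every sync standby can actually be contacted, which is exactly Jeff's point.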

I understand that you are not advocating the use of one local standby
and one remote standby, both synchronous. But I think we need to
*explicitly* warn against it.  After all, the docs do explicitly
recommend the use of two standbys.  If we assume that the readers are
already experts, then they don't need that advice.  If they are not
experts, then that advice could lead them to shoot themselves in the
foot, both kneecaps, and a femur (metaphorically speaking, unlike the
smoking crater, which is a literal scenario some people need to plan
for).

If durability is more important than availability, what would you
recommend?  Only one synchronous rep, in a remote data center?  Two
(or more) synchronous reps all in the same remote data center?  In two
different remote data centers?


 I can't see any guarantee
 that goes away precisely when it is needed.

In order to know that you are not losing data, you have to be able to
contact every single semi-synchronous standby and invoke
pg_last_xlog_receive_location on it.

If your goal is to have data durability protected from major
catastrophes (and why else would you do synchronous rep to remote data
centers?), then it is expecting a lot to have every single standby
survive that major catastrophe.  That expectation is an unavoidable
consequence of going with single-confirmation-releases.  Perhaps you
think this consequence is too obvious to document--if so I disagree on
that.

 Perhaps you could explain the issue you see, because your comments seem
 unrelated to my point above.

It is directly related to the part of your point about using
pg_last_xlog_receive_location.  When planning for disaster recovery,
it is little comfort that you can do something in a non-disaster case,
if you can't also do it in likely disaster cases.

It probably wasn't relevant to the first part of your point, but I
must admit I did not understand the first part of your point.
Obviously they are coordinated in *some* way (I believe commits occur
in the same order on each server, for example).  Different read-only
standbys could give different results, but only from among the
universe of results made possible by a given commit sequence.  But
that is not the part I had intended to comment on, and I don't think
it is what other people concerned about durability after major
catastrophes were focusing on, either.

Cheers,

Jeff



Re: [HACKERS] management of large patches

2011-01-02 Thread pasman pasmański
Hello. Maybe there are some common bugs? They might be found by adding
more asserts to track the internal state of structures, or with tools
like the recently developed script for C++ keywords.

-- 
Sent from my mobile device


pasman



Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Simon Riggs
On Sun, 2011-01-02 at 12:13 -0800, MARK CALLAGHAN wrote:
 On Thu, Dec 30, 2010 at 9:02 PM, Robert Haas robertmh...@gmail.com wrote:
  reads MySQL documentation
 
  I see now that you've tried to design this feature in a way that is
  similar to MySQL's offering, which does have some value.  But it
  appears to me that the documentation you've written here is
  substantially similar to the MySQL 5.5 reference documentation.  That
  could get us into a world of legal trouble - that documentation is not
  even open source, let alone BSD.
 
  http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html

Sorry, only just read that bit. Are they that similar? My docs are about
3 times longer and cover all sorts of things. I didn't intentionally
copy anything, but that doesn't really matter; what matters is that if
you think they are similar, legal people might too. I've only read the
URL above, not the other links from it.

Robert, Can you identify which paragraphs need to be re-written? I won't
argue, I will just rewrite them or delete them and start afresh. Thanks
for being eagle-eyed.

 The docs originate from work done by my former team at Google. The
 content license on this is CC 3.0 BY-SA, so I don't think that should
 be a concern.
 http://code.google.com/p/google-mysql-tools/wiki/SemiSyncReplication
 http://code.google.com/p/google-mysql-tools/wiki/SemiSyncReplicationDesign

I guess that gets us off the hook a little bit, but not far enough for
my liking. Thanks for trying to save me!

 From http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html)
 the MySQL docs don't mention that other transactions can view the
 committed data on the master between steps 1 and 2. Is that possible
 in this case?

Other transactions on the master cannot read data until after the
confirmation it is on the sync standby.

 As described in the the MySQL docs, semi-sync has another benefit for
 some deployments. It rate limits busy clients to prevent them from
 creating replication lag between the primary and standby servers. I
 also provided the text for that
 (http://bugs.mysql.com/bug.php?id=57911) if you are concerned about
 copying.

Yeh, I'm aware of the effect, but I'm not really seeing slowing down the
master as a benefit; it's more an implication of synchronicity.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Base Backup Streaming

2011-01-02 Thread Magnus Hagander
On Sun, Jan 2, 2011 at 18:53, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 http://archives.postgresql.org/message-id/4c80d9b8.2020...@enterprisedb.com

 That just needs to be polished into shape, and documentation.

I have an updated version of this somewhere. IIRC it also needs things
like tablespace support, but it's well on its way.

 Wow, cool!  I don't know how but I've missed it.

Yes, especially since we discussed it in Stuttgart. I guess it may
have been during the party...


 +1. Or maybe it would be better make it a separate binary, rather than part
 of pg_ctl.

 Well the thinking was that nowadays we support initdb from pg_ctl, and
 this is another kind of initdb, really.

Yes, if it should go in any of the current binaries, initdb would be
the reasonable place, not pg_ctl ;)

That said, if we're going to wrap pg_streamrecv into 9.1 (which I
think we should), then *that* is where it should go.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Simon Riggs
On Sun, 2011-01-02 at 18:54 +0200, Heikki Linnakangas wrote:

 I believe we all agree that there's different use cases that require 
 different setups. Both first-past-the-post and wait-for-all-to-ack 
 have their uses. 

Robert's analysis is that first-past-the-post doesn't actually improve
the durability guarantee (according to his calcs). Which means that
  1 primary, 2 sync standbys with first-past-the-post
is actually worse than
  1 primary, 1 sync and 1 async standby
in terms of its durability guarantees.

So ISTM that Robert does not agree that both have their uses.

 I'm not 
 sure what the point of such a timeout in general is, but people have 
 requested that. 

Again, this sounds like you think a timeout has no measurable benefit,
other than to please some people's perceived needs.

 The wait-for-all-to-ack looks a lot less ridiculous if you also 
 configure a timeout and don't wait for disconnected standbys

Does it? Do Robert, Stefan and Aidan agree? What are the availability
and durability percentages if we do that? Based on those, we may decide
to do that instead. But I'd like to see some analysis of your ideas, not
just a "we could". Since nobody has commented on my analysis, let's see
someone else's.

 There's no point in arguing over which is better.

I'm trying to compare quantifiable benefit of various options to see
what goes into Postgres. I don't want to put anything in that we cannot
all agree has a measurable benefit to someone (that has the appropriate
preference).

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Base Backup Streaming

2011-01-02 Thread Dimitri Fontaine
Magnus Hagander mag...@hagander.net writes:
 Yes, especially since we discussed it in Stuttgart. I guess it may
 have been during the party...

I remember we talked about it, I didn't remember a patch had reached the list…

 Yes, if it should go in any of the current binaries, initdb would be
 the reasonable place, not pg_ctl ;)

Well, pg_ctl is able to call initdb for users, but yes.

 That said, if we're going to wrap pg_streamrecv into 9.1 (which I
 think we should), then *that* is where it should go.

They should at least cooperate so that you don't need to set up WAL
archiving explicitly while preparing the standby, if at all possible.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] and it's not a bunny rabbit, either

2011-01-02 Thread Peter Eisentraut
On lör, 2011-01-01 at 17:21 -0500, Robert Haas wrote:
  I don't see anything wrong with having 20 or 30 messages of variants of
 
  foo cannot be used on bar
 
  without placeholders.
 
 Well, that's OK with me.  It seems a little grotty, but manageably so.
  Questions:
 
 1. Should we try to include the name of the object?  If so, how?

Hmm.  There is a bit of a difference in my mind between, say,

constraints cannot be used on sequences

constraint foo cannot be used on sequence bar

the latter leaving open the question whether some other combination
might work.

 2. Can we have a variant with an SQL-command-fragment parameter?
 
 %s cannot be used on views
 where %s might be CLUSTER, DROP COLUMN, etc.

That's OK; we do that in several other places.





Re: [HACKERS] contrib/snapshot

2011-01-02 Thread Jim Nasby
On Dec 31, 2010, at 1:35 PM, Joel Jacobson wrote:
 2010/12/31 Simon Riggs si...@2ndquadrant.com
 Please call it something other than snapshot. There's already about 3
 tools called something similar and a couple of different meanings of the
 term in the world of Postgres.
 
 
 Thanks, good point.
 Renamed to fsnapshot.

Is it actually limited to functions? ISTM this concept would be valuable for 
anything that's not in pg_class (in other words, anything that doesn't have 
user data in it).

Also, I'm not sure why this needs to be in contrib vs pgFoundry.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] C++ keywords in headers (was Re: [GENERAL] #include funcapi.h)

2011-01-02 Thread Peter Geoghegan
I believe that Dave Page wants to move to building pg for windows
using visual C++ 2010 some time this year. That alone may be enough of
a reason to check for C++0x keywords in headers:

http://blogs.msdn.com/b/vcblog/archive/2010/04/06/c-0x-core-language-features-in-vc10-the-table.aspx

I think that there is likely to be an expectation that the same
compiler that is used to build pg should be able to include pg headers
in C++ TUs.

-- 
Regards,
Peter Geoghegan



Re: [HACKERS] How to know killed by pg_terminate_backend

2011-01-02 Thread Tatsuo Ishii
 Tatsuo Ishii is...@postgresql.org writes:
 Comments are welcome.
 
 This is a bad idea.  It makes an already-poorly-tested code path
 significantly more fragile, in return for nothing of value.

Are you saying that procsignal.c is the already-poorly-tested one? If
so, why?

As for value, I have already explained upthread why we need this.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] Recovery conflict monitoring

2011-01-02 Thread Simon Riggs
On Mon, 2010-12-27 at 14:39 +0100, Magnus Hagander wrote:
 On Thu, Dec 23, 2010 at 13:09, Magnus Hagander mag...@hagander.net wrote:
  This patch adds counters and views to monitor hot standby generated
  recovery conflicts. It extends the pg_stat_database view with one
  column with the total number of conflicts, and also creates a new view
  pg_stat_database_conflicts that contains a breakdown of exactly what
  caused the conflicts.
 
  Documentation still pending, but comments meanwhile is of course 
  appreciated ;)
 
 Heikki pointed out over IM that it's pointless to count stats caused
 by recovery conflict with drop database - since we drop the stats
 record as soon as it arrives anyway. Here's an updated patch that
 removes that, and also adds some documentation.

I like the patch, well inspired, code in the right places AFAICS. No
code comments at all.

Couple of thoughts: 

* are we safe to issue stats immediately before issuing FATAL? Won't
some of them get lost?

* Not clear what I'd do with database level information, except worry a
lot. Maybe an option to count conflicts per user would be better, since
at least we'd know exactly who was affected by those. Just an idea.

* Would it be better to have a log_standby_conflicts that allowed the
opportunity to log the conflicting SQL, duration until cancelation etc?

I'd rather have what you have than nothing at all though... the new
hot_standby_feedback mode should be acting to reduce these, so it would
be useful to have this patch enabled for testing that feature.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] page compression

2011-01-02 Thread Simon Riggs
On Tue, 2010-12-28 at 09:10 -0600, Andy Colson wrote:

 I know its been discussed before, and one big problem is license and 
 patent problems.

Would like to see a design for that. There are a few different ways we
might want to do that, and I'm interested to see if it's possible to get
compressed pages to be indexable as well.

For example, if you compress 2 pages into 8Kb then you do one I/O and
out pops 2 buffers. That would work nicely with ring buffers.

Or you might try to have pages > 8Kb in one block, which would mean
decompressing every time you access the page. That wouldn't be much of a
problem if we were just seq scanning.

Or you might want to compress the whole table at once, so it can only be
read by seq scan. Efficient, but no indexes.

It would be interesting to explore pre-populating the compression
dictionary with some common patterns.
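A toy sketch of the first and last ideas (Python's zlib stands in for whatever compressor a real in-backend C implementation would use; the page contents and dictionary are invented):

```python
import zlib

PAGE = 8192

# Two toy "pages" filled with repetitive tuple-like content.
page_a = (b"INSERT|42|alice|2011-01-02|" * 400)[:PAGE]
page_b = (b"INSERT|43|bob__|2011-01-02|" * 400)[:PAGE]

# Idea 1: compress two 8Kb pages into one 8Kb block, so a single I/O
# "pops out" two buffers after decompression.
blob = zlib.compress(page_a + page_b, 6)
assert len(blob) <= PAGE          # fits: this data is highly compressible
pages = zlib.decompress(blob)
assert pages[:PAGE] == page_a and pages[PAGE:] == page_b

# Idea 2: pre-populate the compression dictionary with common patterns.
# The identical dictionary must be supplied again when decompressing.
zdict = b"INSERT|2011-01-02|"
c = zlib.compressobj(6, zlib.DEFLATED, zlib.MAX_WBITS, 8,
                     zlib.Z_DEFAULT_STRATEGY, zdict)
with_dict = c.compress(page_a) + c.flush()
d = zlib.decompressobj(zlib.MAX_WBITS, zdict)
assert d.decompress(with_dict) == page_a
```

The dictionary trick helps most on short, low-redundancy inputs, which is why it is interesting for individual heap pages.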

Anyway, interesting topic.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] contrib/snapshot

2011-01-02 Thread Joel Jacobson
2011/1/2 Jim Nasby j...@nasby.net

  Renamed to fsnapshot.

 Is it actually limited to functions? ISTM this concept would be valuable
 for anything that's not in pg_class (in other words, anything that doesn't
 have user data in it).


My ambition is to primarily support functions. Support for other object
types is merely a necessary side-effect of the function dependencies.

Is there a matrix of all possible object types dependencies?
If not, for functions, is the following list correct?
   Object types which may depend on functions: constraints, views, triggers,
any more?
   Functions may depend on: language, any more?

Instead of limiting the support to functions, perhaps it would make more
sense to limit it to all non-data objects?
Is there a term for the group of object types not carrying any user data?
Which object types do carry user data? I can only think of tables and
sequences; any others?



 Also, I'm not sure why this needs to be in contrib vs pgFoundry.


Good point. It's actually in neither of them right now, it's only at
github.com :) I merely used the prefix contrib/ in the subject line to
indicate it's not a patch to the core.

I do hope though it's possible to get a place for it in contrib/ at some
time in the future, I think there is a chance quite a lot of users would
appreciate a quicker, less error-prone way of handling these things.

This tool must be made extremely reliable, otherwise you won't feel safe
using it in a production environment for deployment and revert purposes,
which is my company's requirement.

I hope to achieve this by keeping a bare minimum approach to features, and
making sure it only fulfills the objective:
1. take a snapshot of all non-data objects
2. deploy code, test new code, or let time pass while other people make a
mess in your database
3. revert to previous snapshot without affecting any of the new data,
generated in step 2

I put my faith in the reliability of system functions, such as
pg_get_functiondef(), pg_get_viewdef() etc., to build proper create/drop
commands for each object.
Even nicer would be if pg_catalog provided functions to generate SQL
create/drop commands for all non-data object types, and made sure
_everything_ is included in the command, ensuring the object is
recreated exactly the same; currently pg_get_functiondef() does not
restore the ownership of the function, which I had to append manually.

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] contrib/snapshot

2011-01-02 Thread Joel Jacobson
2011/1/3 Joel Jacobson j...@gluefinance.com

 2011/1/2 Jim Nasby j...@nasby.net

 Is it actually limited to functions? ISTM this concept would be valuable
 for anything that's not in pg_class (in other words, anything that doesn't
 have user data in it).


 Instead of limiting the support to functions, perhaps it would make more
 sense to limit it to all non-data objects?
 Is there a term for the group of object types not carrying any user data?


My bad, I see you already answered both my questions.
So, it does make sense, and the term for non-data object types is therefore
non-pg_class, non-class or perhaps non-relation objects?

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] contrib/snapshot

2011-01-02 Thread Andrew Dunstan



On 01/02/2011 07:44 PM, Joel Jacobson wrote:

 Also, I'm not sure why this needs to be in contrib vs pgFoundry.

 Good point. It's actually in neither of them right now, it's only at
 github.com :) I merely used the prefix contrib/ in the subject line to
 indicate it's not a patch to the core.

"contrib" in PostgreSQL means a module maintained by the backend
developers.

But it's not clear to me that there is any particular reason why this
should be in contrib.


cheers

andrew



Re: [HACKERS] Sync Rep Design

2011-01-02 Thread Hannu Krosing

On 2.1.2011 5:36, Robert Haas wrote:

On Sat, Jan 1, 2011 at 6:54 AM, Simon Riggssi...@2ndquadrant.com  wrote:

Yes, working out the math is a good idea. Things are much clearer if we
do that.

Let's assume we have 98% availability on any single server.

1. Having one primary and 2 standbys, either of which can acknowledge,
and we never lock up if both standbys fail, then we will have 99.9992%
server availability. (So PostgreSQL hits 5 Nines, with data
guarantees). (Maximised availability)

I don't agree with this math.  If the master and one standby fail
simultaneously, the other standby is useless, because it may or may
not be caught up with the master.  You know that the last transaction
acknowledged as committed by the master is on at least one of the two
standbys, but you don't know which one, and so you can't safely
promote the surviving standby.
(If you are working in an environment where promoting the surviving
standby when it's possibly not caught up is OK, then you don't need
sync rep in the first place: you can just run async rep and get much
better performance.)
So the availability is 98% (you are up when the master is up) + 98%^2
* 2% (you are up when both slaves are up and the master is down) =
99.92%.  If you had only a single standby, then you could be certain
that any commit acknowledged by the master was on that standby.  Thus
your availability would be 98% (up when master is up) + 98% * 2% (you
are up when the master is down and the slave is up) = 99.96%.

OTOH, in the case where you need _all_ the slaves to confirm, any
failing slave brings the master down, so adding a slave reduces
availability by an extra ~2%.

The solution to achieving good durability AND availability is requiring
N past the post instead of 1 past the post.

In this case you can get to 99.9992% availability with master + 3 sync
slaves, 2 of which have to ACK.
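The "N past the post" idea can be made concrete with a toy model (my own sketch, not from the thread): independent failures, 98% per-server availability, and service requires the primary plus at least k ack'ing standbys to be up. Under this model, a 2-of-3 quorum keeps the 3-copy durability of wait-for-all while noticeably improving availability, though not to the exact figure quoted above:

```python
from math import comb

def quorum_commit(p_up, n_standbys, k_ack):
    """1 primary + n_standbys, of which k_ack must ack each commit.
    Returns (availability, durability) assuming independent failures."""
    p_down = 1.0 - p_up
    # P(at least k_ack of the standbys are up)
    standbys_ok = sum(comb(n_standbys, i) * p_up**i * p_down**(n_standbys - i)
                      for i in range(k_ack, n_standbys + 1))
    availability = p_up * standbys_ok          # primary AND a quorum of standbys
    durability = 1.0 - p_down ** (k_ack + 1)   # commit lives on primary + k_ack standbys
    return availability, durability

wait_for_all = quorum_commit(0.98, 2, 2)   # (~0.9412, ~0.999992)
two_of_three = quorum_commit(0.98, 3, 2)   # (~0.9788, ~0.999992)
```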


---
Hannu Krosing
Performance and Infinite Scalability Consultant
http://www.2ndQuadrant.com/books/






Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid

2011-01-02 Thread Greg Smith

Marko Tiikkaja wrote:
I'm confused.  Are you saying that the patch is supposed to lock the 
table against concurrent INSERT/UPDATE/DELETE/MERGE?  Because I don't 
see it in the patch, and the symptoms you're having are a clear 
indication of the fact that it's not happening.  I also seem to recall 
that people thought locking the table would be excessive.


That's exactly what it should be doing.  I thought I'd seen just that in 
one of the versions of this patch, but maybe that's a mistaken memory on 
my part.  In advance of the planned but not yet available ability to 
lock individual index key values, locking the whole table is the only 
implementation I'm aware of that can work correctly here.  In 
earlier versions, I think this code was running into issues before it 
even got to that point.  If you're right that things like the duplicate key 
error in the current version are caused exclusively by not locking 
enough, that may be the next necessary step here.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books




[HACKERS] Visual Studio 2010/Windows SDK 7.1 support

2011-01-02 Thread Brar Piening

Hi,

I've created a patch that enables support for building PostgreSQL with 
Visual Studio 2010 or Microsoft Windows SDK for Windows 7 and .NET 
Framework 4 (Windows SDK 7.1).

You can grab it from http://www.piening.info/VS2010.patch
It only touches the .pl, .pm and .bat files in src/tools/msvc, so it's 
relevant for Windows only (that's why I've left CRLF line endings - is 
that actually OK, or should I have converted them?).
It's diffed against current head + running perltidy -b -bl -nsfs -naws 
-l=100 -ole=unix *.pl *.pm as described in the README file (which seems 
not to have been run before committing Mkvcbuild.pm the last time).
It is probably neither the perfect way to introduce VS2010 support (my 
perl is better than my C but probably still not what you are used to) 
nor is it my way to try to make you officially support VS 2010. But 
perhaps it's something you could start with once you decide to upgrade 
the msvc toolchain.
The patch is necessary because M$ got rid of vcbuild in favour of 
msbuild which uses a different build file format (*.vcxproj).
It should support all use cases described in 
http://www.postgresql.org/docs/current/static/install-windows-full.html 
and builds in Windows SDK 7.0 (VS 2008 toolchain) x86 and x64 as well as 
Windows SDK 7.1 (VS 2010 toolchain) x86 and x64.
The SDK 7.1 build produces tons of warnings which are mostly macro 
redefinitions of EIDRM, EMSGSIZE, EAFNOSUPPORT, EWOULDBLOCK, ECONNRESET, 
EINPROGRESS, ENOBUFS, EPROTONOSUPPORT, ECONNREFUSED and EOPNOTSUPP which 
seem to have found their way into errno.h finally. Cutting those out of 
src\include\pg_config_os.h and src\interfaces\libpq\win32.h makes the 
project build pretty clean.
I resisted the temptation to parse them out of those files during 
Mkvcbuild::mkvcbuild as this should probably be handled by some 
preprocessor defines.

The build result passes vcregress check.
The pgsql.sln file also opens and builds in VS 2010.

I hope it is of some use.

Best regards,

Brar
