Re: [HACKERS] Sync Rep Design
On 02.01.2011 00:40, Josh Berkus wrote:
> On 1/1/11 5:59 AM, Stefan Kaltenbrunner wrote:
>> well you keep saying that but to be honest I cannot really even see a
>> usecase for me - what you describe is that only a random one of a set
>> of servers is sync at any time and I don't really know which one. My
>> usecases would all involve 2 sync standbys and 1 or more async ones,
>> but the second sync one would be in a different datacenter and I NEED
>> to protect against a datacenter failure, which your proposal says I
>> cannot do :(
>
> As far as I know, *nobody* has written the bookkeeping code to actually
> track which standbys have ack'd. We need to get single-ack synch
> standby merged, tested and working before we add anything as
> complicated as "each standby on this list must ack". That means that
> it's extremely unlikely for 9.1 at this point.

The bookkeeping will presumably consist of an XLogRecPtr in shared memory for each standby, tracking how far the standby has acknowledged. At commit, you scan the standby slots in shared memory and check that the required standbys have acknowledged your commit record. The bookkeeping required is the same whether we support a list of standbys that must ack or just one.

Frankly, if Simon hadn't already submitted code, I'd be pushing for single-standby-only for 9.1, instead of "any one". Yes, we are awfully late, but let's not panic.

BTW, there's a bunch of replication related stuff that we should work to close, that are IMHO more important than synchronous replication. Like making the standby follow timeline changes, to make failovers smoother, and the facility to stream a base-backup over the wire. I wish someone worked on those...

>> Hmm, access control... We haven't yet discussed what privileges a
>> standby needs to become synchronous. Perhaps it needs to be a separate
>> privilege that can be granted, in addition to the replication
>> privilege?
>
> No, I don't think so. An additional priv would just complicate life for
> DBAs without providing any real benefit. You'd be guarding against the
> very narrow hypothetical case where there's a server admin with limited
> privs on the master, and authorization to create async standbys, but
> not the authorization to create a synch standby. How likely is that to
> *ever* happen?

Very likely. A synchronous standby can bring the master to a halt, while an asynchronous one is rather harmless. If I were a DBA, and the data wasn't very sensitive, I would liberally hand out async privileges to my colleagues to set up reporting standbys, test servers etc. But I would *not* give them synchronous privileges, because sooner or later one would go "hmm, I wonder what happens if I make this synchronous", or haphazardly copy the config file from a synchronous standby. That would either bring down the master, or act as a fake standby, acknowledging commits before they're flushed to the real synchronous standby. Either one would be bad.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
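[Editor's note] The per-standby bookkeeping Heikki describes above — one acknowledged-LSN slot per standby, scanned at commit time — can be sketched as follows. This is an illustrative Python model, not PostgreSQL source; the function and standby names are invented:

```python
# Illustrative model of the bookkeeping Heikki describes: one slot per
# standby holding the highest WAL position (LSN) it has acknowledged,
# scanned when a commit must decide whether it is safely replicated.

def commit_is_acked(commit_lsn, ack_slots, required_standbys):
    """A commit is safe once every required standby has acked past it."""
    return all(ack_slots.get(name, 0) >= commit_lsn
               for name in required_standbys)

ack_slots = {"standby1": 0, "standby2": 0}
ack_slots["standby1"] = 1200   # standby1 acknowledges WAL up to LSN 1200
# With only standby1 caught up, a commit at LSN 1000 still waits on standby2.
```

The "any one of the set may ack" variant argued about elsewhere in the thread would replace `all()` with `any()`; the slot scan itself is the same either way, which is Heikki's point.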
Re: [HACKERS] SSI SLRU low-level functions first cut
On 01.01.2011 23:21, Kevin Grittner wrote:
> I've got low-level routines coded for interfacing predicate.c to SLRU
> to handle old committed transactions, so that SSI can deal with
> situations where a large number of transactions are run during the
> lifetime of a single serializable transaction. I'm not actually *using*
> these new functions yet, but that's what I do next. I would love it if
> someone could review this commit and let me know whether it looks
> generally sane.
>
> http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=00a0bc6c47c8173e82e5927d9b75fe570280860f

Nothing is checking for the hi-bit flag, AFAICS. I guess the code that uses that would check it. But wouldn't it be simpler to mark the unused slots with a zero commitseqno, instead of messing with the hi-bit in valid values?

It's probably not necessary to explicitly truncate the slru at startup. We don't do that for pg_subtrans, which also doesn't survive restarts. The next checkpoint will truncate it.

It would possibly be simpler to not reset headXid and tailXid to InvalidTransactionId when the window is empty, but represent that as tailXid == headXid + 1.

OldSerXidGetMinConflictCommitSeqNo() calls LWLockRelease twice.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
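[Editor's note] Heikki's third suggestion — encoding "window empty" as tailXid == headXid + 1 rather than resetting both bounds to an invalid sentinel — can be sketched like this. Illustrative Python, not the SSI code; names are invented:

```python
# Sketch of the suggested representation: the window holds xids
# tail..head inclusive, and "empty" is encoded as tail == head + 1,
# so no InvalidTransactionId sentinel (or special-case check) is needed.

def window_is_empty(head_xid, tail_xid):
    return tail_xid == head_xid + 1

def window_contains(xid, head_xid, tail_xid):
    # An empty window (tail == head + 1) contains nothing, and the
    # same comparison handles the non-empty case.
    return tail_xid <= xid <= head_xid
```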
Re: [HACKERS] Sync Rep Design
On 01/02/2011 09:35 AM, Heikki Linnakangas wrote:
> [...]
> BTW, there's a bunch of replication related stuff that we should work
> to close, that are IMHO more important than synchronous replication.
> Like making the standby follow timeline changes, to make failovers
> smoother, and the facility to stream a base-backup over the wire. I
> wish someone worked on those...

yeah I agree that those two are much more of a problem for the general user base. Whatever people think about our current system - it is very easy to configure (in terms of knobs to toggle) but extremely hard to get set up and dealt with during failovers (and I know nobody who got it right the first few times or has not fucked up one thing in the process). Syncrep is important, but I would argue that getting those two fixed is even more so ;)

Stefan
Re: [HACKERS] Sync Rep Design
On Sun, 2011-01-02 at 10:35 +0200, Heikki Linnakangas wrote:
> BTW, there's a bunch of replication related stuff that we should work
> to close, that are IMHO more important than synchronous replication.
> Like making the standby follow timeline changes, to make failovers
> smoother, and the facility to stream a base-backup over the wire. I
> wish someone worked on those...

Hopefully, you'll be allowed to work on those, if they are more important?

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] [COMMITTERS] pgsql: Basic foreign table support.
Typo, I think:

-   (errmsg("skipping \"%s\" --- cannot vacuum indexes, views, or special system tables",
+   (errmsg("skipping \"%s\" --- cannot only non-tables or special system tables",

//Magnus

On Sun, Jan 2, 2011 at 05:48, Robert Haas <rh...@postgresql.org> wrote:
> Basic foreign table support.
>
> Foreign tables are a core component of SQL/MED. This commit does not
> provide a working SQL/MED infrastructure, because foreign tables cannot
> yet be queried. Support for foreign table scans will need to be added
> in a future patch. However, this patch creates the necessary system
> catalog structure, syntax support, and support for ancillary operations
> such as COMMENT and SECURITY LABEL.
>
> Shigeru Hanada, heavily revised by Robert Haas
>
> Branch
> ------
> master
>
> Details
> -------
> http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=0d692a0dc9f0e532c67c577187fe5d7d323cb95b
>
> Modified Files
> --------------

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] management of large patches
On Sun, Jan 2, 2011 at 06:32, Robert Haas <robertmh...@gmail.com> wrote:
> We're coming to the end of the 9.1 development cycle, and I think that
> there is a serious danger of insufficient bandwidth to handle the large
> patches we have outstanding. For my part, I am hoping to find the
> bandwidth for two, MAYBE three major commits between now and the end of
> 9.1CF4, but I am not positive that I will be able to find even that
> much time, and the number of major patches vying for attention is
> considerably greater than that. Quick estimate:
>
> - SQL/MED - probably needs ~3 large commits: foreign table scan, file
>   FDW, postgresql FDW, plus whatever else gets submitted in the next
>   two weeks
> - MERGE
> - checkpoint improvements
> - SE-Linux integration
> - extensions - may need 2 or more commits
> - true serializability - not entirely sure of the status of this
> - writeable CTEs (Tom has indicated he will look at this)
> - PL/python patches (Peter has indicated he will look at this)
> - snapshot taking inconsistencies (Tom has indicated he will look at this)
> - per-column collation (Peter)
> - synchronous replication (Simon, and, given the level of interest in
>   and complexity of this feature, probably others as well)
>
> I guess my basic question is - is it realistic to think that we're
> going to get all of the above done in the next 45 days? Is there
> anything we can do to make the process more efficient? If a few more
> large patches drop into the queue in the next two weeks, will we have
> bandwidth for those as well? If we don't think we can get everything
> done in the time available, what's the best way to handle that?

Well, we've always (well, since we had CFs) said that large patches shouldn't be submitted for the last CF, they should be submitted for one of the first. So if something *new* gets dumped on us for the last one, giving priority to the existing ones in the queue seems like the only fair option.

As for priority between those that *were* submitted earlier, and have been reworked (which is how the system is supposed to work), it's a lot harder. And TBH, I think we're going to have a problem getting all those done. But the question is - are all ready enough, or are a couple going to need the "returned with feedback" status *regardless* of whether this is the last CF or not?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Libpq PGRES_COPY_BOTH - version compatibility
On Wed, Dec 29, 2010 at 19:49, Robert Haas <robertmh...@gmail.com> wrote:
> On Dec 29, 2010, at 10:14 AM, Magnus Hagander <mag...@hagander.net> wrote:
>> We can be held responsible for the packaging decisions if they use
>> *our* make install commands, imho.
>
> Yep. So, as I see it there are two ways of doing it - install a
> catversion.h file and include it from libpq-fe.h, or modify the
> libpq-fe.h. I still think modifying libpq-fe.h is the better of these
> choices - but either of them would work.

But is the catversion value really the best interface for the user? This is about libpq functionality level, which really has nothing to do with the backend catalog, does it?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Sync Rep Design
On Sat, 2011-01-01 at 22:11 -0500, Aidan Van Dyk wrote:
> On Sat, Jan 1, 2011 at 6:08 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> On Sat, 2011-01-01 at 14:40 -0800, Josh Berkus wrote:
>>> Standby in general deals with the A,D,R triangle (Availability,
>>> Durability, Response time). "Any one" configuration is the A,R
>>> configuration, and the only reason to go out with it for 9.1 is
>>> because it's simpler to implement than the D,R configuration (all
>>> standbys must ack).
>>
>> Nicely put. Not the only reason though... As I showed earlier, the AR
>> gives you 99.999% availability and the DR gives you 94% availability,
>> considering a 3 server config. If you add more servers, the
>> availability of the DR option gets much worse, very quickly. The
>> performance of AR is much better also, and stays same or better as
>> cluster size increases. DR choice makes performance degrade as cluster
>> size increases, since it works at the speed of the slowest node.
>
> I'm all for getting first-past-the-post in for 9.1. Otherwise I fear
> we'll get nothing. Stephen and I will only be able to use 1 sync slave,
> the DR-site one.

No, the AR and DR options are identical with just one sync standby. You've been requesting the DR option with 2 standbys, which is what gives you 94% availability.

> That's fine. I can live with it, and make my local slave be async. Or
> replicate the FS/block under WAL. I can monitor the hell out of it,
> and unless it goes down, it should easily be able to keep up with the
> remote sync one behind a slower WAN link. And I think both Stephen and
> I understand your availability math. We're not arguing that
> first-past-the-post gives both better query availability and
> cluster-scale performance. But when the primary datacenter servers are
> dust in the crater (or boats in the flood, or ash in the fire), I
> either keep my job, or I don't. And that depends on whether there is a
> chance I (my database system) confirmed a transaction that I can't
> recover.

I'm not impressed. You neglect to mention that Oracle and MySQL would put you in exactly the same position. You also neglect to say that if the local standby goes down, you were advocating a design that would take the whole application down.

If you actually did what you have been suggesting, and the cluster went down as it inevitably would do, once your colleagues realise that you knowingly configured the cluster to have only 94% availability, you won't have a job anymore; you'll be escorted off the premises while shouting "but while it was down, it lost no data". When that never happens, thank me.

There are people that need more durability than availability, but not many. If the database handles high value transactions, they very probably want it to keep on processing high value transactions. You'll have the choice of how to configure it, because of me listening to other people's views and selecting only the ideas that make sense.

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Sync Rep Design
On Sun, 2011-01-02 at 10:35 +0200, Heikki Linnakangas wrote:
> Frankly, if Simon hadn't already submitted code, I'd be pushing for
> single-standby-only for 9.1, instead of "any one". Yes, we are awfully
> late, but let's not panic.

Yes, we're about a year late. Getting a simple feature like this into the code could have been done in 9.0. We must stop returning to overcomplex features, especially if they aren't backed up with solid analysis of things like server availability and query visibility.

--
Simon Riggs           http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
[HACKERS] Support for negative index values in array fetching
Hi,

I ran into the problem of getting the last n elements out of an array, and while some workarounds do exist (http://stackoverflow.com/questions/2949881/getting-the-last-element-of-a-postgres-array-declaratively), I was still annoyed that I couldn't just ask for the last n values in an array Python/Perl style.

Here's a patch to add support for negative index values in fetching elements from an array. i.e.

postgres=# CREATE TABLE blah (a int[]);
CREATE TABLE
Time: 11.357 ms
postgres=# INSERT INTO blah (a) VALUES (ARRAY[1,2,3,4,5,6,7,8,9,10]);
INSERT 0 1
Time: 1.282 ms
postgres=# SELECT a[-1] FROM blah;
 a
----
 10
(1 row)

Time: 0.450 ms
postgres=# SELECT a[-5:10] FROM blah;
      a
--------------
 {6,7,8,9,10}
(1 row)

Time: 0.949 ms

While testing this I BTW ran into funny behaviour in setting array slices, as in:

postgres=# update blah set a[-5] = 12;
UPDATE 1
Time: 1.500 ms
postgres=# select * from blah;
 a
---
 [-5:10]={12,NULL,NULL,NULL,NULL,NULL,1,2,3,4,5,6,7,8,9,10}
(1 row)

Time: 0.431 ms

And since this negative array expansion behaviour totally surprised me, I haven't changed that in this patch at all.

--
Hannu Valtonen

diff --git a/doc/src/sgml/array.sgml b/doc/src/sgml/array.sgml
index bb4657e..dc7b6f4 100644
--- a/doc/src/sgml/array.sgml
+++ b/doc/src/sgml/array.sgml
@@ -224,7 +224,9 @@ SELECT name FROM sal_emp WHERE pay_by_quarter[1] &lt;&gt; pay_by_quarter[2];
   By default <productname>PostgreSQL</productname> uses a
   one-based numbering convention for arrays, that is, an array of
   <replaceable>n</replaceable> elements starts with
   <literal>array[1]</literal> and
-  ends with <literal>array[<replaceable>n</replaceable>]</literal>.
+  ends with <literal>array[<replaceable>n</replaceable>]</literal>. Negative
+  array subscript numbers indicate that the position of the element is calculated from
+  the end of the array, with -1 indicating the last element in the array.
  </para>

@@ -242,6 +244,34 @@ SELECT pay_by_quarter[3] FROM sal_emp;
  </para>

  <para>
+  This query retrieves the last quarter pay of all employees:
+
+<programlisting>
+SELECT pay_by_quarter[-1] FROM sal_emp;
+
+ pay_by_quarter
+----------------
+          10000
+          25000
+(2 rows)
+</programlisting>
+ </para>
+
+ <para>
+  This query retrieves the pay of all employees for the last three quarters:
+
+<programlisting>
+SELECT pay_by_quarter[-3:4] FROM sal_emp;
+
+   pay_by_quarter
+---------------------
+ {10000,10000,10000}
+ {25000,25000,25000}
+(2 rows)
+</programlisting>
+ </para>
+
+ <para>
   We can also access arbitrary rectangular slices of an array, or
   subarrays. An array slice is denoted by writing
   <literal><replaceable>lower-bound</replaceable>:<replaceable>upper-bound</replaceable></literal>
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index fb4cbce..9d6c3f1 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -1786,6 +1786,8 @@ array_ref(ArrayType *array,
 	}
 	for (i = 0; i < ndim; i++)
 	{
+		if (indx[i] < 0)	/* A negative index number indicates a position calculated from the end of the array */
+			indx[i] = dim[i] + indx[i] + lb[i];
 		if (indx[i] < lb[i] || indx[i] >= (dim[i] + lb[i]))
 		{
 			*isNull = true;
@@ -1914,6 +1916,10 @@ array_get_slice(ArrayType *array,
 	for (i = 0; i < nSubscripts; i++)
 	{
+		if (lowerIndx[i] < 0)	/* A negative index number indicates a position calculated from the end of the array */
+			lowerIndx[i] = dim[i] + lowerIndx[i] + lb[i];
+		if (upperIndx[i] < 0)	/* A negative index number indicates a position calculated from the end of the array */
+			upperIndx[i] = dim[i] + upperIndx[i] + lb[i];
 		if (lowerIndx[i] < lb[i])
 			lowerIndx[i] = lb[i];
 		if (upperIndx[i] >= (dim[i] + lb[i]))
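[Editor's note] The subscript translation Hannu's patch applies in arrayfuncs.c can be sketched in a few lines. Illustrative Python, not the C code; `dim` is the dimension length and `lb` its lower bound, matching the variables in the patch:

```python
# Sketch of the patch's subscript translation: a negative subscript is
# shifted by the dimension length plus the lower bound, so -1 names the
# last element (for the default 1-based lower bound).

def translate_subscript(idx, dim, lb):
    if idx < 0:
        return dim + idx + lb
    return idx

# For ARRAY[1,...,10] (dim = 10, default lower bound lb = 1):
# a[-1] becomes a[10], the last element; a[-5] becomes a[6],
# which is why a[-5:10] yields {6,7,8,9,10} in the example above.
```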
Re: [HACKERS] Extension upgrade, patch v0: debug help needed
Dimitri Fontaine <dimi...@2ndquadrant.fr> writes:
> The problem occurs on ALTER OPERATOR FAMILY ... SET EXTENSION, that's
> what dichotomy on the citext.upgrade.sql tells me. The code in question
> was copy/pasted from the SET SCHEMA code path in gram.y then other
> related files.

So I just tested a clean HEAD checkout then the following steps:

make -C contrib/citext install
psql -f .../head/share/contrib/citext.sql
psql
dim=# do $$ begin execute 'alter operator class public.citext_ops using btree set schema utils'; end; $$;
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Will try to debug that as soon as possible, but spare time here tends to be sparse, so I preferred sending this mail first. Preliminary investigation leads me to think the cause is using n->objarg rather than n->addname to host the access_method. Working on a fix.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Sync Rep Design
On 2.1.2011 5:36, Robert Haas wrote:
> On Sat, Jan 1, 2011 at 6:54 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> Yes, working out the math is a good idea. Things are much clearer if
>> we do that. Let's assume we have 98% availability on any single
>> server.
>>
>> 1. Having one primary and 2 standbys, either of which can acknowledge,
>> and we never lock up if both standbys fail, then we will have 99.9992%
>> server availability. (So PostgreSQL hits 5 Nines, with data
>> guarantees). (Maximised availability)
>
> I don't agree with this math. If the master and one standby fail
> simultaneously, the other standby is useless, because it may or may not
> be caught up with the master. You know that the last transaction
> acknowledged as committed by the master is on at least one of the two
> standbys, but you don't know which one, and so you can't safely promote
> the surviving standby. (If you are working in an environment where
> promoting the surviving standby when it's possibly not caught up is OK,
> then you don't need sync rep in the first place: you can just run async
> rep and get much better performance.) So the availability is 98% (you
> are up when the master is up) + 98%^2 * 2% (you are up when both slaves
> are up and the master is down) = 99.92%.
>
> If you had only a single standby, then you could be certain that any
> commit acknowledged by the master was on that standby. Thus your
> availability would be 98% (up when master is up) + 98% * 2% (you are up
> when the master is down and the slave is up) = 99.96%.

OTOH, in the case where you need _all_ the slaves to confirm, any failing slave brings the master down, so adding a slave lowers availability by an extra 2%.

The solution to achieving good durability AND availability is requiring "N past the post" instead of 1 past the post. In this case you can get to 99.9992% availability with master + 3 sync slaves, 2 of which have to ACK.

---
Hannu Krosing
Performance and Infinite Scalability Consultant
http://www.2ndQuadrant.com/books/
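[Editor's note] Robert's availability arithmetic above can be checked in a few lines. A sketch (p is the assumed single-server availability, 0.98 in the thread; function names are invented):

```python
# Checking the availability figures quoted above, with p = 0.98.

def avail_one_sync_standby(p):
    # Up while the master is up, or while the master is down and the
    # single (known caught-up) standby survives: p + (1-p)*p.
    return p + (1 - p) * p

def avail_two_sync_standbys_any_ack(p):
    # Robert's point: with "any one may ack", a surviving standby is
    # only safely promotable if *both* standbys outlive the master,
    # so the failover term is (1-p) * p^2.
    return p + (1 - p) * p ** 2

print(round(avail_one_sync_standby(0.98) * 100, 2))           # 99.96
print(round(avail_two_sync_standbys_any_ack(0.98) * 100, 2))  # 99.92
```

Note that adding the second "any one may ack" standby *lowers* safe availability under Robert's promotion criterion, which is exactly his disagreement with the 99.9992% figure.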
Re: [HACKERS] Extension upgrade, patch v0: debug help needed
Dimitri Fontaine <dimi...@2ndquadrant.fr> writes:
> make -C contrib/citext install
> psql -f .../head/share/contrib/citext.sql
> psql
> dim=# do $$ begin execute 'alter operator class public.citext_ops using btree set schema utils'; end; $$;
> server closed the connection unexpectedly
> 	This probably means the server terminated abnormally
> 	before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

The fix was ok, but I had to test with the right environment to be able to appreciate that :) Please find it attached.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

*** a/src/backend/commands/alter.c
--- b/src/backend/commands/alter.c
***************
*** 198,208 **** ExecAlterObjectSchemaStmt(AlterObjectSchemaStmt *stmt)
  			break;
  		case OBJECT_OPCLASS:
! 			AlterOpClassNamespace(stmt->object, stmt->objarg, stmt->newschema);
  			break;
  		case OBJECT_OPFAMILY:
! 			AlterOpFamilyNamespace(stmt->object, stmt->objarg, stmt->newschema);
  			break;
  		case OBJECT_SEQUENCE:
--- 198,208 ----
  			break;
  		case OBJECT_OPCLASS:
! 			AlterOpClassNamespace(stmt->object, stmt->addname, stmt->newschema);
  			break;
  		case OBJECT_OPFAMILY:
! 			AlterOpFamilyNamespace(stmt->object, stmt->addname, stmt->newschema);
  			break;
  		case OBJECT_SEQUENCE:
*** a/src/backend/commands/opclasscmds.c
--- b/src/backend/commands/opclasscmds.c
***************
*** 1993,2008 **** AlterOpClassOwner_internal(Relation rel, HeapTuple tup, Oid newOwnerId)
   * ALTER OPERATOR CLASS any_name USING access_method SET SCHEMA name
   */
  void
! AlterOpClassNamespace(List *name, List *argam, const char *newschema)
  {
  	Oid			amOid;
- 	char	   *access_method = linitial(argam);
  	Relation	rel;
  	Oid			oid;
  	Oid			nspOid;
  
- 	Assert(list_length(argam) == 1);
- 
  	amOid = get_am_oid(access_method, false);
  
  	rel = heap_open(OperatorClassRelationId, RowExclusiveLock);
--- 1993,2005 ----
   * ALTER OPERATOR CLASS any_name USING access_method SET SCHEMA name
   */
  void
! AlterOpClassNamespace(List *name, char *access_method, const char *newschema)
  {
  	Oid			amOid;
  	Relation	rel;
  	Oid			oid;
  	Oid			nspOid;
  
  	amOid = get_am_oid(access_method, false);
  
  	rel = heap_open(OperatorClassRelationId, RowExclusiveLock);
***************
*** 2185,2199 **** get_am_oid(const char *amname, bool missing_ok)
   * ALTER OPERATOR FAMILY any_name USING access_method SET SCHEMA name
   */
  void
! AlterOpFamilyNamespace(List *name, List *argam, const char *newschema)
  {
  	Oid			amOid;
- 	char	   *access_method = linitial(argam);
  	Relation	rel;
  	Oid			nspOid;
  	Oid			oid;
  
- 	Assert(list_length(argam) == 1);
  	amOid = get_am_oid(access_method, false);
  
  	rel = heap_open(OperatorFamilyRelationId, RowExclusiveLock);
--- 2182,2194 ----
   * ALTER OPERATOR FAMILY any_name USING access_method SET SCHEMA name
   */
  void
! AlterOpFamilyNamespace(List *name, char *access_method, const char *newschema)
  {
  	Oid			amOid;
  	Relation	rel;
  	Oid			nspOid;
  	Oid			oid;
  
  	amOid = get_am_oid(access_method, false);
  
  	rel = heap_open(OperatorFamilyRelationId, RowExclusiveLock);
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
***************
*** 6225,6231 **** AlterObjectSchemaStmt:
  					AlterObjectSchemaStmt *n = makeNode(AlterObjectSchemaStmt);
  					n->objectType = OBJECT_OPCLASS;
  					n->object = $4;
! 					n->objarg = list_make1($6);
  					n->newschema = $9;
  					$$ = (Node *)n;
  				}
--- 6225,6231 ----
  					AlterObjectSchemaStmt *n = makeNode(AlterObjectSchemaStmt);
  					n->objectType = OBJECT_OPCLASS;
  					n->object = $4;
! 					n->addname = $6;
  					n->newschema = $9;
  					$$ = (Node *)n;
  				}
***************
*** 6234,6240 ****
  					AlterObjectSchemaStmt *n = makeNode(AlterObjectSchemaStmt);
  					n->objectType = OBJECT_OPFAMILY;
  					n->object = $4;
! 					n->objarg = list_make1($6);
  					n->newschema = $9;
  					$$ = (Node *)n;
  				}
--- 6234,6240 ----
  					AlterObjectSchemaStmt *n = makeNode(AlterObjectSchemaStmt);
  					n->objectType = OBJECT_OPFAMILY;
  					n->object = $4;
! 					n->addname = $6;
  					n->newschema = $9;
  					$$ = (Node *)n;
  				}
*** a/src/include/commands/defrem.h
--- b/src/include/commands/defrem.h
***************
*** 101,111 ****
  extern void RenameOpClass(List *name, const char *access_method, const char *newname);
  extern void RenameOpFamily(List *name, const char *access_method, const char *newname);
  extern void AlterOpClassOwner(List *name, const char *access_method, Oid newOwnerId);
  extern void AlterOpClassOwner_oid(Oid opclassOid, Oid newOwnerId);
! extern void AlterOpClassNamespace(List *name, List *argam, const char *newschema);
  extern void AlterOpFamilyOwner(List *name, const char *access_method, Oid newOwnerId);
  extern void AlterOpFamilyOwner_oid(Oid opfamilyOid, Oid newOwnerId);
  extern
Re: [HACKERS] Support for negative index values in array fetching
On Jan 2, 2011, at 11:45, Valtonen, Hannu wrote:
> I ran into the problem of getting the last n elements out of an array
> and while some workarounds do exist
> (http://stackoverflow.com/questions/2949881/getting-the-last-element-of-a-postgres-array-declaratively)
> I was still annoyed that I couldn't just ask for the last n values in
> an array Python/Perl style. Here's a patch to add support for negative
> index values in fetching elements from an array.

That won't work. In SQL, array indices don't necessarily start with 0 (or 1, or *any* single value). Instead, you can get each dimension's lower and upper bound for index values with array_lower() and array_upper(). Here's an example:

fgp=# do $$
  declare a text[];
  begin
    a[-1] := 'foo';
    a[0] := 'bar';
    raise notice 'a[-1] == %', a[-1];
  end
$$ language 'plpgsql';

This will raise the notice 'a[-1] == foo'!

The only way around that would be to introduce magic constants "lower" and "upper" that can be used within index expressions and evaluate to the indexed dimension's lower and upper bound. You'd then use my_array[upper], my_array[upper-1], ... to refer to the last, second-to-last, ... element in the array. Actually doing this could get pretty messy, though - not sure if it's really worth the effort...

best regards,
Florian Pflug
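[Editor's note] Florian's objection can be made concrete with the translation rule Hannu's patch uses in arrayfuncs.c. Illustrative Python, not the C code; `dim` is the dimension length and `lb` the lower bound:

```python
# Sketch of the objection: once an array's lower bound is negative
# (here lb = -1, valid indices -1..0 as in Florian's plpgsql example),
# a literal subscript of -1 already names a real element, so the
# "count from the end" translation silently redirects it.

def translate_subscript(idx, dim, lb):
    if idx < 0:
        return dim + idx + lb
    return idx

# Array with elements at indices -1 and 0: dim = 2, lb = -1.
# The user wrote a[-1] meaning the element stored at index -1, but the
# translation turns it into a[0] -- a different, also-valid element.
```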
Re: [HACKERS] [COMMITTERS] pgsql: Basic foreign table support.
On Sun, Jan 2, 2011 at 4:24 AM, Magnus Hagander <mag...@hagander.net> wrote:
> Typo, I think:
>
> -   (errmsg("skipping \"%s\" --- cannot vacuum indexes, views, or special system tables",
> +   (errmsg("skipping \"%s\" --- cannot only non-tables or special system tables",

Oops, fixed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Libpq PGRES_COPY_BOTH - version compatibility
On Sun, Jan 2, 2011 at 4:36 AM, Magnus Hagander <mag...@hagander.net> wrote:
> [...]
> But is the catversion value really the best interface for the user?
> This is about libpq functionality level, which really has nothing to do
> with the backend catalog, does it?

It doesn't seem to me that a change of this type requires a catversion bump.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] management of large patches
On Sun, Jan 2, 2011 at 4:29 AM, Magnus Hagander <mag...@hagander.net> wrote:
> As for priority between those that *were* submitted earlier, and have
> been reworked (which is how the system is supposed to work), it's a lot
> harder. And TBH, I think we're going to have a problem getting all
> those done. But the question is - are all ready enough, or are a couple
> going to need the "returned with feedback" status *regardless* of
> whether this is the last CF or not?

Well, that all depends on how much work people are willing to put into reviewing and committing them, which I think is what we need to determine. None of those patches are going to be as simple as "patch -p1 < $F; git commit -a; git push". Having done a couple of these now, I'd say that doing final review and commit of a patch of this scope takes me ~20 hours of work, but it obviously varies a lot based on how good the patch is to begin with and how much review has already been done. So I guess the question is - who is willing to step up to the plate, either as reviewer or as final reviewer/committer?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Base Backup Streaming (was: [HACKERS] Sync Rep Design)
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: BTW, there's a bunch of replication related stuff that we should work to close, that are IMHO more important than synchronous replication. Like making the standby follow timeline changes, to make failovers smoother, and the facility to stream a base-backup over the wire. I wish someone worked on those... So, we've been talking about base backup streaming at conferences and we have a working prototype. We even have a needed piece of it in core now, that's the pg_read_binary_file() function. What we still miss is an overall design and some integration effort. Let's design first. I propose the following new pg_ctl command to initiate the cloning: pg_ctl clone [-D datadir] [-s on|off] [-t filename] primary_conninfo As far as users are concerned, that would be the only novelty. Once that command is finished (successfully) they would edit postgresql.conf and start the service as usual. A basic recovery.conf file is created with the given options: standby_mode is driven by -s and defaults to off, and trigger_file defaults to being omitted and is given by -t. Of course the primary_conninfo given on the command line is what ends up in the recovery.conf file. That alone would allow for making base backups for recovery purposes and for standby preparation. To support this new tool, the simplest would be to just copy what I've been doing in the prototype, that is run a query to get the primary file listing (per tablespace, not done in the prototype) then get their bytea content over the wire. That means there's no further backend support code to write. https://github.com/dimitri/pg_basebackup We could prefer to have a backend function prepare a tar archive and stream it using the COPY protocol, with some compression support, but that's more complex to code now and to parallelize down the road.
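A recovery.conf as generated by the proposed command might look like this (values are illustrative only, not from the prototype):

```
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=repuser'
trigger_file = '/tmp/postgresql.trigger'
```

With -s off and no -t, standby_mode would be 'off' and the trigger_file line would be omitted.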
Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] management of large patches
(2011/01/02 14:32), Robert Haas wrote: We're coming to the end of the 9.1 development cycle, and I think that there is a serious danger of insufficient bandwidth to handle the large patches we have outstanding. For my part, I am hoping to find the bandwidth for two, MAYBE three major commits between now and the end of 9.1CF4, but I am not positive that I will be able to find even that much time, and the number of major patches vying for attention is considerably greater than that. Quick estimate: - SE-Linux integration How feasible is it to commit this 3K-line patch in the last 45 days? At least, the idea of a security provider enables us to maintain a set of hooks and the logic to make access control decisions independently. I'm available to provide a set of sources for this module at git.postgresql.org, so we can always obtain a working module from there. The worst scenario for us is that nothing progresses in spite of the large amount of man-power spent on review and discussion. It may be more productive to keep the features to be committed in the last CF as small as possible, such as hooks to support a part of DDL permissions or a pg_regress enhancement to run the regression test. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] [BUGS] BUG #5662: Incomplete view
On mån, 2010-12-06 at 14:47 +0200, Peter Eisentraut wrote: On sön, 2010-09-19 at 14:28 -0400, Tom Lane wrote: Or maybe we could implement that function, call it like this CAST((pg_sequence_parameters(c.oid)).max_value AS cardinal_number) AS maximum_value, and plan on optimizing the view when we get LATERAL. Here is an implementation of that. Committed.
Re: [HACKERS] Sync Rep Design
On Sat, 2011-01-01 at 23:36 -0500, Robert Haas wrote: On Sat, Jan 1, 2011 at 6:54 AM, Simon Riggs si...@2ndquadrant.com wrote: Yes, working out the math is a good idea. Things are much clearer if we do that. Let's assume we have 98% availability on any single server. 1. Having one primary and 2 standbys, either of which can acknowledge, and we never lock up if both standbys fail, then we will have 99.9992% server availability. (So PostgreSQL hits 5 Nines, with data guarantees). (Maximised availability) I don't agree with this math. ...(snip by Simon)... 99.96%. OK, so that is at least 99.96%. Cool. The key point here is not (1), but option (4). The approach advocated by Heikki and yourself gives us 94% availability. IMHO that is ridiculous, and I will not accept that as the *only* way forwards, for that reason, whoever advocates it or however long they keep arguing. I do accept that some wish to have that as an option. If we are to have a sensible technical debate with an eventual end, you must answer the points placed in front of you, not just sidestep them and point out problems somewhere else. All analysis must be applied to all options, not just those options advocated by someone else. I've been asking for a failure mode analysis for months and it has never come in full. I'm more than happy to discuss your additional points once we are clear on the 94%, because it is pivotal to everything I've been proposing. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
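As a back-of-envelope check of the numbers in dispute, assuming independent servers each with availability p = 0.98:

```latex
\underbrace{1-(1-p)^3}_{\text{any one of three up}} = 0.999992, \qquad
\underbrace{p^3}_{\text{all three must be up}} \approx 0.9412, \qquad
\underbrace{1-(1-p)^2}_{\text{either of two up}} = 0.9996
```

which matches the 99.9992%, ~94%, and 99.96% figures being argued over, respectively.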
Re: [HACKERS] SSI SLRU low-level functions first cut
Heikki Linnakangas wrote: Nothing checking for the hi-bit flag AFAICS. I guess the code that uses that would check it. Right. After getting this layer done, I went off to watch the Badgers in the Rose Bowl, leaving that coding for today. ;-) But wouldn't it be simpler to mark the unused slots with zero commitseqno, instead of messing with the hi-bit in valid values? This is the earliest commitSeqNo of the rw-conflicts out that the transaction we're looking up had. I'm using zero to mean that there was no conflict. Perhaps instead of setting the high bit I could just use a special value (like all bits set) instead of zero to mean no conflict. In any event, it's clear that all zero should mean not found and I need some other way to indicate no conflict. It's probably not necessary to explicitly truncate the slru at startup. We don't do that for pg_subtrans, which also doesn't survive restarts. The next checkpoint will truncate it. Good point. That slims things down by 22 lines and eliminates a distracting special case. It would possibly be simpler to not reset headXid and tailXid to InvalidTransactionId when the window is empty, but represent that as tailXid == headXid + 1. I'll take a look. I went 'round a few times on how best to handle the empty window, which was complicated a little bit by wanting to keep track of the tail even when the window was currently empty. Because xids won't be submitted in strictly sequential order, I might need to go back a ways in the sequence to update something, so I need to keep track of existing segment files even when there are currently no xids to track; and I wanted the searches to have a fast path out for such cases. OldSerXidGetMinConflictCommitSeqNo() calls LWLockRelease twice. That's because the function calls SimpleLruReadPage_ReadOnly: Control lock must NOT be held at entry, but will be held at exit. That strikes me as an odd API, but it is what it is.
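The sentinel scheme being converged on here - all-zeroes meaning "xid not found", some other distinguished value meaning "found, but no conflict" - can be sketched as follows. The names and the choice of all-ones are illustrative, not taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t CommitSeqNo;

/* All-zeroes: the xid was not found in the SLRU at all. */
#define COMMITSEQNO_NOT_FOUND   ((CommitSeqNo) 0)
/* Distinguished non-zero value: found, but it had no rw-conflict out. */
#define COMMITSEQNO_NO_CONFLICT (~(CommitSeqNo) 0)

/* True only when the lookup found a real earliest-conflict sequence number. */
static bool
SeqNoIsRealConflict(CommitSeqNo c)
{
    return c != COMMITSEQNO_NOT_FOUND && c != COMMITSEQNO_NO_CONFLICT;
}
```

This avoids stealing the high bit from valid sequence numbers while still distinguishing "not found" from "found, no conflict".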
-Kevin
Re: [HACKERS] How to know killed by pg_terminate_backend
Seems reasonable. Does the victim backend currently know why it has been killed? I don't think so. One idea is that the postmaster sets a flag in the shared memory area indicating it received SIGTERM before forwarding the signal to backends. The backend checks the flag, and if it's not set, it knows that the signal has been sent by pg_terminate_backend(), not the postmaster. Or it could also be sent by some other user process, like the user running kill from the shell. No problem (at least for pgpool-II). If the flag is not set, postgres returns the same code as the one killed by pg_terminate_backend(). The point is whether the backend was killed by the postmaster or not. Because if the backend was killed by the postmaster, pgpool-II should not expect the PostgreSQL server to be usable, since the postmaster decided to shut down. Here is the patch to implement the feature. 1) pg_terminate_backend() sends the SIGUSR1 signal rather than SIGTERM to the target backend. 2) The infrastructure used for message passing is storage/ipc/procsignal.c The new message type for ProcSignalReason is PROCSIG_TERMNINATE_BACKEND_INTERRUPT 3) I assign a new error code 57P04 which is returned from a backend killed by pg_terminate_backend(). #define ERRCODE_TERMINATE_BACKEND MAKE_SQLSTATE('5','7', 'P','0','4') Comments are welcome. -- Tatsuo Ishii SRA OSS, Inc.
Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp *** a/src/backend/storage/ipc/procsignal.c --- b/src/backend/storage/ipc/procsignal.c *** *** 279,284 procsignal_sigusr1_handler(SIGNAL_ARGS) --- 279,287 if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN)) RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN); + if (CheckProcSignal(PROCSIG_TERMNINATE_BACKEND_INTERRUPT)) + HandleTerminateBackendInterrupt(); + latch_sigusr1_handler(); errno = save_errno; *** a/src/backend/tcop/postgres.c --- b/src/backend/tcop/postgres.c *** *** 184,189 static bool RecoveryConflictPending = false; --- 184,195 static bool RecoveryConflictRetryable = true; static ProcSignalReason RecoveryConflictReason; + /* + * True if backend is being killed by pg_terminate_backend(). + * Set by HandleTerminateBackendInterrupt() upon received SIGUSR1. + */ + static bool TerminateBackendRequest = false; + /* * decls for routines only used in this file * *** *** 2875,2880 RecoveryConflictInterrupt(ProcSignalReason reason) --- 2881,2924 } /* + * HandleTerminateBackendInterrupt: out-of-line portion of terminate backend + * handling following receipt of SIGUSR1. Designed to be similar to die(). + * Called only by a normal user backend. 
+ */ + void + HandleTerminateBackendInterrupt(void) + { + int save_errno = errno; + + /* Don't joggle the elbow of proc_exit */ + if (!proc_exit_inprogress) + { + InterruptPending = true; + ProcDiePending = true; + TerminateBackendRequest = true; + + /* + * If it's safe to interrupt, and we're waiting for input or a lock, + * service the interrupt immediately + */ + if (ImmediateInterruptOK && InterruptHoldoffCount == 0 && + CritSectionCount == 0) + { + /* bump holdoff count to make ProcessInterrupts() a no-op */ + /* until we are done getting ready for it */ + InterruptHoldoffCount++; + LockWaitCancel(); /* prevent CheckDeadLock from running */ + DisableNotifyInterrupt(); + DisableCatchupInterrupt(); + InterruptHoldoffCount--; + ProcessInterrupts(); + } + } + + errno = save_errno; + } + + /* * ProcessInterrupts: out-of-line portion of CHECK_FOR_INTERRUPTS() macro * * If an interrupt condition is pending, and it's safe to service it, *** *** 2912,2917 ProcessInterrupts(void) --- 2956,2966 (errcode(ERRCODE_ADMIN_SHUTDOWN), errmsg("terminating connection due to conflict with recovery"), errdetail_recovery_conflict())); + else if (TerminateBackendRequest) + ereport(FATAL, + (errcode(ERRCODE_TERMINATE_BACKEND), + errmsg("terminating connection due to pg_terminate_backend"))); + else ereport(FATAL, (errcode(ERRCODE_ADMIN_SHUTDOWN), *** a/src/backend/utils/adt/misc.c --- b/src/backend/utils/adt/misc.c *** *** 114,120 pg_cancel_backend(PG_FUNCTION_ARGS) Datum pg_terminate_backend(PG_FUNCTION_ARGS) { ! PG_RETURN_BOOL(pg_signal_backend(PG_GETARG_INT32(0), SIGTERM)); } Datum --- 114,122 Datum pg_terminate_backend(PG_FUNCTION_ARGS) { ! PG_RETURN_BOOL( ! SendProcSignal(PG_GETARG_INT32(0), PROCSIG_TERMNINATE_BACKEND_INTERRUPT, ! InvalidBackendId) == 0); } Datum *** a/src/include/storage/procsignal.h --- b/src/include/storage/procsignal.h *** *** 40,45 typedef
Re: [HACKERS] Libpq PGRES_COPY_BOTH - version compatibility
On tis, 2010-12-28 at 13:13 +0100, Magnus Hagander wrote: My pg_streamrecv no longer works with 9.1, because it returns PGRES_COPY_BOTH instead of PGRES_COPY_OUT when initiating a copy. That's fine. So I'd like to make it work on both. Specifically, I would like it to check for PGRES_COPY_BOTH if the server is 9.1 and PGRES_COPY_OUT if it's 9.0. Which can be done by checking the server version. ISTM that the correct fix is to increment the protocol version number to 3.1 and send PGRES_COPY_OUT if the client requests version 3.0. That's what the version numbers are for, no?
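The version check Magnus describes could be sketched as below. The result-code constants here are stand-ins so the snippet is self-contained; real client code would use libpq's PGRES_COPY_OUT/PGRES_COPY_BOTH from libpq-fe.h and feed in PQserverVersion(conn):

```c
/* Stand-ins for libpq's ExecStatusType values, for illustration only. */
enum { COPY_OUT, COPY_BOTH };

/* PQserverVersion() encodes 9.1.0 as 90100 and 9.0.x as 900xx. */
static int
expected_copy_status(int server_version_num)
{
    return server_version_num >= 90100 ? COPY_BOTH : COPY_OUT;
}
```

The client would then compare PQresultStatus() against expected_copy_status(PQserverVersion(conn)) instead of a hard-coded constant.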
Re: [HACKERS] Sync Rep Design
Simon Riggs wrote: On Sat, 2011-01-01 at 23:36 -0500, Robert Haas wrote: On Sat, Jan 1, 2011 at 6:54 AM, Simon Riggs wrote: Yes, working out the math is a good idea. Things are much clearer if we do that. Let's assume we have 98% availability on any single server. 1. Having one primary and 2 standbys, either of which can acknowledge, and we never lock up if both standbys fail, then we will have 99.9992% server availability. (So PostgreSQL hits 5 Nines, with data guarantees). (Maximised availability) I don't agree with this math. ...(snip by Simon)... 99.96%. OK, so that is at least 99.96%. Cool. I think you're talking about different metrics, and you're both right. With two servers configured in sync rep your chance of having an available (running) server is 99.9992%. The chance that you know that you have one that is totally up to date, with no lost transactions is 99.9208%. The chance that you *actually* have up-to-date data would be higher, but you'd have no way to be sure. The 99.96% number is your certainty that you have a running server with up-to-date data if only one machine is sync rep. It's a matter of whether your shop needs five nines of availability or the highest probability of not losing data. You get to choose. -Kevin
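The 99.9208% figure is consistent with subtracting, from certainty, the all-down case and the two cases where the sole surviving machine is a standby that cannot prove it was the one that acknowledged (a reconstruction of the arithmetic, again assuming independent 98% servers):

```latex
1 - \underbrace{(0.02)^3}_{\text{all three down}}
  - \underbrace{2 \times 0.98 \times (0.02)^2}_{\text{only one machine up, and it is a standby}}
= 1 - 0.000008 - 0.000784 = 0.999208
```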
Re: [HACKERS] Sync Rep Design
On Sun, 2011-01-02 at 08:08 -0600, Kevin Grittner wrote: I think you're talking about different metrics, and you're both right. With two servers configured in sync rep your chance of having an available (running) server is 99.9992%. The chance that you know that you have one that is totally up to date, with no lost transactions is 99.9208%. The chance that you *actually* have up-to-date data would be higher, but you'd have no way to be sure. The 99.96% number is your certainty that you have a running server with up-to-date data if only one machine is sync rep. It's a matter of whether your shop needs five nines of availability or the highest probability of not losing data. You get to choose. Thanks for those calculations. Do you agree that requiring a response from 2 sync standbys, or locking up, gives us 94% server availability, but 99.9992% data durability? And that adding additional async servers would not increase the server availability of that cluster? Now let's look at what happens when we first start a standby: we do the base backup, configure the standby, it connects, and then wham - we cannot process any new transactions until the standby has caught up, which could well be hours on a big database. So if we don't have a processing mode that allows work to continue, how will we ever enable synchronous replication on a 24/7 database? How will we ever allow standbys to catch up if they drop out for a while? We should factor that into the availability calcs as well. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Base Backup Streaming
On 02.01.2011 14:47, Dimitri Fontaine wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: BTW, there's a bunch of replication related stuff that we should work to close, that are IMHO more important than synchronous replication. Like making the standby follow timeline changes, to make failovers smoother, and the facility to stream a base-backup over the wire. I wish someone worked on those... So, we've been talking about base backup streaming at conferences and we have a working prototype. We even have a needed piece of it in core now, that's the pg_read_binary_file() function. What we still miss is an overall design and some integration effort. Let's design first. We even have a rudimentary patch to add the required backend support: http://archives.postgresql.org/message-id/4c80d9b8.2020...@enterprisedb.com That just needs to be polished into shape, and documentation. I propose the following new pg_ctl command to initiate the cloning: pg_ctl clone [-D datadir] [-s on|off] [-t filename] primary_conninfo As far as users are concerned, that would be the only novelty. Once that command is finished (successfully) they would edit postgresql.conf and start the service as usual. A basic recovery.conf file is created with the given options, standby_mode is driven by -s and defaults to off, and trigger_file defaults to being omitted and is given by -t. Of course the primary_conninfo given on the command line is what ends up in the recovery.conf file. That alone would allow for making base backups for recovery purposes and for standby preparation. +1. Or maybe it would be better to make it a separate binary, rather than part of pg_ctl. To support this new tool, the simplest would be to just copy what I've been doing in the prototype, that is run a query to get the primary file listing (per tablespace, not done in the prototype) then get their bytea content over the wire. That means there's no further backend support code to write.
It would be so much nicer to have something more integrated, like the patch I linked above. Running queries requires connecting to a real database, which means that the user needs to have privileges to do that and you need to know the name of a valid database. Ideally this would all work through a replication connection. I think we should go with that from day one. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] How to know killed by pg_terminate_backend
Tatsuo Ishii is...@postgresql.org writes: Comments are welcome. This is a bad idea. It makes an already-poorly-tested code path significantly more fragile, in return for nothing of value. regards, tom lane
Re: [HACKERS] Sync Rep Design
On 02.01.2011 15:41, Simon Riggs wrote: On Sat, 2011-01-01 at 23:36 -0500, Robert Haas wrote: On Sat, Jan 1, 2011 at 6:54 AM, Simon Riggs si...@2ndquadrant.com wrote: Yes, working out the math is a good idea. Things are much clearer if we do that. Let's assume we have 98% availability on any single server. 1. Having one primary and 2 standbys, either of which can acknowledge, and we never lock up if both standbys fail, then we will have 99.9992% server availability. (So PostgreSQL hits 5 Nines, with data guarantees). (Maximised availability) I don't agree with this math. ...(snip by Simon)... 99.96%. OK, so that is at least 99.96%. Cool. The key point here is not (1), but option (4). The approach advocated by Heikki and yourself gives us 94% availability. IMHO that is ridiculous, and I will not accept that as the *only* way forwards, for that reason, whoever advocates it or for how long they keep arguing. I do accept that some wish that as an option. No-one is suggesting that to be the only option. The wait-for-all-to-ack approach looks a lot less ridiculous if you also configure a timeout and don't wait for disconnected standbys. I'm not sure what the point of such a timeout in general is, but people have requested that. Also, setting synchronous_standbys=room1, room2 doesn't necessarily mean that you have just two standby servers; room1 and room2 might both represent a group of servers. I believe we all agree that there are different use cases that require different setups. Both first-past-the-post and wait-for-all-to-ack have their uses. There's no point in arguing over which is better. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
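The grouping Heikki alludes to might be configured along these lines. This is a hypothetical sketch: synchronous_standbys is only a name used in the discussion, and the mapping of a group name to its standbys (here via application_name in the standby's conninfo) is an assumption, not a committed design:

```
# primary's postgresql.conf: wait for an ack from each named group
synchronous_standbys = 'room1, room2'

# each standby's recovery.conf declares which group it acks for
primary_conninfo = 'host=primary port=5432 application_name=room1'
```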
Re: [HACKERS] Sync Rep Design
Simon Riggs wrote: Do you agree that requiring response from 2 sync standbys, or locking up, gives us 94% server availability, but 99.9992% data durability? I'm not sure how to answer that. The calculations so far have been based around up-time and the probabilities that you have a machine up at any moment and whether you can have confidence that if you do, you have all committed transactions represented. There's been an implied assumption that the down time is unplanned, but not much else. The above question seems to me to get into too many implied assumptions to feel safe throwing out a number without pinning those down a whole lot better. If, for example, that 2% downtime always means the machine irretrievably went up in smoke, hitting unavailable means things are unrecoverable. That's probably not the best assumption (at least outside of a combat zone), but what is? -Kevin
Re: [HACKERS] Sync Rep Design
On Sun, 2011-01-02 at 11:11 -0600, Kevin Grittner wrote: Simon Riggs wrote: Do you agree that requiring response from 2 sync standbys, or locking up, gives us 94% server availability, but 99.9992% data durability? I'm not sure how to answer that. The calculations so far have been based around up-time and the probabilities that you have a machine up at any moment and whether you can have confidence that if you do, you have all committed transactions represented. There's been an implied assumption that the down time is unplanned, but not much else. The above question seems to me to get into too many implied assumptions to feel safe throwing out a number without pinning those down a whole lot better. If, for example, that 2% downtime always means the machine irretrievably went up in smoke, hitting unavailable means things are unrecoverable. That's probably not the best assumption (at least outside of a combat zone), but what is? Not really relevant. There's no room at all for downtime of any kind in a situation where all servers must be available. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Base Backup Streaming
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: http://archives.postgresql.org/message-id/4c80d9b8.2020...@enterprisedb.com That just needs to be polished into shape, and documentation. Wow, cool! I don't know how, but I've missed it. +1. Or maybe it would be better to make it a separate binary, rather than part of pg_ctl. Well, the thinking was that nowadays we support initdb from pg_ctl, and this is another kind of initdb, really. I linked above. Running queries requires connecting to a real database, which means that the user needs to have privileges to do that and you need to know the name of a valid database. Ideally this would all work through a replication connection. I think we should go with that from day one. I didn't think about the connecting to a real database part of it, versus using a dedicated REPLICATION connection/protocol, and to be honest, I feared it was too much work. Seeing that you already did it, though, +1. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] management of large patches
Robert Haas wrote: - true serializability - not entirely sure of the status of this I try to keep the status section of the Wiki page up-to-date. I have just reviewed it and tweaked it for the latest events: http://wiki.postgresql.org/wiki/Serializable#Current_Status There are a number of pending R&D issues: http://wiki.postgresql.org/wiki/Serializable#R.26D_Issues Most of these can be deferred. The ones which really need at least some attention before release relate to how to deal with serializable transactions on replication targets and whether we've been properly careful about using a coding style which is safe for machines with weak memory ordering. I've done my best to follow discussions on that topic and do the right thing, but someone with a deeper understanding of the issues should probably take a look. Someone has joined the effort starting this weekend -- a consultant who has done a lot of technical writing (John Okite) will be working on doc changes related to the patch. (I assume that would best be submitted as a separate patch.) If you want a shorter version of the patch status: We expect to have an updated patch before the CF, including docs and incorporating feedback from previous CFs and Heikki's comments on interim work. -Kevin
Re: [HACKERS] Sync Rep Design
On Thu, Dec 30, 2010 at 9:02 PM, Robert Haas robertmh...@gmail.com wrote: reads MySQL documentation I see now that you've tried to design this feature in a way that is similar to MySQL's offering, which does have some value. But it appears to me that the documentation you've written here is substantially similar to the MySQL 5.5 reference documentation. That could get us into a world of legal trouble - that documentation is not even open source, let alone BSD. http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html The docs originate from work done by my former team at Google. The content license on this is CC 3.0 BY-SA, so I don't think that should be a concern. http://code.google.com/p/google-mysql-tools/wiki/SemiSyncReplication http://code.google.com/p/google-mysql-tools/wiki/SemiSyncReplicationDesign (From http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html) the MySQL docs don't mention that other transactions can view the committed data on the master between steps 1 and 2. Is that possible in this case? As described in the MySQL docs, semi-sync has another benefit for some deployments: it rate limits busy clients to prevent them from creating replication lag between the primary and standby servers. I also provided the text for that (http://bugs.mysql.com/bug.php?id=57911) if you are concerned about copying. -- Mark Callaghan mdcal...@gmail.com
Re: [HACKERS] Sync Rep Design
On Sat, Jan 1, 2011 at 8:35 AM, Simon Riggs si...@2ndquadrant.com wrote: On Sat, 2011-01-01 at 05:13 -0800, Jeff Janes wrote: On 12/31/10, Simon Riggs si...@2ndquadrant.com wrote: 2. sync does not guarantee that the updates to the standbys are in any way coordinated. You can run a query on one standby and get one answer and at the exact same time run the same query on another standby and get a different answer (slightly ahead/behind). That also means that if the master crashes one of the servers will be ahead or behind. You can use pg_last_xlog_receive_location() to check which one that is. If at least one of the standbys is in the same smoking crater as the primary, then pg_last_xlog_receive_location on it is unlikely to respond. The guarantee goes away precisely when it is needed. Fairly obviously, I would not be advocating anything that forced you to use a server in the same smoking crater. You are forced to use the standby which is further ahead, otherwise you might lose transactions which have been reported to have been committed. The mere existence of a commit-releasing stand-by in the same data center as the primary means that a remote standby is not very useful for data preservation after campus-wide disasters. It is probably behind (due to higher latency) and even if it is not behind, there is no way to *know* that is not behind if the on-site standby cannot be contacted. I understand that you are not advocating the use of one local standby and one remote standby, both synchronous. But I think we need to *explicitly* warn against it. After all, the docs do explicitly recommend the use of two standbys. If we assume that the readers are already experts, then they don't need that advice. If they are not experts, then that advice could lead them to shoot themselves in the foot, both kneecaps, and a femur (metaphorically speaking, unlike the smoking crater, which is a literal scenario some people need to plan for). 
If durability is more important than availability, what would you recommend? Only one synchronous rep, in a remote data center? Two (or more) synchronous reps all in the same remote data center? In two different remote data centers? I can't see any guarantee that goes away precisely when it is needed. In order to know that you are not losing data, you have to be able to contact every single semi-synchronous standby and invoke pg_last_xlog_receive_location on it. If your goal is to have data durability protected from major catastrophes (and why else would you do synchronous rep to remote data centers?), then it is expecting a lot to have every single standby survive that major catastrophe. That expectation is an unavoidable consequence of going with single-confirmation-releases. Perhaps you think this consequence is too obvious to document--if so I disagree on that. Perhaps you could explain the issue you see, because your comments seem unrelated to my point above. It is directly related to the part of your point about using pg_last_xlog_receive_location. When planning for disaster recovery, it is little comfort that you can do something in a non-disaster case, if you can't also do it in likely disaster cases. It probably wasn't relevant to the first part of your point, but I must admit I did not understand the first part of your point. Obviously they are coordinated in *some* way (I believe commits occur in the same order on each server, for example). Different read-only standbys could give different results, but only from among the universe of results made possible by a given commit sequence. But that is not the part I had intended to comment on, and I don't think it is what other people concerned about durability after major catastrophes were focusing on, either. Cheers, Jeff
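Jeff's objection can be made concrete with a small sketch: to fail over safely you must compare pg_last_xlog_receive_location() across *every* synchronous standby, because the furthest-ahead survivor might still be behind a standby you cannot reach. A minimal illustration (the helper names are mine, and the LSN strings are assumed to have been collected from the standbys out of band):

```python
def parse_lsn(lsn):
    """Convert an LSN string like '0/3000178' into a comparable integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def choose_promotion_target(standbys):
    """standbys maps standby name -> LSN string, or None if unreachable.

    Returns (best_name, certain): the furthest-ahead reachable standby,
    and whether we can be *certain* it holds every acknowledged commit.
    With first-past-the-post acks, certainty requires hearing from every
    standby -- exactly the property that a disaster tends to take away.
    """
    reachable = {name: parse_lsn(lsn)
                 for name, lsn in standbys.items() if lsn is not None}
    if not reachable:
        raise RuntimeError("no standby reachable; cannot fail over safely")
    best = max(reachable, key=reachable.get)
    certain = len(reachable) == len(standbys)
    return best, certain
```

If the on-site standby is in the smoking crater, `certain` comes back False: the remote survivor may be promotable, but nothing proves it holds the last acknowledged commit.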
Re: [HACKERS] managment of large patches
Hello. Maybe there are some frequent bugs? They might be found by adding more asserts to track the internal state of structures, or by tools like the recently developed script for C++ keywords. -- Sent from my mobile device pasman
Re: [HACKERS] Sync Rep Design
On Sun, 2011-01-02 at 12:13 -0800, MARK CALLAGHAN wrote: On Thu, Dec 30, 2010 at 9:02 PM, Robert Haas robertmh...@gmail.com wrote: reads MySQL documentation I see now that you've tried to design this feature in a way that is similar to MySQL's offering, which does have some value. But it appears to me that the documentation you've written here is substantially similar to the MySQL 5.5 reference documentation. That could get us into a world of legal trouble - that documentation is not even open source, let alone BSD. http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html Sorry, only just read that bit. Are they that similar? My docs are about 3 times longer and cover all sorts of things. I didn't intentionally copy anything, but that doesn't really matter; what matters is that if you think they are similar, legal people might. I've only read the URL above, not the other links from it. Robert, can you identify which paragraphs need to be re-written? I won't argue, I will just rewrite them or delete them and start afresh. Thanks for being eagle-eyed. The docs originate from work done by my former team at Google. The content license on this is CC 3.0 BY-SA, so I don't think that should be a concern. http://code.google.com/p/google-mysql-tools/wiki/SemiSyncReplication http://code.google.com/p/google-mysql-tools/wiki/SemiSyncReplicationDesign I guess that gets us off the hook a little bit, but not far enough for my liking. Thanks for trying to save me! (From http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html) the MySQL docs don't mention that other transactions can view the committed data on the master between steps 1 and 2. Is that possible in this case? Other transactions on the master cannot read data until after the confirmation it is on the sync standby. As described in the MySQL docs, semi-sync has another benefit for some deployments.
It rate limits busy clients to prevent them from creating replication lag between the primary and standby servers. I also provided the text for that (http://bugs.mysql.com/bug.php?id=57911) if you are concerned about copying. Yeh, I'm aware of the effect, but I'm not really seeing slowing down the master as a benefit; it's more an implication of synchronicity. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Base Backup Streaming
On Sun, Jan 2, 2011 at 18:53, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: http://archives.postgresql.org/message-id/4c80d9b8.2020...@enterprisedb.com That just needs to be polished into shape, and documentation. I have an updated version of this somewhere. IIRC it also needs things like tablespace support, but it's well on its way. Wow, cool! I don't know how but I've missed it. Yes, especially since we discussed it in Stuttgart. I guess it may have been during the party... +1. Or maybe it would be better to make it a separate binary, rather than part of pg_ctl. Well the thinking was that nowadays we support initdb from pg_ctl, and this is another kind of initdb, really. Yes, if it should go in any of the current binaries, initdb would be the reasonable place, not pg_ctl ;) That said, if we're going to wrap pg_streamrecv into 9.1 (which I think we should), then *that* is where it should go. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Sync Rep Design
On Sun, 2011-01-02 at 18:54 +0200, Heikki Linnakangas wrote: I believe we all agree that there's different use cases that require different setups. Both first-past-the-post and wait-for-all-to-ack have their uses. Robert's analysis is that first-past-the-post doesn't actually improve the durability guarantee (according to his calcs). Which means that 1 primary, 2 sync standbys with first-past-the-post is actually worse than 1 primary, 1 sync and 1 async standby in terms of its durability guarantees. So ISTM that Robert does not agree that both have their uses. I'm not sure what the point of such a timeout in general is, but people have requested that. Again, this sounds like you think a timeout has no measurable benefit, other than to please some people's perceived needs. The wait-for-all-to-ack looks a lot less ridiculous if you also configure a timeout and don't wait for disconnected standbys Does it? Do Robert, Stefan and Aidan agree? What are the availability and durability percentages if we do that? Based on those, we may decide to do that instead. But I'd like to see some analysis of your ideas, not just a "we could". Since nobody has commented on my analysis, let's see someone else's. There's no point in arguing over which is better. I'm trying to compare the quantifiable benefit of various options to see what goes into Postgres. I don't want to put anything in that we cannot all agree has a measurable benefit to someone (that has the appropriate preference). -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Base Backup Streaming
Magnus Hagander mag...@hagander.net writes: Yes, especially since we discussed it in Stuttgart. I guess it may have been during the party... I remember we talked about it, I didn't remember a patch had reached the list… Yes, if it should go in any of the current binaries, initdb would be the reasonable place, not pg_ctl ;) Well, pg_ctl is able to call initdb for users, but yes. That said, if we're going to wrap pg_streamrecv into 9.1 (which I think we should), then *that* is where it should go. They should at least cooperate so that you don't need to set up WAL archiving explicitly while preparing the standby, if at all possible. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] and it's not a bunny rabbit, either
On Sat, 2011-01-01 at 17:21 -0500, Robert Haas wrote: I don't see anything wrong with having 20 or 30 messages of variants of "foo cannot be used on bar" without placeholders. Well, that's OK with me. It seems a little grotty, but manageably so. Questions: 1. Should we try to include the name of the object? If so, how? Hmm. There is a bit of a difference in my mind between, say, "constraints cannot be used on sequences" and "constraint foo cannot be used on sequence bar", the latter leaving open the question whether some other combination might work. 2. Can we have a variant with an SQL-command-fragment parameter? "%s cannot be used on views", where %s might be CLUSTER, DROP COLUMN, etc. That's OK; we do that in several other places.
Re: [HACKERS] contrib/snapshot
On Dec 31, 2010, at 1:35 PM, Joel Jacobson wrote: 2010/12/31 Simon Riggs si...@2ndquadrant.com Please call it something other than snapshot. There's already about 3 tools called something similar and a couple of different meanings of the term in the world of Postgres. Thanks, good point. Renamed to fsnapshot. Is it actually limited to functions? ISTM this concept would be valuable for anything that's not in pg_class (in other words, anything that doesn't have user data in it). Also, I'm not sure why this needs to be in contrib vs pgFoundry. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] C++ keywords in headers (was Re: [GENERAL] #include funcapi.h)
I believe that Dave Page wants to move to building pg for windows using visual C++ 2010 some time this year. That alone may be enough of a reason to check for C++0x keywords in headers: http://blogs.msdn.com/b/vcblog/archive/2010/04/06/c-0x-core-language-features-in-vc10-the-table.aspx I think that there is likely to be an expectation that the same compiler that is used to build pg should be able to include pg headers in C++ TUs. -- Regards, Peter Geoghegan
Re: [HACKERS] How to know killed by pg_terminate_backend
Tatsuo Ishii is...@postgresql.org writes: Comments are welcome. This is a bad idea. It makes an already-poorly-tested code path significantly more fragile, in return for nothing of value. Are you saying that procsignal.c is the already-poorly-tested one? If so, why? As for value, I have already explained upthread why we need this. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Recovery conflict monitoring
On Mon, 2010-12-27 at 14:39 +0100, Magnus Hagander wrote: On Thu, Dec 23, 2010 at 13:09, Magnus Hagander mag...@hagander.net wrote: This patch adds counters and views to monitor hot-standby-generated recovery conflicts. It extends the pg_stat_database view with one column with the total number of conflicts, and also creates a new view pg_stat_database_conflicts that contains a breakdown of exactly what caused the conflicts. Documentation still pending, but comments meanwhile are of course appreciated ;) Heikki pointed out over IM that it's pointless to count stats caused by recovery conflict with drop database - since we drop the stats record as soon as it arrives anyway. Here's an updated patch that removes that, and also adds some documentation. I like the patch, well inspired, code in the right places AFAICS. No code comments at all. Couple of thoughts: * are we safe to issue stats immediately before issuing FATAL? Won't some of them get lost? * Not clear what I'd do with database level information, except worry a lot. Maybe an option to count conflicts per user would be better, since at least we'd know exactly who was affected by those. Just an idea. * Would it be better to have a log_standby_conflicts that allowed the opportunity to log the conflicting SQL, duration until cancelation etc? I'd rather have what you have than nothing at all though... the new hot_standby_feedback mode should be acting to reduce these, so it would be useful to have this patch enabled for testing that feature. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] page compression
On Tue, 2010-12-28 at 09:10 -0600, Andy Colson wrote: I know its been discussed before, and one big problem is license and patent problems. Would like to see a design for that. There's a few different ways we might want to do that, and I'm interested to see if it's possible to get compressed pages to be indexable as well. For example, if you compress 2 pages into 8Kb then you do one I/O and out pops 2 buffers. That would work nicely with ring buffers. Or you might try to have pages 8Kb in one block, which would mean decompressing every time you access the page. That wouldn't be much of a problem if we were just seq scanning. Or you might want to compress the whole table at once, so it can only be read by seq scan. Efficient, but no indexes. It would be interesting to explore pre-populating the compression dictionary with some common patterns. Anyway, interesting topic. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services
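The last idea above -- pre-populating the compression dictionary with common patterns -- maps directly onto zlib's preset-dictionary feature. A hedged sketch in Python (the dictionary bytes and page layout here are invented for illustration; a real design would seed the dictionary with byte sequences actually common in heap pages, and both the compressor and decompressor must agree on it):

```python
import zlib

PAGE_SIZE = 8192  # matches PostgreSQL's default block size

# Hypothetical preset dictionary: patterns assumed frequent in pages,
# e.g. zeroed free space and a repeated header-like byte sequence.
ZDICT = b"\x00" * 256 + b"heap_tuple_header"

def compress_page(page, dictionary=ZDICT):
    # A compressor primed with a preset dictionary can reference it
    # immediately, improving ratios on short, repetitive inputs.
    c = zlib.compressobj(level=6, zdict=dictionary)
    return c.compress(page) + c.flush()

def decompress_page(blob, dictionary=ZDICT):
    # Decompression must use the exact same dictionary.
    d = zlib.decompressobj(zdict=dictionary)
    return d.decompress(blob) + d.flush()

page = (b"heap_tuple_header" + b"some row data").ljust(PAGE_SIZE, b"\x00")
blob = compress_page(page)
```

Whether two compressed pages then share one 8Kb block, or each block is decompressed on access, is the layout question raised above; the dictionary trick is orthogonal to that choice.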
Re: [HACKERS] contrib/snapshot
2011/1/2 Jim Nasby j...@nasby.net Renamed to fsnapshot. Is it actually limited to functions? ISTM this concept would be valuable for anything that's not in pg_class (in other words, anything that doesn't have user data in it). My ambition is to primarily support functions. Support for other object types is merely a necessary side-effect of the function dependencies. Is there a matrix of all possible object type dependencies? If not, for functions, is the following list correct? Object types which may depend on functions: constraints, views, triggers, any more? Functions may depend on: language, any more? Instead of limiting the support to functions, perhaps it would make more sense to limit it to all non-data objects? Is there a term for the group of object types not carrying any user data? Which object types do carry user data? I can only think of tables and sequences, any other? Also, I'm not sure why this needs to be in contrib vs pgFoundry. Good point. It's actually in neither of them right now, it's only at github.com :) I merely used the prefix contrib/ in the subject line to indicate it's not a patch to the core. I do hope though it's possible to get a place for it in contrib/ at some time in the future; I think there is a chance quite a lot of users would appreciate a quicker, less error-prone way of handling these things. This tool must be made extremely reliable, otherwise you won't feel safe using it in a production environment for deployment and revert purposes, which is my company's requirement. I hope to achieve this by keeping a bare-minimum approach to features, and making sure it only fulfills the objective: 1. take a snapshot of all non-data objects 2. deploy code, test new code, or let time pass while other people make a mess in your database 3.
revert to previous snapshot without affecting any of the new data generated in step 2. I put my faith in the reliability of system functions, such as pg_get_functiondef(), pg_get_viewdef() etc, to build proper create/drop commands for each object. Even nicer would be if the pg_catalog provided functions to generate SQL create/drop commands for all non-data object types, and to make sure _everything_ is included in the command, ensuring the object is created exactly the same. Currently pg_get_functiondef() does not restore the ownership of the function, which I had to append manually. -- Best regards, Joel Jacobson Glue Finance
Re: [HACKERS] contrib/snapshot
2011/1/3 Joel Jacobson j...@gluefinance.com 2011/1/2 Jim Nasby j...@nasby.net Is it actually limited to functions? ISTM this concept would be valuable for anything that's not in pg_class (in other words, anything that doesn't have user data in it). Instead of limiting the support to functions, perhaps it would make more sense to limit it to all non-data objects? Is there a term for the group of object types not carrying any user data? My bad, I see you already answered both my questions. So, it does make sense, and the term for non-data object types is therefore non-pg_class, non-class or perhaps non-relation objects? -- Best regards, Joel Jacobson Glue Finance
Re: [HACKERS] contrib/snapshot
On 01/02/2011 07:44 PM, Joel Jacobson wrote: Also, I'm not sure why this needs to be in contrib vs pgFoundry. Good point. It's actually in neither of them right now, it's only at github.com :) I merely used the prefix contrib/ in the subject line to indicate it's not a patch to the core. contrib in PostgreSQL means a module maintained by the backend developers. But it's not clear to me that there is any particular reason why this should be in contrib. cheers andrew
Re: [HACKERS] Sync Rep Design
On 2.1.2011 5:36, Robert Haas wrote: On Sat, Jan 1, 2011 at 6:54 AM, Simon Riggssi...@2ndquadrant.com wrote: Yes, working out the math is a good idea. Things are much clearer if we do that. Let's assume we have 98% availability on any single server. 1. Having one primary and 2 standbys, either of which can acknowledge, and we never lock up if both standbys fail, then we will have 99.9992% server availability. (So PostgreSQL hits 5 Nines, with data guarantees). (Maximised availability) I don't agree with this math. If the master and one standby fail simultaneously, the other standby is useless, because it may or may not be caught up with the master. You know that the last transaction acknowledged as committed by the master is on at least one of the two standbys, but you don't know which one, and so you can't safely promote the surviving standby. (If you are working in an environment where promoting the surviving standby when it's possibly not caught up is OK, then you don't need sync rep in the first place: you can just run async rep and get much better performance.) So the availability is 98% (you are up when the master is up) + 98%^2 * 2% (you are up when both slaves are up and the master is down) = 99.92%. If you had only a single standby, then you could be certain that any commit acknowledged by the master was on that standby. Thus your availability would be 98% (up when master is up) + 98% * 2% (you are up when the master is down and the slave is up) = 99.96%. OTOH, in the case where you need _all_ the slaves to confirm any failing slave brings the master down, so adding a slave brings down availability by extra 2% The solution to achieving good durability AND availability is requiring N past the post instead of 1 past the post. In this case you can get to 99.9992% availability with master + 3 sync slaves, 2 of which have ACK. 
--- Hannu Krosing Performance and Infinite Scalability Consultant http://www.2ndQuadrant.com/books/
Re: [HACKERS] Re: new patch of MERGE (merge_204) a question about duplicated ctid
Marko Tiikkaja wrote: I'm confused. Are you saying that the patch is supposed to lock the table against concurrent INSERT/UPDATE/DELETE/MERGE? Because I don't see it in the patch, and the symptoms you're having are a clear indication of the fact that it's not happening. I also seem to recall that people thought locking the table would be excessive. That's exactly what it should be doing. I thought I'd seen just that in one of the versions of this patch, but maybe that's a mistaken memory on my part. In advance of the planned-but-not-yet-available ability to lock individual index key values, locking the whole table is the only possible implementation I'm aware of that can work correctly here. In earlier versions, I think this code was running into issues before it even got to there. If you're right that things like the duplicate key error in the current version are caused exclusively by not locking enough, that may be the next necessary step here. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
[HACKERS] Visual Studio 2010/Windows SDK 7.1 support
Hi, I've created a patch that enables support for building PostgreSQL with Visual Studio 2010 or Microsoft Windows SDK for Windows 7 and .NET Framework 4 (Windows SDK 7.1). You can grab it from http://www.piening.info/VS2010.patch It only touches the .pl, .pm and .bat files in src/tools/msvc so it's relevant for Windows only (that's why I've left crlf line endings - is that actually ok or should I have converted them?). It's diffed against current head + running perltidy -b -bl -nsfs -naws -l=100 -ole=unix *.pl *.pm as described in the README file (which seems not to have been run before committing Mkvcbuild.pm the last time). It is probably neither the perfect way to introduce VS2010 support (my perl is better than my C but probably still not what you are used to) nor is it my way to try to make you officially support VS 2010. But perhaps it's something you could start with once you decide to upgrade the msvc toolchain. The patch is necessary because M$ got rid of vcbuild in favour of msbuild which uses a different build file format (*.vcxproj). It should support all use cases described in http://www.postgresql.org/docs/current/static/install-windows-full.html and builds in Windows SDK 7.0 (VS 2008 toolchain) x86 and x64 as well as Windows SDK 7.1 (VS 2010 toolchain) x86 and x64. The SDK 7.1 build produces tons of warnings which are mostly macro redefinitions of EIDRM, EMSGSIZE, EAFNOSUPPORT, EWOULDBLOCK, ECONNRESET, EINPROGRESS, ENOBUFS, EPROTONOSUPPORT, ECONNREFUSED and EOPNOTSUPP which seem to have found their way into errno.h finally. Cutting those out of src\include\pg_config_os.h and src\interfaces\libpq\win32.h makes the project build pretty clean. I resisted the temptation to parse them out of those files during Mkvcbuild::mkvcbuild as this should probably be handled by some preprocessor defines. The build result passes vcregress check. The pgsql.sln file also opens and builds in VS 2010. I hope it is of some use.
Best regards, Brar