Re: [HACKERS] Speed dblink using alternate libpq tuple storage
This is the new version of the patch. It is not rebased onto HEAD because of a build error.

> It's better to restore the old two-path error handling.

I restored the OOM and save-result routes. But it seems necessary to get back some amount of memory on REAL OOM, as the comment in the original code says. So I restored the meaning of rp == 0 && errMsg == NULL as REAL OOM, which throws the async result away; the result is preserved if errMsg is not NULL. `unknown error' has been removed. As a result, if the row processor returns 0, the parser skips to the end of the rows and returns the working result or an error result, according to whether errMsg was set in the row processor.

> I don't think that should be required. Just use a dummy msg.

Considering the above, pqAddRow is also restored to leave errMsg NULL on OOM.

> There is still one EOF in v3 getAnotherTuple() - pqGetInt(tupnfields),
> please turn that one also to protocolerror.

pqGetInt() returns EOF only when it wants additional reading from the network, provided the parameter `bytes' is appropriate. A non-zero return from it should therefore be handled as EOF, not as a protocol error. The one point I had modified incorrectly is also restored. The so-called 'protocol error' has eventually vanished.

> Instead use (%s, errmsg) as argument there. libpq code is noisy enough,
> no need to add more.

Done.

Is there something left undone?

By the way, I noticed that dblink always says that the current connection is 'unnamed' in the error messages from dblink_record_internal@dblink. I could see that dblink_record_internal defines the local variable conname = NULL and passes it to dblink_res_error to display the error message, but there is no assignment to it in the function. The connection name was shown properly when I added code to set conname from PG_GETARG_TEXT_PP(0) if available, in other words just after the DBLINK_GET_CONN/DBLINK_GET_NAMED_CONN calls. That seems to be dblink's manner... This change is not included in this patch.
Furthermore, dblink_res_error looks only into the returned PGresult to display the error, and always says only `Error occurred on dblink connection..: could not execute query'. Is it right to consider this as follows? - dblink is wrong in error handling. A client of libpq should look at the PGconn via PQerrorMessage() if (or regardless of whether?) the PGresult says nothing about the error.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..239edb8 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,7 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQsetRowProcessor         161
+PQgetRowProcessor         162
+PQresultSetErrMsg         163
+PQskipResult              164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27a9805..4605e49 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2693,6 +2693,9 @@ makeEmptyPGconn(void)
 	conn->wait_ssl_try = false;
 #endif
 
+	/* set default row processor */
+	PQsetRowProcessor(conn, NULL, NULL);
+
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
 	 * buffers on Unix systems.  That way, when we are sending a large amount
@@ -2711,8 +2714,13 @@ makeEmptyPGconn(void)
 	initPQExpBuffer(&conn->errorMessage);
 	initPQExpBuffer(&conn->workBuffer);
 
+	/* set up initial row buffer */
+	conn->rowBufLen = 32;
+	conn->rowBuf = (PGrowValue *) malloc(conn->rowBufLen * sizeof(PGrowValue));
+
 	if (conn->inBuffer == NULL ||
 		conn->outBuffer == NULL ||
+		conn->rowBuf == NULL ||
 		PQExpBufferBroken(&conn->errorMessage) ||
 		PQExpBufferBroken(&conn->workBuffer))
 	{
@@ -2814,6 +2822,8 @@ freePGconn(PGconn *conn)
 		free(conn->inBuffer);
 	if (conn->outBuffer)
 		free(conn->outBuffer);
+	if (conn->rowBuf)
+		free(conn->rowBuf);
 	termPQExpBuffer(&conn->errorMessage);
 	termPQExpBuffer(&conn->workBuffer);
@@ -5078,3 +5088,4 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 	return prev;
 }
+
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..ce58778 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@
 static PGresult *PQexecFinish(PGconn *conn);
 static int	PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
+static int	pqAddRow(PGresult *res, PGrowValue *columns, void *param);
 
 /*
@@ -701,7 +702,6 @@ pqClearAsyncResult(PGconn *conn)
 	if (conn->result)
 		PQclear(conn->result);
 	conn->result = NULL;
-	conn->curTuple = NULL;
 }
 
 /*
@@ -756,7 +756,6 @@ pqPrepareAsyncResult(PGconn *conn)
 	 */
 	res = conn->result;
 	conn->result = NULL;		/* handing over ownership to caller */
-	conn->curTuple = NULL;		/* just in case */
 	if (!res)
 		res = PQmakeEmptyPGresult(conn,
Re: [HACKERS] pg_basebackup -x stream from the standby gets stuck
On Thu, Feb 23, 2012 at 1:02 AM, Magnus Hagander mag...@hagander.net wrote:
> On Tue, Feb 7, 2012 at 12:30, Fujii Masao masao.fu...@gmail.com wrote:
>> Hi,
>>
>> http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/
>>
>> =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
>> xlog start point: 2/AC4E2600
>> pg_basebackup: starting background WAL receiver
>> 692447/692447 kB (100%), 1/1 tablespace
>> xlog end point: 2/AC4E2600
>> pg_basebackup: waiting for background process to finish streaming...
>> pg_basebackup: base backup completed
>>
>> real 3m56.237s
>> user 0m0.224s
>> sys  0m0.936s
>>
>> (time is long because this is only a test database with no traffic, so I
>> had to make some inserts for it to finish)
>>
>> The above article points out the problem of pg_basebackup from the
>> standby: when -x stream is specified, pg_basebackup from the standby
>> gets stuck if there is no traffic in the database.
>>
>> When -x stream is specified, pg_basebackup forks a background process
>> for receiving WAL records during the backup, takes an online backup and
>> waits for the background process to end. The forked background process
>> keeps receiving WAL records, and whenever it reaches the end of a WAL
>> file, it checks whether it has already received all WAL files required
>> for the backup, and exits if yes. Which means that at least one WAL
>> segment switch is required for pg_basebackup with the -x stream option
>> to end.
>>
>> In a backup from the master, a WAL file switch always occurs at both
>> start and end of the backup (i.e., in do_pg_start_backup() and
>> do_pg_stop_backup()), so the above logic works fine even if there is no
>> traffic. OTOH, in a backup from the standby, while there is no traffic,
>> no WAL file switch is performed at all. So in that case, there is no
>> chance for the background process to reach the end of a WAL file, check
>> whether all required WAL has arrived, and exit. In the end,
>> pg_basebackup gets stuck.
>> To fix the problem, I'd propose to change the background process so
>> that it checks whether all required WAL has arrived every time data is
>> received, even if the end of a WAL file has not been reached. Patch
>> attached. Comments?
>
> This seems like a good thing in general. Why does it need to modify
> pg_receivexlog, though? I thought only pg_basebackup had this issue? I
> guess it is because of the change of the API to stream_continue_callback
> only?

Yes, that's the reason why I changed continue_streaming() in pg_receivexlog.c. But the reason why I changed segment_callback() in pg_receivexlog.c is not the same. I did that because previously segment_finish_callback was called only at the end of a WAL segment, but with the patch it can be called in the middle of a segment. OTOH, segment_callback() must emit a verbose message only when the current WAL segment is complete. So I had to add a check of whether the current WAL segment is partial or complete into segment_callback().

> Looking at it after your patch, stream_continue_callback and
> segment_finish_callback are the same. Should we perhaps just fold them
> into a single stream_continue_callback? Since you had to move the detect
> segment end to the caller anyway?

No. I think we cannot do that, because in pg_receivexlog they are not the same.

> Another question related to this - since we clearly don't need the xlog
> switch in this case, should we make it conditional on the master as
> well, so we don't switch unnecessarily there as well?

Maybe. At the end of backup, we force a WAL segment switch to ensure all required WAL files have been archived. So theoretically, if WAL archiving is not enabled, we can skip the WAL segment switch. But some backup tools might depend on this behavior.

In a standby-only backup, we always skip the WAL segment switch. So there is no guarantee that all WAL files required for the backup are archived at the end of the backup. This limitation is documented.
Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FDW system columns
(2012/02/27 12:35), Robert Haas wrote:
> On Sat, Feb 25, 2012 at 3:56 PM, Thom Brown t...@linux.com wrote:
>>> If there seems to be a consensus on removing system column from
>>> foreign tables, I'd like to work on this issue. Attached is a halfway
>>> patch, and ISTM there is no problem so far. I can say that at least
>>> PgAdmin doesn't use these columns.
>>
>> So we still have all of these columns for foreign tables. I've tested
>> Hanada-san's patch and it removes all of the system columns. Could we
>> consider applying it, or has a use-case for them since been discovered?
>
> Not to my knowledge, but Hanada-san described his patch as a halfway
> patch, implying that it wasn't done.

Sorry for long absence. I used the word "halfway" because I didn't have enough time to examine that patch at that time. I tested the patch, and now I think it's OK to apply.

One concern is that there is no mention of unavailable system columns in any document. ddl.sgml has the main description of system columns, but it just says:

    Every table has several system columns that are implicitly defined
    by the system.

Since this doesn't mention the detailed type of relation, such as VIEW and COMPOSITE TYPE, IMO we can leave this paragraph as is.

BTW, I still think that tableoid is useful if foreign tables can inherit other tables. With such a feature, the tableoid of a foreign table is necessary to determine the actual source table. Once we want to support that feature, IMO we should revive the tableoid system column for foreign tables.

-- 
Shigeru Hanada
Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog
On Wed, Feb 8, 2012 at 1:33 AM, Magnus Hagander mag...@hagander.net wrote:
> On Tue, Feb 7, 2012 at 17:29, Heikki Linnakangas
> heikki.linnakan...@enterprisedb.com wrote:
>> On 07.02.2012 16:55, Tom Lane wrote:
>>> (The integer vs float TimestampTz issue is a kind of portability
>>> problem, but we've already bought into the assumption that sender and
>>> receiver must be built with the same choice, no?)
>>
>> Hmm, true. In hindsight, I think that was a bad choice, but it's a bit
>> late to change that. pg_basebackup doesn't otherwise care about the
>> integer/float timestamps, but it does send a timestamp back to the
>> server. You won't be able to actually start up the database if the
>> config options don't match, but I think it would be good if
>> pg_basebackup still worked across platforms and versions. For example,
>> you might have a central backup server that calls pg_basebackup on
>> several database servers, running on different platforms.
>>
>> In 9.0, the only field in the protocol that depends on timestamp format
>> is WalDataMessageHeader->sendTime. That goes from server to client, and
>> pg_basebackup/pg_receivexlog don't care about that. In 9.1 we
>> introduced StandbyReplyMessage->sendTime, which is sent from client to
>> server, but looking at the code it looks like the server doesn't use it
>> for anything. In 9.2, we added WalSndrMessage->sendTime, which is used
>> by a standby server to calculate how far behind the standby is.
>>
>> I'm tempted to just change all of those TimestampTz fields to something
>> that's independent of the integer/float timestamp setting, in 9.2. At a
>> quick glance, it seems that it wouldn't break anything.

Agreed. If we don't push such a change into 9.2, we will break something later.

> In general, I think that would work. Since we can't replicate across
> versions anyway. Will it break using pg_basebackup 9.2 on a 9.1 server,
> though? That would also be very useful in the scenario of the central
> server...

No, unless I'm missing something.
Because pg_basebackup doesn't use any message defined in walprotocol.h if the -x stream option is not specified.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] FDW system columns
2012/2/28 Shigeru Hanada shigeru.han...@gmail.com:
> (2012/02/27 12:35), Robert Haas wrote:
>> On Sat, Feb 25, 2012 at 3:56 PM, Thom Brown t...@linux.com wrote:
>>>> If there seems to be a consensus on removing system column from
>>>> foreign tables, I'd like to work on this issue. Attached is a halfway
>>>> patch, and ISTM there is no problem so far. I can say that at least
>>>> PgAdmin doesn't use these columns.
>>>
>>> So we still have all of these columns for foreign tables. I've tested
>>> Hanada-san's patch and it removes all of the system columns. Could we
>>> consider applying it, or has a use-case for them since been
>>> discovered?
>>
>> Not to my knowledge, but Hanada-san described his patch as a halfway
>> patch, implying that it wasn't done.
>
> Sorry for long absence. I used the word "halfway" because I didn't have
> enough time to examine that patch at that time. I tested the patch, and
> now I think it's OK to apply.
>
> One concern is that there is no mention of unavailable system columns
> in any document. ddl.sgml has the main description of system columns,
> but it just says:
>
>     Every table has several system columns that are implicitly defined
>     by the system.
>
> Since this doesn't mention the detailed type of relation, such as VIEW
> and COMPOSITE TYPE, IMO we can leave this paragraph as is.
>
> BTW, I still think that tableoid is useful if foreign tables can
> inherit other tables. With such a feature, the tableoid of a foreign
> table is necessary to determine the actual source table. Once we want
> to support that feature, IMO we should revive the tableoid system
> column for foreign tables.

I'm not familiar with foreign table inheritance, or how it would work. If that's something that will likely be introduced in future, then surely we'd want to keep the tableoid column rather than removing it then re-introducing it later?

-- 
Thom
Re: [HACKERS] [PATCH] Documentation: remove confusing paragraph about backslash escaping
On 02/28/2012 12:10 AM, Tom Lane wrote:
> I suggest replacing the first and third cases with something along the
> lines of
>
>     Note: if you have standard_conforming_strings turned off, any
>     backslashes you write in literal string constants will need to be
>     doubled. See Section 4.1.2.1 for more information.
>
> The second case is just a parenthetical comment and perhaps could be
> removed.

Definitely OK by me. Thanks for looking into this!

-- 
Hannes Sowa hs...@bfk.de
BFK edv-consulting GmbH      http://www.bfk.de/
Kriegsstraße 100             tel: +49-721-96201-1
D-76133 Karlsruhe            fax: +49-721-96201-99
Re: [HACKERS] Finer Extension dependencies
On Fri, Feb 24, 2012 at 2:09 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
> Hitoshi Harada umi.tan...@gmail.com writes:
>> I confirmed DROP EXTENSION is fixed now. In turn, it seems to me
>> requires doesn't work. My test ext2.control looks like:
>
> I'm very sorry about that. It's all about playing with pg_depend and
> I've failed to spend enough time on that very topic to send a patch
> that just works, it seems. I'm going to fix that over the week-end.
> Thanks for your reviewing so far.

Quickly reviewed the patch and found some issues.

- There is some mixture of pg_extension_feature and pg_extension_features.
- The doc says pg_extension_features has four columns but it's not true.
- Line 608 is bad. In the loop, provides_itself is repeatedly changed to
  true and false, and I guess that's not what you meant.
- Line 854+, you can fold two blocks into one. The two blocks are
  similar, and by giving the provides list with list_make1 when
  control->provides == NIL you can do it in one block.
- s/trak/track/
- Line 960, you missed updating classId for the dependency.

That's pretty much from me. I just looked at the patch and have no idea about the grand architecture. Marking "Waiting on Author".

Thanks,
-- 
Hitoshi Harada
Re: [HACKERS] Command Triggers, patch v11
On 27 February 2012 19:37, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
> Thom Brown t...@linux.com writes:
>> CREATE COMMAND TRIGGER test_cmd_trg
>>     BEFORE CREATE SCHEMA, CREATE OPERATOR, CREATE COLLATION, CREATE CAST
>>     EXECUTE PROCEDURE my_func();
>>
>> I couldn't drop it completely unless I specified all of those
>> commands. Why?
>
> Because I couldn't find a nice enough way to implement that given the
> shared infrastructure Robert and Kaigai did put into place to handle
> dropping of objects. It could be that I didn't look hard enough, I'll
> be happy to get back that feature.

Well, the problem is that you can add commands to a trigger en masse, but you can only remove them one at a time. Couldn't we at least allow the removal of multiple commands at the same time? The docs you wrote suggest you can do this, but you can't.

-- 
Thom
Re: [HACKERS] FDW system columns
(2012/02/28 18:08), Thom Brown wrote:
> If that's something that will likely be introduced in future, then
> surely we'd want to keep the tableoid column rather than removing it
> then re-introducing it later?

As background knowledge, currently (9.1 and 9.2dev) foreign tables have all system columns, but they return meaningless values except tableoid. For instance, a foreign table pgbench_branches with 3 rows will return results like below (bid in the rightmost column is a user column):

postgres=# select ctid, xmin, cmin, xmax, cmax, tableoid,
postgres-#        bid from pgbench_branches;
      ctid      | xmin | cmin | xmax | cmax | tableoid | bid
----------------+------+------+------+------+----------+-----
 (4294967295,0) |    0 |    0 |    0 |    0 |    16400 |   2
 (4294967295,0) |    0 |    0 |    0 |    0 |    16400 |   3
 (4294967295,0) |    0 |    0 |    0 |    0 |    16400 |   1
(3 rows)

In this example, 16400 is the correct oid of the pg_class record for relation pgbench_branches.

I don't have any idea how to use system columns other than tableoid of foreign tables, because it seems difficult to define a common meaning for various FDWs. One possible idea about the ctid column is using it as a virtual tuple id (location information of remote data) for update support, if the FDW can pack location information into the ItemPointerData area.

We have three options:

a) remove all system columns (posted patch)
b) remove system columns other than tableoid
c) leave all system columns as is (current 9.2dev)

Incidentally, views, which are a very similar object type to foreign tables, have no system columns.

Thoughts?
-- 
Shigeru Hanada
Re: [HACKERS] Command Triggers, patch v11
On 28 February 2012 11:43, Thom Brown t...@linux.com wrote:
> On 27 February 2012 19:37, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
>> Thom Brown t...@linux.com writes:
>>> CREATE COMMAND TRIGGER test_cmd_trg
>>>     BEFORE CREATE SCHEMA, CREATE OPERATOR, CREATE COLLATION, CREATE CAST
>>>     EXECUTE PROCEDURE my_func();
>>>
>>> I couldn't drop it completely unless I specified all of those
>>> commands. Why?
>>
>> Because I couldn't find a nice enough way to implement that given the
>> shared infrastructure Robert and Kaigai did put into place to handle
>> dropping of objects. It could be that I didn't look hard enough, I'll
>> be happy to get back that feature.
>
> Well, the problem is that you can add commands to a trigger en masse,
> but you can only remove them one at a time. Couldn't we at least allow
> the removal of multiple commands at the same time? The docs you wrote
> suggest you can do this, but you can't.

Also note that as of commit 66f0cf7da8eeaeca4b9894bfafd61789b514af4a (Remove useless const qualifier) the patch no longer applies due to changes to src/backend/commands/typecmds.c.

-- 
Thom
Re: [HACKERS] FDW system columns
On Tue, Feb 28, 2012 at 7:00 AM, Shigeru Hanada shigeru.han...@gmail.com wrote:
> We have three options:
>
> a) remove all system columns (posted patch)
> b) remove system columns other than tableoid
> c) leave all system columns as is (current 9.2dev)
>
> Incidentally, views, which are a very similar object type to foreign
> tables, have no system columns. Thoughts?

I vote against (c). I'm not sure which of (a) or (b) is better. We've talked about allowing foreign tables to inherit from regular tables and vice versa, and certainly, in that situation, tableoid would be useful. But I don't think we've made a definitive decision about that. I stripped that functionality out of the original patch because it introduced a bunch of warts that we didn't have time to figure out how to fix, and it's not clear to me that anyone's spent any time thinking about that since then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] FDW system columns
On 28 February 2012 12:00, Shigeru Hanada shigeru.han...@gmail.com wrote:
> (2012/02/28 18:08), Thom Brown wrote:
>> If that's something that will likely be introduced in future, then
>> surely we'd want to keep the tableoid column rather than removing it
>> then re-introducing it later?
>
> As background knowledge, currently (9.1 and 9.2dev) foreign tables have
> all system columns, but they return meaningless values except tableoid.
> For instance, a foreign table pgbench_branches with 3 rows will return
> results like below (bid in the rightmost column is a user column):
>
> postgres=# select ctid, xmin, cmin, xmax, cmax, tableoid,
> postgres-#        bid from pgbench_branches;
>       ctid      | xmin | cmin | xmax | cmax | tableoid | bid
> ----------------+------+------+------+------+----------+-----
>  (4294967295,0) |    0 |    0 |    0 |    0 |    16400 |   2
>  (4294967295,0) |    0 |    0 |    0 |    0 |    16400 |   3
>  (4294967295,0) |    0 |    0 |    0 |    0 |    16400 |   1
> (3 rows)
>
> In this example, 16400 is the correct oid of the pg_class record for
> relation pgbench_branches.
>
> I don't have any idea how to use system columns other than tableoid of
> foreign tables, because it seems difficult to define a common meaning
> for various FDWs. One possible idea about the ctid column is using it
> as a virtual tuple id (location information of remote data) for update
> support, if the FDW can pack location information into the
> ItemPointerData area.
>
> We have three options:
>
> a) remove all system columns (posted patch)
> b) remove system columns other than tableoid
> c) leave all system columns as is (current 9.2dev)
>
> Incidentally, views, which are a very similar object type to foreign
> tables, have no system columns.
>
> Thoughts?

What is the expected behavior in case a foreign table is constructed as a child table of a particular regular table? In that case, do child foreign tables lack system columns that exist on the parent table? (Although it does not matter when a regular table is a child of a foreign table...)
Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp
Re: [HACKERS] Command Triggers, patch v11
Thom Brown t...@linux.com writes:
> Well, the problem is that you can add commands to a trigger en masse,
> but you can only remove them one at a time. Couldn't we at least allow
> the removal of multiple commands at the same time? The docs you wrote
> suggest you can do this, but you can't.

This seems over-complicated. Triggers on tables do not have alterable properties, why should command triggers? I vote for

	CREATE COMMAND TRIGGER name ... properties ...;
	DROP COMMAND TRIGGER name;

full stop. If you want to run the same trigger function on some more commands, add another trigger name.

			regards, tom lane
Re: [HACKERS] Command Triggers, patch v11
Tom Lane t...@sss.pgh.pa.us wrote:
> This seems over-complicated. Triggers on tables do not have alterable
> properties, why should command triggers? I vote for
>
>	CREATE COMMAND TRIGGER name ... properties ...;
>	DROP COMMAND TRIGGER name;
>
> full stop. If you want to run the same trigger function on some more
> commands, add another trigger name.

+1

-Kevin
Re: [HACKERS] FDW system columns
Robert Haas robertmh...@gmail.com writes:
> On Tue, Feb 28, 2012 at 7:00 AM, Shigeru Hanada
> shigeru.han...@gmail.com wrote:
>> We have three options:
>> a) remove all system columns (posted patch)
>> b) remove system columns other than tableoid
>> c) leave all system columns as is (current 9.2dev)
>>
>> Incidentally, views, which are a very similar object type to foreign
>> tables, have no system columns. Thoughts?
>
> I vote against (c). I'm not sure which of (a) or (b) is better. We've
> talked about allowing foreign tables to inherit from regular tables and
> vice versa, and certainly, in that situation, tableoid would be useful.

I think it is a mistake to imagine that tableoid is only useful in inheritance contexts. As one counterexample, pg_dump selects tableoid from quite a lot of system catalogs, just as a convenient and uniform way of remembering each object's type (of course, the fact that it needs to match them up against pg_depend entries has something to do with that). More generally, if we exclude tableoid from foreign tables, that just introduces an arbitrary behavioral difference between foreign and regular tables, thus complicating any code that has use for the feature. So I believe that (a) is a pretty bad choice.

I would hold still for (b), but I am not convinced that the case has been made for that either. I think it would be wise to avoid introducing behavioral churn until after we have designed and implemented update capabilities for foreign tables. If we end up putting back ctid to support that, we'll look pretty silly.

In short, (c) looks like the most reasonable choice for now, with the expectation of revisiting the question after we have foreign update working.

			regards, tom lane
Re: [HACKERS] Command Triggers, patch v11
On 28 February 2012 15:03, Tom Lane t...@sss.pgh.pa.us wrote:
> Thom Brown t...@linux.com writes:
>> Well, the problem is that you can add commands to a trigger en masse,
>> but you can only remove them one at a time. Couldn't we at least allow
>> the removal of multiple commands at the same time? The docs you wrote
>> suggest you can do this, but you can't.
>
> This seems over-complicated. Triggers on tables do not have alterable
> properties, why should command triggers? I vote for
>
>	CREATE COMMAND TRIGGER name ... properties ...;
>	DROP COMMAND TRIGGER name;
>
> full stop. If you want to run the same trigger function on some more
> commands, add another trigger name.

This would make more sense, particularly since dropping a command trigger, as it stands, doesn't necessarily drop the trigger.

-- 
Thom
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
On Fri, Feb 24, 2012 at 5:31 PM, Peter Eisentraut pete...@gmx.net wrote:
> Could we name this postgresql_fdw instead? We already have several
> ${productname}_fdw out there, and I don't want to get in the business
> of having to guess variant spellings.

If you don't like variant spellings, having anything to do with PostgreSQL, aka Postgres, and usually discussed on the pgsql-* mailing lists, is probably a bad idea. Go Postgre!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Robert Haas robertmh...@gmail.com writes:
> On Fri, Feb 24, 2012 at 5:31 PM, Peter Eisentraut pete...@gmx.net wrote:
>> Could we name this postgresql_fdw instead? We already have several
>> ${productname}_fdw out there, and I don't want to get in the business
>> of having to guess variant spellings.
>
> If you don't like variant spellings, having anything to do with
> PostgreSQL, aka Postgres, and usually discussed on the pgsql-* mailing
> lists, is probably a bad idea.

[ snicker ] But still, Peter has a point: pgsql is not a name for the product, it's at best an abbreviation. We aren't calling the other thing orcl_fdw or ora_fdw. I think either postgres_fdw or postgresql_fdw would be fine.

			regards, tom lane
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
On Tue, Feb 28, 2012 at 11:02 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> On Fri, Feb 24, 2012 at 5:31 PM, Peter Eisentraut pete...@gmx.net wrote:
>>> Could we name this postgresql_fdw instead? We already have several
>>> ${productname}_fdw out there, and I don't want to get in the business
>>> of having to guess variant spellings.
>>
>> If you don't like variant spellings, having anything to do with
>> PostgreSQL, aka Postgres, and usually discussed on the pgsql-* mailing
>> lists, is probably a bad idea.
>
> [ snicker ] But still, Peter has a point: pgsql is not a name for the
> product, it's at best an abbreviation. We aren't calling the other
> thing orcl_fdw or ora_fdw. I think either postgres_fdw or
> postgresql_fdw would be fine.

I liked the shorter name, myself, but I'm not going to make a big deal about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_upgrade --logfile option documentation
On Wed, Feb 22, 2012 at 03:37:29PM -0500, Bruce Momjian wrote:
> On Wed, Feb 22, 2012 at 05:22:29PM -0300, Alvaro Herrera wrote:
>> Not sure about this. If the upgrade completes successfully and the
>> file is not renamed at the last minute due to some error, that would
>> be a problem as well, because now the old cluster would happily run
>> and perhaps corrupt the data files from under the new cluster.
>
> Well, the basic problem is that the user, before pg_upgrade started,
> installed a new cluster that works. If we rename the old control, but
> rename it back on failure, there are cases we will miss, like kill -9
> or a server crash, and it will not be obvious to them that the control
> file was renamed. Of course, if we only rename on success, and there is
> a kill -9 or server crash, the old cluster is still start-able, like
> the new one.
>
> One good argument for renaming early is that on a server crash, the
> system is probably going to restart the database automatically, and
> that means the old server. Right now we have a clear message that they
> need to rename the control file to start the old server. Not sure what
> the new wording would look like --- let me try.

I have thought about this, and feel that it would be odd to lock the old cluster at the start of the upgrade, and then unlock it on a failure, particularly because we can't always unlock it, e.g. on an operating system crash. A cleaner solution would be to lock it when we complete the upgrade, which I have done in the attached patch. I have also added a warning about restarting the old server when link mode is used, and updated the documentation to match the new behavior.

Patch attached. I would like to apply this to 9.2/HEAD.
---

Performing Consistency Checks
-----------------------------
Checking current, bin, and data directories                 ok
Checking cluster versions                                   ok
Checking database user is a superuser                       ok
Checking for prepared transactions                          ok
Checking for reg* system OID user data types                ok
Checking for contrib/isn with bigint-passing mismatch       ok
Creating catalog dump                                       ok
Checking for prepared transactions                          ok
Checking for presence of required libraries                 ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster                       ok
Freezing all rows on the new cluster                        ok
Deleting new commit clogs                                   ok
Copying old commit clogs to new server                      ok
Setting next transaction ID for new cluster                 ok
Resetting WAL archives                                      ok
Setting frozenxid counters in new cluster                   ok
Creating databases in the new cluster                       ok
Adding support functions to new cluster                     ok
Restoring database schema to new cluster                    ok
Removing support functions from new cluster                 ok
Linking user relation files                                 ok
Setting next OID for new cluster                            ok
Creating script to delete old cluster                       ok
Adding ".old" suffix to old global/pg_control               ok

If you want to start the old cluster, you will need to remove
the ".old" suffix from /u/pgsql.old/data/global/pg_control.old.
Because link mode was used, the old cluster cannot be safely
started once the new cluster has been started.

Upgrade complete
----------------
Optimizer statistics are not transferred by pg_upgrade so
consider running:
    vacuumdb --all --analyze-only
on the newly-upgraded cluster.
Running this script will delete the old cluster's data files:
    /usr/local/pgdev/pg_upgrade/delete_old_cluster.sh

-- 
Bruce Momjian  <br...@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ It's impossible for everything to be true.
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index 891eb9a..a5f63eb
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** report_clusters_compatible(void)
*** 148,156 ****
  	}
  
  	pg_log(PG_REPORT, "\n"
! 		   "If pg_upgrade fails after this point, you must re-initdb the new cluster\n"
! 		   "before continuing.  You will also need to remove the \".old\" suffix from\n"
! 		   "%s/global/pg_control.old.\n", old_cluster.pgdata);
  }
--- 148,155 ----
  	}
  
  	pg_log(PG_REPORT, "\n"
! 		   "If pg_upgrade fails after this point, you must re-initdb the\n"
! 		   "new cluster before continuing.\n");
  }

*** output_completion_banner(char *deletion_
*** 198,205 ****
  	/* Did
Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label
2012/2/24 Yeb Havinga <yebhavi...@gmail.com>:
> On 2012-02-24 15:17, Yeb Havinga wrote:
>> I don't know what's fishy about the mgrid user and root that causes
>> c0.c1023 to be absent.
>
> more info: In shells started in an X environment under Xvnc, id -Z shows
> system_u and c0.c1023 absent.  In shells started from the ssh daemon, the
> id -Z matches what it should be according to the seusers file:
> unconfined_u and c0.c1023 present.

That seems to be the reason why the security label on your bash is different from the expected default one.  Unlike the sshd daemon, vncserver does not assign a security label to itself according to /etc/selinux/targeted/seusers, so it inherits the label of the system startup script.  It is also the reason why you saw system_u at the head of the security context.

I'll report this topic to the selinux community to discuss the preferable solution.  Anyway, please use an ssh connection for testing purposes.

Thanks,
-- 
KaiGai Kohei <kai...@kaigai.gr.jp>
Re: [HACKERS] Initial 9.2 pgbench write results
On Thu, Feb 23, 2012 at 3:17 AM, Greg Smith <g...@2ndquadrant.com> wrote:
> I think an even bigger factor now is that the BGW writes can disturb write
> ordering/combining done at the kernel and storage levels.  It's painfully
> obvious now how much PostgreSQL relies on that to get good performance.
> All sorts of things break badly if we aren't getting random writes
> scheduled to optimize seek times, in as many contexts as possible.  It
> doesn't seem unreasonable that background writer writes can introduce some
> delay into the checkpoint writes, just by adding more random components to
> what is already a difficult to handle write/sync series.

That's what I think these results are showing: background writer writes can deoptimize other forms of write.

How hard would it be to dummy up a bgwriter which, every time it wakes up, forks off a child process to actually do the write, and then the real one just waits for the child to exit?  If it didn't have to correctly handle signals, SINVAL, and such, it should be just a few lines of code, but I don't know how much we can ignore signals and such even just for testing purposes.  My thought here is that the kernel is getting in a snit over one process doing all the writing on the system, and is punishing that process in a way that ruins things for everyone.

> A second fact that's visible from the TPS graphs over the test run, and
> obvious if you think about it, is that BGW writes force data to physical
> disk earlier than they otherwise might go there.

On a busy system like you are testing, the BGW should only be writing out data a fraction of a second before the backends would otherwise be doing it, unless the "2 minutes to circle the buffer pool" logic is in control rather than the bgwriter_lru_multiplier and bgwriter_lru_maxpages logic.  (From the data reported, we can see how many buffer allocations there are, but not how many circles of the pool it took to find them.)  It doesn't seem likely that small shifts in timing are having that effect, compared to the possible effect of who is doing the writing.  If the timing is truly the issue, lowering bgwriter_delay might smooth the timing out and bring it closer to what the backends would do for themselves.

Cheers,

Jeff
Re: [HACKERS] Hot Standby Failover Scenario
On 02/27/2012 10:05 PM, Lucky Haryadi wrote:
> 3. When Master A fails to service, the database will failovered to Slave B
> by triggering with trigger file.

As soon as you trigger a standby, it changes it to a new timeline.  At that point, the series of WAL files diverges.  It's no longer possible to apply them to a system that is still on the original timeline, such as your original master A in this situation.

There's a good reason for that.  Let's say that A committed an additional transaction before it went down, but that commit wasn't replicated to B.  You can't just move records from B over anymore in that case.  The only way to make sure A is in sync again is to do a new base backup, which you can potentially accelerate using rsync to only copy what has changed.

I see a lot of people try to bypass one of the steps recommended in the manual using various schemes like yours, and they usually have a bug like this in there--sometimes obvious like this, sometimes subtle.  Trying to get too clever here is dangerous to your database.

Warning: pgsql-hackers is the mailing list for people to discuss the development of PostgreSQL, not how to use it.  Questions like this should be asked on either the pgsql-admin or pgsql-general mailing list.  I'm not going to answer additional questions like this from you here on this list, and I doubt anyone else will either.

-- 
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
Re: [HACKERS] Initial 9.2 pgbench write results
On Tue, Feb 28, 2012 at 1:15 AM, Ants Aasma <ants.aa...@eesti.ee> wrote:
> My hypothesis for the TPS regression is that it is due to write combining.
> When the workload is mainly bound by I/O, every little bit that can be
> saved helps the bottom line.  Larger scale factors don't get the benefit
> because there is less write combining going on overall.

This is an interesting hypothesis which I think we can test.  I'm thinking of writing a quick patch (just for testing, not for commit) to set a new buffer flag BM_BGWRITER_CLEANED on every buffer the background writer cleans.  Then we can keep a count of how often such buffers are dirtied before they're evicted, vs. how often they're evicted before they're dirtied.  If any significant percentage of them are redirtied before they're evicted, that would confirm this hypothesis.  At any rate I think the numbers would be interesting to see.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] swapcache-style cache?
- Quoted from Andrea Suisani (sick...@opinioni.net), on 28.02.2012 at 09:54 -

> On 02/28/2012 04:52 AM, Rob Wultsch wrote:
>> On Wed, Feb 22, 2012 at 2:31 PM, james <ja...@mansionfamily.plus.com> wrote:
>>> Has anyone considered managing a system like the DragonFly swapcache
>>> for a DBMS like PostgreSQL?
>>
>> https://www.facebook.com/note.php?note_id=388112370932
>
> in the same vein: http://bcache.evilpiepirate.org/
>
> from the main page:
>
> Bcache is a patch for the Linux kernel to use SSDs to cache other block
> devices.  It's analogous to L2Arc for ZFS, but Bcache also does writeback
> caching, and it's filesystem agnostic.  It's designed to be switched on
> with a minimum of effort, and to work well without configuration on any
> setup.  By default it won't cache sequential IO, just the random reads
> and writes that SSDs excel at.  It's meant to be suitable for desktops,
> servers, high end storage arrays, and perhaps even embedded.
>
> it was submitted to the linux kernel mailing list a number of times, the
> last one: https://lkml.org/lkml/2011/9/10/13
>
> Andrea

I am successfully using Facebook's flashcache in write-through mode, so it speeds up only reads.  I have seen a 3x increase in TPS for databases that do not fit in RAM.  I am using an Intel X-25E over a RAID10 of 4 SAS disks.  I have also tested writeback mode, but the gain is not so huge and there is a considerable risk of losing all your data if/when the SSD fails.

Best regards
-- 
Luben Karavelov
Re: [HACKERS] review: CHECK FUNCTION statement
I have a few comments about this patch:

I didn't like the fact that the checker calling infrastructure uses SPI instead of just a FunctionCallN to call the checker function.  I think this should be easily avoidable.

Second, I see that functioncmds.c gets a lot into trigger internals just to be able to figure out the function starting from a trigger name.  I think it'd be saner to have a new function in trigger.c that returns the required function OID.

I think CheckFunction would be clearer if the code to check multiple objects is split out into a separate subroutine.

After CheckFunction there is a leftover function comment without any following function.  There are other spurious hunks that add or remove single lines too (once in an otherwise untouched file).

-- 
Álvaro Herrera <alvhe...@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
[HACKERS] strange plan - PostgreSQL 9.2
Hello

I'm looking at one slow query with a correlated subquery:

create table xx(a int primary key);
create table yy(a int);
insert into xx select generate_series(1,100);
insert into yy select (random()*100)::int from generate_series(1,10);
create index on yy(a);

Query A:

select a, (select true from yy where xx.a = yy.a limit 1) from xx limit 10 offset 0;

postgres=# explain select a, (select true from yy where xx.a = yy.a limit 1) from xx;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Seq Scan on xx  (cost=0.00..4392325.00 rows=100 width=4)
   SubPlan 1
     ->  Limit  (cost=0.00..4.38 rows=1 width=0)
           ->  Index Only Scan using yy_a_idx on yy  (cost=0.00..4.38 rows=1 width=0)
                 Index Cond: (a = xx.a)
(5 rows)

The plan for this query is as expected.  But when I rewrote the query, I got a strange plan (query B):

postgres=# explain select a, exists(select 1 from yy where xx.a = yy.a) from xx limit 10 offset 0;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Limit  (cost=0.00..43.92 rows=10 width=4)
   ->  Seq Scan on xx  (cost=0.00..4392325.00 rows=100 width=4)
         SubPlan 1
           ->  Index Only Scan using yy_a_idx on yy  (cost=0.00..4.38 rows=1 width=0)
                 Index Cond: (a = xx.a)
         SubPlan 2
           ->  Seq Scan on yy  (cost=0.00..1443.00 rows=10 width=4)
(7 rows)

Why is there a SubPlan 2?  And yet query B is two times faster than query A.

 public | xx | table | pavel | 35 MB   |
 public | yy | table | pavel | 3576 kB |

regards

Pavel
Re: [HACKERS] Initial 9.2 pgbench write results
On Tue, Feb 28, 2012 at 11:36 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> How hard would it be to dummy up a bgwriter which, every time it wakes up,
> it forks off a child process to actually do the write, and then the real
> one just waits for the child to exit?  If it didn't have to correctly
> handle signals, SINVAL, and such, it should be just a few lines of code,
> but I don't know how much we can ignore signals and such even just for
> testing purposes.  My thought here is that the kernel is getting in a snit
> over one process doing all the writing on the system, and is punishing
> that process in a way that ruins things for everyone.

I would assume the only punishment that the kernel would inflict would be to put the bgwriter to sleep.  That would make the bgwriter less effective, of course, but it shouldn't make it worse than no bgwriter at all.  Unless it does it some really stupid way, like making the bgwriter sleep while it holds some lock.  But maybe I'm missing something - what do you have in mind?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Initial 9.2 pgbench write results
- Quoted from Robert Haas (robertmh...@gmail.com), on 28.02.2012 at 19:25 -

> On Tue, Feb 28, 2012 at 11:36 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> How hard would it be to dummy up a bgwriter which, every time it wakes
>> up, it forks off a child process to actually do the write, and then the
>> real one just waits for the child to exit?  If it didn't have to
>> correctly handle signals, SINVAL, and such, it should be just a few
>> lines of code, but I don't know how much we can ignore signals and such
>> even just for testing purposes.  My thought here is that the kernel is
>> getting in a snit over one process doing all the writing on the system,
>> and is punishing that process in a way that ruins things for everyone.
>
> I would assume the only punishment that the kernel would inflict would be
> to put the bgwriter to sleep.  That would make the bgwriter less
> effective, of course, but it shouldn't make it worse than no bgwriter at
> all.  Unless it does it some really stupid way, like making bgwriter
> sleep while it holds some lock.  But maybe I'm missing something - what
> do you have in mind?

-- 
Luben Karavelov
Re: [HACKERS] Initial 9.2 pgbench write results
On Tue, Feb 28, 2012 at 11:46 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Feb 28, 2012 at 1:15 AM, Ants Aasma <ants.aa...@eesti.ee> wrote:
>> My hypothesis for the TPS regression is that it is due to write
>> combining.  When the workload is mainly bound by I/O, every little bit
>> that can be saved helps the bottom line.  Larger scale factors don't get
>> the benefit because there is less write combining going on overall.
>
> This is an interesting hypothesis which I think we can test.  I'm
> thinking of writing a quick patch (just for testing, not for commit) to
> set a new buffer flag BM_BGWRITER_CLEANED on every buffer the background
> writer cleans.  Then we can keep a count of how often such buffers are
> dirtied before they're evicted, vs. how often they're evicted before
> they're dirtied.  If any significant percentage of them are redirtied
> before they're evicted, that would confirm this hypothesis.  At any rate
> I think the numbers would be interesting to see.

Patch attached.  I tried it on my laptop with a 60-second pgbench run at scale factor 100, and got this:

LOG:  bgwriter_clean: 1387 evict-before-dirty, 10 dirty-before-evict
LOG:  bgwriter_clean: 1372 evict-before-dirty, 10 dirty-before-evict
LOG:  bgwriter_clean: 1355 evict-before-dirty, 10 dirty-before-evict
LOG:  bgwriter_clean: 1344 evict-before-dirty, 8 dirty-before-evict
LOG:  bgwriter_clean: 1418 evict-before-dirty, 8 dirty-before-evict
LOG:  bgwriter_clean: 1345 evict-before-dirty, 7 dirty-before-evict
LOG:  bgwriter_clean: 1339 evict-before-dirty, 6 dirty-before-evict
LOG:  bgwriter_clean: 1362 evict-before-dirty, 9 dirty-before-evict

That doesn't look bad at all.  Then I reset the stats, tried it again, and got this:

LOG:  bgwriter_clean: 3863 evict-before-dirty, 198 dirty-before-evict
LOG:  bgwriter_clean: 3861 evict-before-dirty, 199 dirty-before-evict
LOG:  bgwriter_clean: 3978 evict-before-dirty, 218 dirty-before-evict
LOG:  bgwriter_clean: 3928 evict-before-dirty, 204 dirty-before-evict
LOG:  bgwriter_clean: 3956 evict-before-dirty, 207 dirty-before-evict
LOG:  bgwriter_clean: 3906 evict-before-dirty, 222 dirty-before-evict
LOG:  bgwriter_clean: 3912 evict-before-dirty, 197 dirty-before-evict
LOG:  bgwriter_clean: 3853 evict-before-dirty, 200 dirty-before-evict

OK, that's not so good, but I don't know why it's different.  I'm not sure I can reproduce the exact same scenario Greg is seeing - this is totally different hardware - but I'll play around with it a little bit more.  Greg, if you happen to feel like testing this on one of your problem cases I'd be interested in seeing what it spits out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

bgwriter_clean_stats.patch
Description: Binary data
Re: [HACKERS] Initial 9.2 pgbench write results
Jeff Janes <jeff.ja...@gmail.com> writes:
> How hard would it be to dummy up a bgwriter which, every time it wakes
> up, it forks off a child process to actually do the write, and then the
> real one just waits for the child to exit?  If it didn't have to
> correctly handle signals, SINVAL, and such, it should be just a few lines
> of code, but I don't know how much we can ignore signals and such even
> just for testing purposes.  My thought here is that the kernel is getting
> in a snit over one process doing all the writing on the system, and is
> punishing that process in a way that ruins things for everyone.

If that is the case (which I don't know one way or the other), I'm not sure that having subprocesses do the work would be an improvement.  We know for a fact that the OOM killer knows enough to blame memory consumed by a child process on the parent.  Resource limitations of other sorts might work similarly.  But a bigger problem is that fork() is very very far from being zero cost.  I think doing one per write() would swamp any improvement you could hope to get.

It does make sense that the bgwriter would get hit by niceness penalties after it'd run up sufficient runtime.  If we could get some hard numbers about how long it takes for that to happen, we could consider letting the bgwriter exit (and the postmaster spawn a new one) every so often.  This is just gaming the scheduler, and so one would like to think there's a better way to do it, but I don't know of any non-root-privilege way to stave off getting niced.

			regards, tom lane
Re: [HACKERS] CLOG contention, part 2
On Mon, Feb 27, 2012 at 4:03 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> So please use a scale factor that the hardware can cope with.

OK.  I tested this out on Nate Boley's 32-core AMD machine, using scale factor 100 and scale factor 300.  I initialized it with Simon's patch, which should have the effect of rendering the entire table unhinted and giving each row a different XID.

I used my usual configuration settings for that machine, which are: shared_buffers = 8GB, maintenance_work_mem = 1GB, synchronous_commit = off, checkpoint_segments = 300, checkpoint_timeout = 15min, checkpoint_completion_target = 0.9, wal_writer_delay = 20ms.

I did three runs on master, as of commit 9bf8603c7a9153cada7e32eb0cf7ac1feb1d3b56, and three runs with the clog_history_v4 patch applied.  The command to initialize the database was:

	~/install/clog-contention/bin/pgbench -i -I -s $scale

The command to run the test was:

	~/install/clog-contention/bin/pgbench -l -T 1800 -c 32 -j 32 -n

Executive Summary: The patch makes things way slower at scale factor 300, and possibly slightly slower at scale factor 100.
Detailed Results:

resultslp.clog_history_v4.32.100.1800:tps = 14286.049637 (including connections establishing)
resultslp.clog_history_v4.32.100.1800:tps = 13532.814984 (including connections establishing)
resultslp.clog_history_v4.32.100.1800:tps = 13972.987301 (including connections establishing)
resultslp.clog_history_v4.32.300.1800:tps = 5061.650470 (including connections establishing)
resultslp.clog_history_v4.32.300.1800:tps = 4871.126457 (including connections establishing)
resultslp.clog_history_v4.32.300.1800:tps = 5861.124177 (including connections establishing)
resultslp.master.32.100.1800:tps = 13420.777222 (including connections establishing)
resultslp.master.32.100.1800:tps = 14912.336257 (including connections establishing)
resultslp.master.32.100.1800:tps = 14505.718977 (including connections establishing)
resultslp.master.32.300.1800:tps = 14766.984548 (including connections establishing)
resultslp.master.32.300.1800:tps = 14783.026190 (including connections establishing)
resultslp.master.32.300.1800:tps = 14567.504887 (including connections establishing)

I don't know whether this is just a bug or whether there's some more fundamental problem with the approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_upgrade --logfile option documentation
On Tue, Feb 28, 2012 at 11:21 AM, Bruce Momjian <br...@momjian.us> wrote:
> On Wed, Feb 22, 2012 at 03:37:29PM -0500, Bruce Momjian wrote:
>> On Wed, Feb 22, 2012 at 05:22:29PM -0300, Alvaro Herrera wrote:
>>> Not sure about this.  If the upgrade completes successfully and the
>>> file is not renamed at the last minute due to some error, that would be
>>> a problem as well, because now the old cluster would happily run and
>>> perhaps corrupt the data files from under the new cluster.
>>
>> Well, the basic problem is that the user, before pg_upgrade started,
>> installed a new cluster that works.  If we rename the old control file,
>> but rename it back on failure, there are cases we will miss, like kill
>> -9 or a server crash, and it will not be obvious to them that the
>> control file was renamed.  Of course, if we only rename on success, and
>> there is a kill -9 or server crash, the old cluster is still start-able,
>> like the new one.
>>
>> One good argument for renaming early is that on a server crash, the
>> system is probably going to restart the database automatically, and that
>> means the old server.  Right now we have a clear message that they need
>> to rename the control file to start the old server.  Not sure what the
>> new wording would look like --- let me try.
>
> I have thought about this, and feel that it would be odd to lock the old
> cluster at the start of the upgrade, and then unlock it on a failure,
> particularly because we can't always unlock it, e.g. after an operating
> system crash.  A cleaner solution would be to lock it when we complete
> the upgrade, which I have done in the attached patch.  I have also added
> a warning about restarting the old server when link mode is used, and
> updated the documentation to match the new behavior.
>
> Patch attached.  I would like to apply this to 9.2/HEAD.
> Performing Consistency Checks
> -----------------------------
> Checking current, bin, and data directories                 ok
> Checking cluster versions                                   ok
> Checking database user is a superuser                       ok
> Checking for prepared transactions                          ok
> Checking for reg* system OID user data types                ok
> Checking for contrib/isn with bigint-passing mismatch       ok
> Creating catalog dump                                       ok
> Checking for prepared transactions                          ok
> Checking for presence of required libraries                 ok
>
> If pg_upgrade fails after this point, you must re-initdb the
> new cluster before continuing.
>
> Performing Upgrade
> ------------------
> Analyzing all rows in the new cluster                       ok
> Freezing all rows on the new cluster                        ok
> Deleting new commit clogs                                   ok
> Copying old commit clogs to new server                      ok
> Setting next transaction ID for new cluster                 ok
> Resetting WAL archives                                      ok
> Setting frozenxid counters in new cluster                   ok
> Creating databases in the new cluster                       ok
> Adding support functions to new cluster                     ok
> Restoring database schema to new cluster                    ok
> Removing support functions from new cluster                 ok
> Linking user relation files                                 ok
> Setting next OID for new cluster                            ok
> Creating script to delete old cluster                       ok
> Adding ".old" suffix to old global/pg_control               ok
>
> If you want to start the old cluster, you will need to remove
> the ".old" suffix from /u/pgsql.old/data/global/pg_control.old.
> Because link mode was used, the old cluster cannot be safely
> started once the new cluster has been started.
>
> Upgrade complete
> ----------------
> Optimizer statistics are not transferred by pg_upgrade so
> consider running:
>     vacuumdb --all --analyze-only
> on the newly-upgraded cluster.
> Running this script will delete the old cluster's data files:
>     /usr/local/pgdev/pg_upgrade/delete_old_cluster.sh

I think you should rename the old control file just before the step that says "Linking user relation files".  That's the point after which it becomes unsafe to start the old cluster, right?
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: URI connection string support for libpq
On 02/24/2012 03:18 PM, Florian Weimer wrote: * Alex Shulgin: It's ugly, but it's standard practice, and seems better than a separate -d parameter (which sort of defeats the purpose of URIs). Hm, do you see anything what's wrong with ?dbname=other if you don't like a separate -d? It's not nice URI syntax, but it's better than an out-of-band mechanism. Attached is v5 of the patch, adding support for local Unix socket directory specification w/o the need to percent-encode path separators. The path to directory must start with forward slash, like so: postgres:///path/to/socket/dir To specify non-default dbname use URI query parameters: postgres:///path/to/socket/dir?dbname=other Username/password should be also specified on query parameters in this case, as opposed to user:pw@host syntax supported by host URIs. -- Alex *** a/src/interfaces/libpq/fe-connect.c --- b/src/interfaces/libpq/fe-connect.c *** *** 282,287 static const PQEnvironmentOption EnvironmentOptions[] = --- 282,290 } }; + /* The connection URI must start with either of the following designators: */ + static const char uri_designator[] = postgresql://; + static const char short_uri_designator[] = postgres://; static bool connectOptions1(PGconn *conn, const char *conninfo); static bool connectOptions2(PGconn *conn); *** *** 297,303 static PQconninfoOption *conninfo_parse(const char *conninfo, static PQconninfoOption *conninfo_array_parse(const char *const * keywords, const char *const * values, PQExpBuffer errorMessage, bool use_defaults, int expand_dbname); ! static char *conninfo_getval(PQconninfoOption *connOptions, const char *keyword); static void defaultNoticeReceiver(void *arg, const PGresult *res); static void defaultNoticeProcessor(void *arg, const char *message); --- 300,333 static PQconninfoOption *conninfo_array_parse(const char *const * keywords, const char *const * values, PQExpBuffer errorMessage, bool use_defaults, int expand_dbname); ! 
static PQconninfoOption *conninfo_uri_parse(const char *uri, ! PQExpBuffer errorMessage); ! static bool conninfo_uri_parse_options(PQconninfoOption *options, ! const char *uri, char *buf, ! PQExpBuffer errorMessage); ! static char *conninfo_uri_parse_local_socket_dir(PQconninfoOption *options, ! const char *uri, ! char *buf, char *lastc, ! PQExpBuffer errorMessage); ! static char *conninfo_uri_parse_remote_host(PQconninfoOption *options, ! const char *uri, ! char *buf, char *lastc, ! PQExpBuffer errorMessage); ! static bool conninfo_uri_parse_params(char *params, ! PQconninfoOption *connOptions, ! PQExpBuffer errorMessage); ! static char *conninfo_uri_decode(const char *str, PQExpBuffer errorMessage); ! static bool get_hexdigit(char digit, int *value); ! static const char *conninfo_getval(PQconninfoOption *connOptions, ! const char *keyword); ! static PQconninfoOption *conninfo_storeval(PQconninfoOption *connOptions, ! const char *keyword, const char *value, ! PQExpBuffer errorMessage, bool ignoreMissing); ! static PQconninfoOption *conninfo_store_uri_encoded_value( ! PQconninfoOption *connOptions, ! const char *keyword, const char *encoded_value, ! PQExpBuffer errorMessage, bool ignoreMissing); ! static PQconninfoOption *conninfo_find(PQconninfoOption *connOptions, const char *keyword); static void defaultNoticeReceiver(void *arg, const PGresult *res); static void defaultNoticeProcessor(void *arg, const char *message); *** *** 580,586 PQconnectStart(const char *conninfo) static void fillPGconn(PGconn *conn, PQconninfoOption *connOptions) { ! char *tmp; /* * Move option values into conn structure --- 610,616 static void fillPGconn(PGconn *conn, PQconninfoOption *connOptions) { ! const char *tmp; /* * Move option values into conn structure *** *** 3739,3745 ldapServiceLookup(const char *purl, PQconninfoOption *options, static int parseServiceInfo(PQconninfoOption *options, PQExpBuffer errorMessage) { ! 
char *service = conninfo_getval(options, service); char serviceFile[MAXPGPATH]; char *env; bool group_found = false; --- 3769,3775 static int parseServiceInfo(PQconninfoOption *options, PQExpBuffer errorMessage) { ! const char *service = conninfo_getval(options, service); char serviceFile[MAXPGPATH]; char *env; bool group_found = false; *** *** 4129,4161 conninfo_parse(const char *conninfo, PQExpBuffer errorMessage, } /* ! * Now we have the name and the value. Search for the param record. ! */ ! for (option = options; option-keyword != NULL; option++) ! { ! if (strcmp(option-keyword, pname) == 0) ! break; ! } ! if (option-keyword == NULL) ! { ! printfPQExpBuffer(errorMessage, ! libpq_gettext(invalid connection
Re: [HACKERS] pg_upgrade --logfile option documentation
Excerpts from Robert Haas's message of mar feb 28 15:24:45 -0300 2012:
> I think you should rename the old control file just before the step that
> says linking user relation files.  That's the point after which it
> becomes unsafe to start the old cluster, right?

Also, if it's not using link mode, what is the point in doing the rename in the first place?  I was using copy mode when I did my tests and yet it got renamed (unless link mode is now the default, which I doubt?)

-- 
Álvaro Herrera <alvhe...@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Runtime SHAREDIR for testing CREATE EXTENSION
On Sun, Feb 26, 2012 at 7:36 AM, Peter Eisentraut pete...@gmx.net wrote: On lör, 2012-02-25 at 14:21 +0100, Christoph Berg wrote: Well, I'm trying to invoke the extension's make check target at extension build time. I do have a temporary installation I own somewhere in my $HOME, but that is still trying to find extensions in /usr/share/postgresql/9.1/extension/*.control, because I am using the system's postgresql version. The build process is not running as root, so I cannot do an install of the extension to its final location. Still it would be nice to run regression tests. All that seems to be missing is the ability to put extension_control_path = /home/buildd/tmp/extension into the postgresql.conf of the temporary PG installation, or some other way like CREATE EXTENSION foobar WITH CONTROL '/home/buildd/...'. Yeah, of course, the extension path is not related to the data directory. So we do need some kind of path setting, just like dynamic_library_path. A thought that may or may not be related: I'd really like to support libraries (C or otherwise) of multiple versions at the same time, when the underlying library permits. The critical use case is contribs that iterate on their storage format, which right now is really painful. While perhaps contribs should use a version byte or something and be backwards-compatible forever, to some extent that is clearly impossible and sometimes undesirable for contribs that emit small on-disk representations, and would rather use, say, the type system to disambiguate what operators fit with what data. I've been struggling to think of a more graceful way to load the correct version of extra dependencies (including non-PG intrinsic things like libgeos or libproj) when new database replicas are created, as well as helping people with upgrade issues. Is this tunable possibly related to a solution for this?
-- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
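For reference, the missing knob Christoph describes would amount to a one-line setting in the temporary installation's postgresql.conf. This is a sketch of the *proposed* syntax only -- no such GUC exists at this point -- with the path taken from the message above:

```
# Hypothetical setting, modeled on dynamic_library_path; proposed in this
# thread, not implemented:
extension_control_path = '/home/buildd/tmp/extension'
```

The per-statement alternative floated above (CREATE EXTENSION ... WITH CONTROL '...') is equally hypothetical at this stage.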
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
On tis, 2012-02-28 at 11:20 -0500, Robert Haas wrote: [ snicker ] But still, Peter has a point: pgsql is not a name for the product, it's at best an abbreviation. We aren't calling the other thing orcl_fdw or ora_fdw. I think either postgres_fdw or postgresql_fdw would be fine. I liked the shorter name, myself, but I'm not going to make a big deal about it. Let's at least be clear about the reasons here. The fact that postgresql_fdw_validator exists means (a) there is a possible naming conflict that has not been discussed yet, and/or (b) the name is already settled and we need to think of a way to make postgresql_fdw_validator work with the new actual FDW. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade --logfile option documentation
On Tue, Feb 28, 2012 at 01:24:45PM -0500, Robert Haas wrote: Running this script will delete the old cluster's data files: /usr/local/pgdev/pg_upgrade/delete_old_cluster.sh I think you should rename the old control file just before the step that says linking user relation files. That's the point after which it becomes unsafe to start the old cluster, right? Yes, it is true that that is the danger point, and also it is much less likely to fail at that point --- it usually happens during the schema creation. I would have to add some more conditional wording without clearly stating if the old suffix is present. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade --logfile option documentation
On Tue, Feb 28, 2012 at 03:48:03PM -0300, Alvaro Herrera wrote: Excerpts from Robert Haas's message of mar feb 28 15:24:45 -0300 2012: I think you should rename the old control file just before the step that says linking user relation files. That's the point after which it becomes unsafe to start the old cluster, right? Also, if it's not using link mode, what is the point in doing the rename in the first place? I was using copy mode when I did my tests and yet it got renamed (unless link mode is now the default, which I doubt?) You are right that we rename unconditionally. We can certainly only rename in link mode. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CLOG contention, part 2
On Tue, Feb 28, 2012 at 6:11 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Feb 27, 2012 at 4:03 AM, Simon Riggs si...@2ndquadrant.com wrote: So please use a scale factor that the hardware can cope with. OK. I tested this out on Nate Boley's 32-core AMD machine, using scale factor 100 and scale factor 300. I initialized it with Simon's patch, which should have the effect of rendering the entire table unhinted and giving each row a different XID. Thanks for making the test. I think this tells me the only real way to do this kind of testing is not at arm's length from a test machine. So time to get my hands on a machine, but not for this release. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: CHECK FUNCTION statement
Hello On 28 February 2012 17:48, Alvaro Herrera alvhe...@commandprompt.com wrote: I have a few comments about this patch: I didn't like the fact that the checker calling infrastructure uses SPI instead of just a FunctionCallN to call the checker function. I think this should be easily avoidable. It is not possible - or at least it has no simple solution (I don't know how to do it). PLpgSQL_checker is an SRF function. SPI is used for processing the returned result set. I looked into the pg source code, and I didn't find any other pattern than using SPI for an SRF function call. It is probably possible, but it means some code duplication too. I welcome any ideas. Second, I see that functioncmds.c gets a lot into trigger internals just to be able to figure out the function starting from a trigger name. I think it'd be saner to have a new function in trigger.c that returns the required function OID. done I think CheckFunction would be clearer if the code to check multiple objects is split out into a separate subroutine. done After CheckFunction there is a leftover function comment without any following function. There are other spurious hunks that add or remove single lines too (once in an otherwise untouched file). fixed I refreshed the patch for the current git repository. Regards Pavel -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support check_function-2012-02-28-2.diff.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
On Feb 28, 2012, at 8:20 AM, Robert Haas wrote: I liked the shorter name, myself, but I'm not going to make a big deal about it. pg_ is used quite a bit. What about pg_fdw? David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: CHECK FUNCTION statement
Excerpts from Pavel Stehule's message of mar feb 28 16:30:58 -0300 2012: I refreshed patch for current git repository. Thanks, I'll have a look. Oh, another thing -- you shouldn't patch the 1.0 version of the plpgsql extension. Rather I think you should produce a 1.1 version. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: CHECK FUNCTION statement
2012/2/28 Alvaro Herrera alvhe...@commandprompt.com: Excerpts from Pavel Stehule's message of mar feb 28 16:30:58 -0300 2012: I refreshed patch for current git repository. Thanks, I'll have a look. Oh, another thing -- you shouldn't patch the 1.0 version of the plpgsql extension. Rather I think you should produce a 1.1 version. Here is a patch with the updated tag. I am not sure whether some changes in pg_upgrade are necessary. Regards and thank you Pavel -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support check_function-2012-02-28-3.diff.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: CHECK FUNCTION statement
In gram.y we have a new check_option_list nonterminal. This is mostly identical to explain_option_list, except that the option args do not take a NumericOnly (only opt_boolean_or_string and empty). I wonder if it's really worthwhile having a bunch of separate productions for this; how about we just use the existing explain_option_list instead and get rid of those extra productions? elog() is used in many user-facing messages (errors and notices). Full ereport() calls should be used there, so that messages are marked for translations and so on. Does the patched pg_dump work with older servers? I don't like CheckFunction being declared in defrem.h. It seems completely out of place there. I don't see any better place though, so I'm thinking maybe we should have a new header file for it (say commands/functions.h; but we already have executor/functions.h so perhaps it's better to find another name). This addition means that there's a distressingly large number of .c files that are now getting dest.h, which was previously pretty confined. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] strange plan - PostgreSQL 9.2
Pavel Stehule pavel.steh...@gmail.com writes: Why there are a SubPlan 2? http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=bd3daddaf232d95b0c9ba6f99b0170a0147dd8af regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] strange plan - PostgreSQL 9.2
2012/2/28 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: Why there are a SubPlan 2? http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=bd3daddaf232d95b0c9ba6f99b0170a0147dd8af regards, tom lane Thank you - I can verify that it works well, but the EXPLAIN result is really strange Pavel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: CHECK FUNCTION statement
2012/2/28 Alvaro Herrera alvhe...@commandprompt.com: In gram.y we have a new check_option_list nonterminal. This is mostly identical to explain_option_list, except that the option args do not take a NumericOnly (only opt_boolean_or_string and empty). I wonder if it's really worthwhile having a bunch of separate productions for this; how about we just use the existing explain_option_list instead and get rid of those extra productions? I have to look at it elog() is used in many user-facing messages (errors and notices). Full ereport() calls should be used there, so that messages are marked for translations and so on. yes, I'll fix it Does the patched pg_dump work with older servers? Yes, it should. I don't like CheckFunction being declared in defrem.h. It seems completely out of place there. I don't see any better place though, so I'm thinking maybe we should have a new header file for it (say commands/functions.h; but we already have executor/functions.h so perhaps it's better to find another name). This addition means that there's a distressingly large number of .c files that are now getting dest.h, which was previously pretty confined. You have much better knowledge about headers than me, so please propose a solution. Regards Pavel -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NULL's support in SP-GiST
On Thu, Feb 2, 2012 at 4:26 PM, Oleg Bartunov o...@sai.msu.su wrote: Hi there, attached patch introduces NULLs indexing for SP-GiST. With this patch SP-GiST supports IS NULL, IS NOT NULL clauses, as well as full index scan. I was looking at this. It passes all regression tests, and seems to work fine. What I don't like about it is that spgnull.c actually calls GIN functions and even uses GIN flags. Don't know how bad it is, but IMO there is a module violation here. -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Parameterized-path cost comparisons need some work
I was experimenting today with a test case sent to me by somebody at Red Hat. The details aren't too important, except that it involves an inner join on the inside of a left join (where it cannot commute with the left join). I can reproduce the behavior with a standard regression test table, if I crank random_page_cost up a bit:

regression=# set random_page_cost TO 5;
SET
regression=# explain select * from tenk1 t1 left join
  (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
  on t1.hundred = t2.hundred where t1.unique1 = 1;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..507.16 rows=100 width=732)
   ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.00..10.28 rows=1 width=244)
         Index Cond: (unique1 = 1)
   ->  Nested Loop  (cost=0.00..495.63 rows=100 width=488)
         ->  Index Scan using tenk1_hundred on tenk1 t2  (cost=0.00..446.98 rows=100 width=244)
               Index Cond: (t1.hundred = hundred)
         ->  Index Scan using tenk1_unique2 on tenk1 t3  (cost=0.00..0.48 rows=1 width=244)
               Index Cond: (unique2 = t2.thousand)
(8 rows)

regression=# set random_page_cost TO 6;
SET
regression=# explain select * from tenk1 t1 left join
  (tenk1 t2 join tenk1 t3 on t2.thousand = t3.unique2)
  on t1.hundred = t2.hundred where t1.unique1 = 1;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Right Join  (cost=928.30..2573.80 rows=100 width=732)
   Hash Cond: (t2.hundred = t1.hundred)
   ->  Hash Join  (cost=916.00..2523.00 rows=1 width=488)
         Hash Cond: (t2.thousand = t3.unique2)
         ->  Seq Scan on tenk1 t2  (cost=0.00..458.00 rows=1 width=244)
         ->  Hash  (cost=458.00..458.00 rows=1 width=244)
               ->  Seq Scan on tenk1 t3  (cost=0.00..458.00 rows=1 width=244)
   ->  Hash  (cost=12.29..12.29 rows=1 width=244)
         ->  Index Scan using tenk1_unique1 on tenk1 t1  (cost=0.00..12.29 rows=1 width=244)
               Index Cond: (unique1 = 1)
(10 rows)

WTF? How did it suddenly fail to find the double-nested-loop plan, and have to settle for a much worse plan?
The reason seems to be that for large enough random_page_cost, the seqscan on t2 is costed as cheaper than an indexscan that selects a significant fraction of the table. (Notice the t2 scans are nearly the same cost in these two examples.) That causes add_path to decide that the unparameterized seqscan is unconditionally better than the parameterized indexscan, and throw out the latter. Now it is impossible to form the parameterized nestloop subplan. The flaw in this logic, of course, is that the seqscan might be cheaper than the parameterized indexscan, but *it produces a whole lot more rows*, meaning that any subsequent join will be a lot more expensive. Previously add_path didn't have to worry about that, because all ordinary paths for a given relation produce the same number of rows (and we studiously kept things like inner indexscan paths out of add_path's view of the world). The most obvious thing to do about this is to consider that one path can dominate another on cost only if it is both cheaper *and* produces no more rows. But I'm concerned about the cost of inserting yet another test condition into add_path, which is slow enough already. Has anyone got an idea for a different formulation that would avoid that? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should we add crc32 in libpgport?
Daniel Farina dan...@heroku.com writes: Thinking unnecessary. Motion is progress. Here is a patch that uses this exact plan: pgport for the tables, broken out into a header file that is included in the building of libpgport. I have confirmed by objdump -t that multiple copies of the table are not included in the postgres binary and the bloat has not occurred. Applied with minor adjustments; mainly, I cleaned up some additional traces of the old way of building pg_resetxlog and pg_controldata. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 3rd Cluster Hackers Summit, May 15th in Ottawa
All, First, I've added a wiki page: http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit Second, please RSVP! At this point I've heard from a total of 3 people. I need to know who/how many are coming so that I can book the space, and lunch. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [pgsql-cluster-hackers] 3rd Cluster Hackers Summit, May 15th in Ottawa
Hi, This is Koichi. I'm joining the summit. I've heard at least two or more people will join from NTT. I'm expecting a couple of people from EDB too. Regards; -- Koichi Suzuki 2012/2/29 Josh Berkus j...@agliodbs.com: All, First, I've added a wiki page: http://wiki.postgresql.org/wiki/PgCon2012CanadaClusterSummit Second, please RSVP! At this point I've heard from a total of 3 people. I need to know who/how many are coming so that I can book the space, and lunch. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-cluster-hackers mailing list (pgsql-cluster-hack...@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-cluster-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [pgsql-cluster-hackers] 3rd Cluster Hackers Summit, May 15th in Ottawa
Hi Josh, Second, please RSVP! At this point I've heard from a total of 3 people. I need to know who/how many are coming so that I can book the space, and lunch. Sorry for my late reply. I will join the cluster meeting as a member of Postgres-XC. Regards, -- Michael Paquier http://michael.otacoo.com
[HACKERS] a typo in json.h
Seems there is a trivial typo in json.h diff --git a/src/include/utils/json.h b/src/include/utils/json.h index 415787b..fa0d4fc 100644 --- a/src/include/utils/json.h +++ b/src/include/utils/json.h @@ -26,4 +26,4 @@ extern Datum row_to_json(PG_FUNCTION_ARGS); extern Datum row_to_json_pretty(PG_FUNCTION_ARGS); extern void escape_json(StringInfo buf, const char *str); -#endif /* XML_H */ +#endif /* JSON_H */
Re: [HACKERS] pg_upgrade --logfile option documentation
On Tue, Feb 28, 2012 at 02:15:30PM -0500, Bruce Momjian wrote: On Tue, Feb 28, 2012 at 01:24:45PM -0500, Robert Haas wrote: Running this script will delete the old cluster's data files: /usr/local/pgdev/pg_upgrade/delete_old_cluster.sh I think you should rename the old control file just before the step that says linking user relation files. That's the point after which it becomes unsafe to start the old cluster, right? Yes, it is true that that is the danger point, and also it is much less likely to fail at that point --- it usually happens during the schema creation. I would have to add some more conditional wording without clearly stating if the old suffix is present. OK, I have implemented both Robert's and Álvaro's ideas in my patch. I only add the .old suffix to pg_controldata when link mode is used, and I now do it after the schema has been created (the most common failure case for pg_upgrade), and just before we actually link files --- both very good ideas. Patch attached; new pg_upgrade output with link mode below.

---------------------------------------------------------------

Performing Consistency Checks
-----------------------------
Checking current, bin, and data directories                 ok
Checking cluster versions                                   ok
Checking database user is a superuser                       ok
Checking for prepared transactions                          ok
Checking for reg* system OID user data types                ok
Checking for contrib/isn with bigint-passing mismatch       ok
Creating catalog dump                                       ok
Checking for prepared transactions                          ok
Checking for presence of required libraries                 ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster                       ok
Freezing all rows on the new cluster                        ok
Deleting new commit clogs                                   ok
Copying old commit clogs to new server                      ok
Setting next transaction ID for new cluster                 ok
Resetting WAL archives                                      ok
Setting frozenxid counters in new cluster                   ok
Creating databases in the new cluster                       ok
Adding support functions to new cluster                     ok
Restoring database schema to new cluster                    ok
Removing support functions from new cluster                 ok
Adding ".old" suffix to old global/pg_control               ok

If you want to start the old cluster, you will need to remove
the ".old" suffix from /u/pgsql.old/data/global/pg_control.old.
Because "link" mode was used, the old cluster cannot be safely
started once the new cluster has been started.

Linking user relation files                                 ok
Setting next OID for new cluster                            ok
Creating script to delete old cluster                       ok

Upgrade complete
----------------
Optimizer statistics are not transferred by pg_upgrade so
consider running:
    vacuumdb --all --analyze-only
on the newly-upgraded cluster.

Running this script will delete the old cluster's data files:
    /usr/local/pgdev/pg_upgrade/delete_old_cluster.sh

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode
index 891eb9a..a5f63eb
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** report_clusters_compatible(void)
*** 148,156 ****
  	}
  
  	pg_log(PG_REPORT, "\n"
! 		   "If pg_upgrade fails after this point, you must re-initdb the new cluster\n"
! 		   "before continuing.  You will also need to remove the \".old\" suffix from\n"
! 		   "%s/global/pg_control.old.\n", old_cluster.pgdata);
  }
--- 148,155 ----
  	}
  
  	pg_log(PG_REPORT, "\n"
! 		   "If pg_upgrade fails after this point, you must re-initdb the\n"
! 		   "new cluster before continuing.\n");
  }
*** output_completion_banner(char *deletion_
*** 198,205 ****
  	/* Did we copy the free space files? */
  	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 804)
  		pg_log(PG_REPORT,
! 			   "Optimizer statistics are not transferred by pg_upgrade so consider\n"
! 			   "running:\n"
  			   "    vacuumdb --all --analyze-only\n"
  			   "on the newly-upgraded cluster.\n\n");
  	else
--- 197,204 ----
  	/* Did we copy the free space files? */
  	if (GET_MAJOR_VERSION(old_cluster.major_version) >= 804)
  		pg_log(PG_REPORT,
! 			   "Optimizer statistics are not transferred by pg_upgrade so\n"
! 			   "consider running:\n"
  			   "    vacuumdb --all --analyze-only\n"
  			   "on the newly-upgraded cluster.\n\n");
  	else
diff
Re: [HACKERS] a typo in json.h
Excerpts from Haifeng Liu's message of mar feb 28 23:31:26 -0300 2012: Seems there is a trivial typo in json.h Thanks, fixed. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FDW system columns
(2012/02/28 23:37), Kohei KaiGai wrote: On 28 February 2012 12:00, Shigeru Hanada shigeru.han...@gmail.com: We have three options: a) remove all system columns (posted patch) b) remove system columns other than tableoid c) leave all system columns as is (current 9.2dev) Incidentally, views, which are a very similar object type to foreign tables, have no system columns. Thoughts? Which is the expected behavior in case a foreign table is constructed as a child table of a particular regular table? In this case, do child foreign tables not have columns that exist on the parent table? (Although it does not matter when a regular table is a child of a foreign table...) If we support table inheritance by foreign tables, foreign tables should return something for all system columns, because a child table MUST have all columns held by all parent tables. I'm not sure that foreign tables should have system columns physically, like option c). For now, c) seems most reasonable to me. -- Shigeru Hanada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers