Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On Mon, Dec 19, 2011 at 11:13:57PM -0500, Tom Lane wrote:
> Noah Misch n...@leadboat.com writes:
>> I created a function that does this in a loop:
>>
>>     HeapTuple t;
>>
>>     CatalogCacheFlushCatalog(ProcedureRelationId);
>>     t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */);
>>     if (!HeapTupleIsValid(t))
>>         elog(ERROR, "cache lookup failed for function 42");
>>     ReleaseSysCache(t);
>
> ... but this performance test seems to me to be entirely misguided,
> because it's testing a situation that isn't going to occur much in the
> field, precisely because the syscache should prevent constant reloads
> of the same syscache entry.  [ideas for more-realistic tests]

Granted, but I don't hope to reliably measure a change in a
macro-benchmark after seeing a rickety 2% change in a micro-benchmark.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
Noah Misch n...@leadboat.com writes:
> On Mon, Dec 19, 2011 at 11:13:57PM -0500, Tom Lane wrote:
>> ... but this performance test seems to me to be entirely misguided,
>> because it's testing a situation that isn't going to occur much in
>> the field, precisely because the syscache should prevent constant
>> reloads of the same syscache entry.  [ideas for more-realistic tests]
>
> Granted, but I don't hope to reliably measure a change in a
> macro-benchmark after seeing a rickety 2% change in a micro-benchmark.

No, I'm not sure about that at all.  In particular I think that
CatalogCacheFlushCatalog is pretty expensive, and so the snapshot costs
could be a larger part of a more-realistic test.

			regards, tom lane
Re: [HACKERS] Page Checksums
On Tue, Dec 20, 2011 at 8:36 AM, Robert Haas robertmh...@gmail.com wrote:
> On Mon, Dec 19, 2011 at 2:44 PM, Kevin Grittner
> kevin.gritt...@wicourts.gov wrote:
>> I was thinking that we would warn when such was found, set hint bits
>> as needed, and rewrite with the new CRC.  In the unlikely event that
>> it was a torn hint-bit-only page update, it would be a warning about
>> something which is a benign side-effect of the OS or hardware crash.
>
> But that's terrible.  Surely you don't want to tell people: "WARNING:
> Your database is corrupted, or maybe not.  But don't worry, I modified
> the data block so that you won't get this warning again."  OK, I guess
> I'm not sure that you don't want to tell people that.  But *I* don't!

This seems to be a frequent problem with this whole "doing CRCs on
pages" thing.

It's not evident which problems will be real ones.

And in such cases, is the answer to "turf the database and recover from
backup," because of a single busted page?  For a big database, I'm not
sure that's less scary than the possibility of one page having a
corruption.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
Re: [HACKERS] Pause at end of recovery
On Tue, Dec 20, 2011 at 1:40 PM, Magnus Hagander mag...@hagander.net wrote:
> These days we have pause_at_recovery_target, which lets us pause when
> we reach a PITR target.  Is there a particular reason we don't have a
> way to pause at end of recovery if we *didn't* specify a target -
> meaning we let it run until the end of the archived log?
>
> While it's too late to change the target, I can see a lot of use cases
> where you don't want it to be possible to make changes to the database
> again until it has been properly verified - and keeping it up in
> read-only mode in that case can be quite useful...

Useful for what purpose?  It's possible to deny access in other ways
already.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Page Checksums
Excerpts from Christopher Browne's message of mar dic 20 14:12:56 -0300 2011:
> It's not evident which problems will be real ones.
>
> And in such cases, is the answer to "turf the database and recover
> from backup," because of a single busted page?  For a big database,
> I'm not sure that's less scary than the possibility of one page having
> a corruption.

I don't think the problem is having one page of corruption.  The
problem is *not knowing* that random pages are corrupted, and living in
the fear that they might be.

--
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Replication timeout units
Peter Geoghegan pe...@2ndquadrant.com wrote:
> Magnus Hagander mag...@hagander.net wrote:
>> from postgresql.conf.sample:
>>
>>     #replication_timeout = 60s      # in milliseconds; 0 disables
>>
>> Seconds or milliseconds?  I would suggest we just remove the "in
>> milliseconds", and instead say "timeout for replication connections;
>> 0 disables".
>
> +1 from me.  That's very confusing.

Isn't it providing information on both the granularity and the default
unit if none is specified?  Why is it more confusing here than
statement_timeout or any of the other places this pattern is followed?

-1 from me on removing it *only* here.

-Kevin
Re: [HACKERS] Page Checksums
Robert Haas robertmh...@gmail.com wrote:
> On Mon, Dec 19, 2011 at 2:44 PM, Kevin Grittner
> kevin.gritt...@wicourts.gov wrote:
>> I was thinking that we would warn when such was found, set hint bits
>> as needed, and rewrite with the new CRC.  In the unlikely event that
>> it was a torn hint-bit-only page update, it would be a warning about
>> something which is a benign side-effect of the OS or hardware crash.
>
> But that's terrible.  Surely you don't want to tell people: "WARNING:
> Your database is corrupted, or maybe not.  But don't worry, I modified
> the data block so that you won't get this warning again."  OK, I guess
> I'm not sure that you don't want to tell people that.  But *I* don't!

Well, I would certainly change that to comply with standard message
style guidelines.  ;-)  But the alternatives I've heard so far bother me
more.  It sounds like the most-often suggested alternative is:

ERROR (or stronger?): page checksum failed in relation 999 page 9
DETAIL: This may not actually affect the validity of any tuples, since
it could be a flipped bit in the checksum itself or dead space, but
we're shutting you down just in case.
HINT: You won't be able to read anything on this page, even if it
appears to be well-formed, without stopping your database and using
some arcane tool you've never heard of before to examine and
hand-modify the page.  Any query which accesses this table may fail in
the same way.

The warning-level message would be followed by something more severe if
the page or a needed tuple is mangled in a way that prevents its use.
I guess the biggest risk here is that there is real damage to data
which doesn't generate a stronger response, and the users are ignoring
warning messages.  I'm not sure what to do about that, but the above
error doesn't seem like the right solution.

Assuming we do something about the torn page on hint-bit-only write
issue, by moving the hint bits somewhere else or logging their writes,
what would you suggest is the right thing to do when a page is read
with a checksum which doesn't match the page contents?

-Kevin
Re: [HACKERS] Page Checksums
Alvaro Herrera alvhe...@commandprompt.com wrote:
> Excerpts from Christopher Browne's message of mar dic 20 14:12:56 -0300 2011:
>> It's not evident which problems will be real ones.  And in such
>> cases, is the answer to "turf the database and recover from backup,"
>> because of a single busted page?  For a big database, I'm not sure
>> that's less scary than the possibility of one page having a
>> corruption.
>
> I don't think the problem is having one page of corruption.  The
> problem is *not knowing* that random pages are corrupted, and living
> in the fear that they might be.

What would you want the server to do when a page with a mismatching
checksum is read?

-Kevin
Re: [HACKERS] Page Checksums
On Tuesday, December 20, 2011 06:38:44 PM Kevin Grittner wrote:
> Alvaro Herrera alvhe...@commandprompt.com wrote:
>> I don't think the problem is having one page of corruption.  The
>> problem is *not knowing* that random pages are corrupted, and living
>> in the fear that they might be.
>
> What would you want the server to do when a page with a mismatching
> checksum is read?

Follow the behaviour of zero_damaged_pages.

Andres
Re: [HACKERS] Page Checksums
On Tue, Dec 20, 2011 at 12:38 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
>> I don't think the problem is having one page of corruption.  The
>> problem is *not knowing* that random pages are corrupted, and living
>> in the fear that they might be.
>
> What would you want the server to do when a page with a mismatching
> checksum is read?

But that's exactly the problem.  I don't know what I want the server to
do, because I don't know if the page with the checksum mismatch is one
of the 10GB of pages in the page cache that were dirty and poses 0 risk
(i.e. hint-bit-only changes made it dirty), a page that was really
messed up on the kernel panic that last happened causing this whole
mess, or an even older page that really is giving bitrot...

a.

--
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
Re: [HACKERS] Page Checksums
Andres Freund and...@anarazel.de writes:
> On Tuesday, December 20, 2011 06:38:44 PM Kevin Grittner wrote:
>> What would you want the server to do when a page with a mismatching
>> checksum is read?
>
> Follow the behaviour of zero_damaged_pages.

Surely not.  Nobody runs with zero_damaged_pages turned on in
production; or at least, nobody with any semblance of a clue.

			regards, tom lane
Re: [HACKERS] Page Checksums
On Tuesday, December 20, 2011 07:08:56 PM Tom Lane wrote:
> Andres Freund and...@anarazel.de writes:
>> Follow the behaviour of zero_damaged_pages.
>
> Surely not.  Nobody runs with zero_damaged_pages turned on in
> production; or at least, nobody with any semblance of a clue.

That's my point.  There is no automated solution for page errors.  So
it should ERROR (not PANIC) out in normal operation and be fixable via
zero_damaged_pages.  I personally wouldn't even have a problem making
zero_damaged_pages only applicable in single-backend mode.

Andres
Re: [HACKERS] Page Checksums
On Mon, Dec 19, 2011 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote:
> The only sensible way to handle this is to change the page format as
> discussed.  IMHO the only sensible way that can happen is if we also
> support an online upgrade feature.  I will take on the online upgrade
> feature if others work on the page format issues, but none of this is
> possible for 9.2, ISTM.

I've had another look at this just to make sure.

Doing this for 9.2 will change the page format, causing every user to
do an unload/reload, with no provided mechanism to do that, whether or
not they use this feature.  If we do that, the hints are all in the
wrong places, meaning any hint set will need to change the CRC.

Currently, setting hints can be done while holding a share lock on the
buffer.  Preventing that would require us to change the way the buffer
manager works, making it take an exclusive lock while writing out,
since a hint would change the CRC, and so allowing hints to be set
while we write out would cause invalid CRCs.  So we would need to hold
exclusive lock on buffers while we calculate CRCs.  Overall, this will
cause a much bigger performance hit than we planned for.  But then we
have SSI as an option, so why not this?

So, do we have enough people in the house that are willing to back this
idea, even with a severe performance hit?  Are we willing to change the
page format now, with plans to change it again in the future?  Are we
willing to change the page format for a feature many people will need
to disable anyway?  Do we have people willing to spend time measuring
the performance in enough cases to allow educated debate?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] JSON for PG 9.2
David E. Wheeler da...@justatheory.com writes:
> holds: it is not a programming language, and one does not need a PL to
> have a JSON data type.

Exactly.  That does not contradict the fact that if you have
pl/ecmascript you already have JSON.  And that we might as well have
had the ecmascript PL for some time now; we just need to check on that.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Page Checksums
On Tuesday, December 20, 2011 06:44:48 PM Simon Riggs wrote:
> Currently, setting hints can be done while holding a share lock on the
> buffer.  Preventing that would require us to change the way buffer
> manager works to make it take an exclusive lock while writing out,
> since a hint would change the CRC and so allowing hints to be set
> while we write out would cause invalid CRCs.  So we would need to hold
> exclusive lock on buffers while we calculate CRCs.

While hint bits are a problem, that specific problem is actually
handled by copying the buffer into a separate buffer and calculating
the CRC on that copy.  Given that we already rely on the fact that the
flags can be read consistently by the individual backends, that's fine.

Andres
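The copy-then-checksum approach Andres describes can be sketched as follows.  This is a toy illustration, not PostgreSQL's buffer-manager code: the checksum offset and the FNV-1a hash standing in for the CRC are invented for the example.

```c
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192     /* PostgreSQL's default block size */
#define CKSUM_OFF 8     /* hypothetical offset of the on-page checksum */

/* FNV-1a, standing in for whatever CRC would really be used. */
static uint32_t page_hash(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Checksum a private copy of the shared buffer.  Concurrent hint-bit
 * setters may scribble on the original while we work; the copy gives a
 * stable byte image, and zeroing the checksum field keeps the stored
 * checksum itself out of the calculation.
 */
uint32_t checksum_page_copy(const unsigned char *shared_buf)
{
    unsigned char copy[BLCKSZ];

    memcpy(copy, shared_buf, BLCKSZ);
    memset(copy + CKSUM_OFF, 0, sizeof(uint32_t));
    return page_hash(copy, BLCKSZ);
}
```

Because the checksum field is zeroed on the copy, two pages that differ only in the stored checksum hash identically, while any other byte difference changes the result.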
Re: [HACKERS] JSON for PG 9.2
On Tuesday, December 20, 2011 07:23:43 PM Dimitri Fontaine wrote:
> Exactly.  That does not contradict the fact that if you have
> pl/ecmascript you already have JSON.

Not really.  You need to be able to evaluate JSON without it possibly
executing code.  Many js implementations are likely to have such a
feature, though.

Andres
Re: [HACKERS] JSON for PG 9.2
On Dec 20, 2011, at 12:39 AM, David E. Wheeler wrote:
> On Dec 19, 2011, at 2:49 AM, Dimitri Fontaine wrote:
>> My understanding is that JSON is a subset of ECMAscript
>
> Well, no, JSON is formally "a lightweight data-interchange format."
> It's derived from JavaScript syntax, but it is not a programming
> language, so I wouldn't say it was accurate to describe it as a subset
> of JS or ECMAScript.  http://json.org/

Are people explicitly asking for a) a *JSON* datatype, or b) a type
that lets you store arbitrarily complex, semi-untyped data structures?

If b), then this might get a lot more interesting.

Cheers,
Claes
Re: [HACKERS] Page Checksums
On 2011-12-20 18:44, Simon Riggs wrote:
> On Mon, Dec 19, 2011 at 11:10 AM, Simon Riggs si...@2ndquadrant.com wrote:
>> The only sensible way to handle this is to change the page format as
>> discussed.  IMHO the only sensible way that can happen is if we also
>> support an online upgrade feature.  I will take on the online upgrade
>> feature if others work on the page format issues, but none of this is
>> possible for 9.2, ISTM.
>
> I've had another look at this just to make sure.
>
> Doing this for 9.2 will change the page format, causing every user to
> do an unload/reload, with no provided mechanism to do that, whether or
> not they use this feature.

How about only calculating the checksum and setting it in the bgwriter,
just before flushing the buffer out to disk?  Perhaps even let
autovacuum do the same if it flushes pages to disk as part of the
process.  If someone comes along and sets a hint bit, changes data,
etc., its only job is to clear the checksum to a value meaning "we
don't have a checksum for this page".  Unless the bgwriter becomes
bottlenecked by doing it, the impact on foreground work should be
fairly limited.

.. just throwing in random thoughts ..

Jesper
--
Jesper
Re: [HACKERS] Page Checksums
On 2011-12-19 02:55, Greg Stark wrote:
> On Sun, Dec 18, 2011 at 7:51 PM, Jesper Krogh jes...@krogh.cc wrote:
>> I don't know if it would be seen as a half-baked feature or similar,
>> and I don't know if the hint bit problem is solvable at all, but I
>> could easily imagine checksumming just skipping the hint bits
>> entirely.
>
> That was one approach discussed.  The problem is that the hint bits
> are currently in each heap tuple header, which means the checksum code
> would have to know a fair bit about the structure of the page format.
> Also, the closer people looked, the more hint bits kept turning up,
> because the coding pattern had been copied to other places (the page
> header has one, and index pointers have a hint bit indicating that the
> target tuple is deleted, etc).  And to make matters worse, skipping
> individual bits in varying places quickly becomes a big consumer of
> CPU time, since it means injecting logic into each iteration of the
> checksum loop to mask out the bits.

I do know it is a valid and really relevant point (the CPU time spent),
but here in late 2011 it is a really irritating limitation, since if
there is any resource I have plenty of in the production environment,
it is CPU time - just not on the single core currently serving the
client.

Jesper
--
Jesper
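Greg Stark's point about per-iteration masking cost can be made concrete with a sketch like the one below.  The skip offsets are invented for illustration - on a real page they would have to be derived from the tuple layout, which is exactly the coupling he objects to - and the FNV-1a hash again stands in for a CRC.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Checksum that masks out designated "hint bit" bytes.  skip[] holds the
 * byte offsets to ignore, sorted ascending.  Note the extra compare and
 * branch injected into every iteration of the loop, compared to a plain
 * checksum over the whole buffer.
 */
uint32_t checksum_masked(const unsigned char *buf, size_t len,
                         const size_t *skip, size_t nskip)
{
    uint32_t h = 2166136261u;   /* FNV-1a, standing in for a CRC */
    size_t next = 0;

    for (size_t i = 0; i < len; i++)
    {
        unsigned char c = buf[i];

        if (next < nskip && skip[next] == i)
        {
            c = 0;              /* treat the hint-bit byte as zero */
            next++;
        }
        h ^= c;
        h *= 16777619u;
    }
    return h;
}
```

Two pages differing only in a masked byte checksum identically; the price is the per-byte branch, which is the CPU overhead the thread is worried about.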
Re: [HACKERS] PATCH: tracking temp files in pg_stat_database
On 20.12.2011 11:20, Magnus Hagander wrote:
> 2011/12/20 Tomas Vondra t...@fuzzy.cz:
>> I haven't updated the docs yet - let's see if the patch is acceptable
>> at all first.
>
> Again, without having reviewed the code, this looks like a feature
> we'd want, so please add some docs, and then submit it for the next
> commitfest!

I've added the docs (see the attachment) and rebased to current head.

Tomas

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a12a9a2..3635c3f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -691,6 +691,26 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> re
      </row>

      <row>
+      <entry><literal><function>pg_stat_get_db_temp_files</function>(<type>oid</type>)</literal></entry>
+      <entry><type>bigint</type></entry>
+      <entry>
+       Number of temporary files written for the database. All temporary files are
+       counted, regardless of why the temporary file was created (sorting or hash
+       join) or file size (log_temp_files does not affect this).
+      </entry>
+     </row>
+
+     <row>
+      <entry><literal><function>pg_stat_get_db_temp_bytes</function>(<type>oid</type>)</literal></entry>
+      <entry><type>bigint</type></entry>
+      <entry>
+       Amount of data written to temporary files for the database. All temporary
+       files are counted, regardless of why the temporary file was created (sorting
+       or hash join) or file size (log_temp_files does not affect this).
+      </entry>
+     </row>
+
+     <row>
       <entry><literal><function>pg_stat_get_numscans</function>(<type>oid</type>)</literal></entry>
       <entry><type>bigint</type></entry>
       <entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 2253ca8..55d20dc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -574,6 +574,8 @@ CREATE VIEW pg_stat_database AS
             pg_stat_get_db_tuples_updated(D.oid) AS tup_updated,
             pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted,
             pg_stat_get_db_conflict_all(D.oid) AS conflicts,
+            pg_stat_get_db_temp_files(D.oid) AS temp_files,
+            pg_stat_get_db_temp_bytes(D.oid) AS temp_bytes,
             pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
     FROM pg_database D;
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 24f4cde..97c7004 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -286,7 +286,7 @@ static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
 static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
-
+static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);

 /*
  * Public functions called from postmaster follow
@@ -1339,6 +1339,29 @@ pgstat_report_recovery_conflict(int reason)
 	pgstat_send(&msg, sizeof(msg));
 }

+
+/*
+ * pgstat_report_tempfile() -
+ *
+ *	Tell the collector about a temporary file.
+ *
+ */
+void
+pgstat_report_tempfile(size_t filesize)
+{
+	PgStat_MsgTempFile msg;
+
+	if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+		return;
+
+	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_TEMPFILE);
+	msg.m_databaseid = MyDatabaseId;
+	msg.m_filesize = filesize;
+	pgstat_send(&msg, sizeof(msg));
+}
+
+
 /* ----------
  * pgstat_ping() -
  *
@@ -3185,6 +3208,10 @@ PgstatCollectorMain(int argc, char *argv[])
 					pgstat_recv_recoveryconflict((PgStat_MsgRecoveryConflict *) msg, len);
 					break;

+				case PGSTAT_MTYPE_TEMPFILE:
+					pgstat_recv_tempfile((PgStat_MsgTempFile *) msg, len);
+					break;
+
 				default:
 					break;
 			}
@@ -3266,6 +3293,8 @@ pgstat_get_db_entry(Oid databaseid, bool create)
 		result->n_conflict_snapshot = 0;
 		result->n_conflict_bufferpin = 0;
 		result->n_conflict_startup_deadlock = 0;
+		result->n_temp_files = 0;
+		result->n_temp_bytes = 0;

 		result->stat_reset_timestamp = GetCurrentTimestamp();
@@ -4177,6 +4206,8 @@ pgstat_recv_resetcounter(PgStat_MsgResetcounter *msg, int len)
 	dbentry->n_tuples_updated = 0;
 	dbentry->n_tuples_deleted = 0;
 	dbentry->last_autovac_time = 0;
+	dbentry->n_temp_bytes = 0;
+	dbentry->n_temp_files = 0;

 	dbentry->stat_reset_timestamp = GetCurrentTimestamp();
@@ -4403,6 +4434,24 @@ pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len)
 }

 /*
[HACKERS] deferrable triggers
In a discussion on irc today, someone had a need to confirm that a
business rule (this table has two rows for every related one row in
another table) was true at commit time.  I innocently suggested a
deferrable (and deferred) trigger.  It was pointed out that the docs:

http://www.postgresql.org/docs/9.1/interactive/sql-createtrigger.html#AEN68703

say:

| This can only be specified for constraint triggers.

Hmm.  Somehow I had gotten hold of deferrable triggers as a way to
solve a problem in moving our replication from our Java framework to
PostgreSQL triggers.  So we are using a hand-written DEFERRABLE trigger
in production, with it apparently working as intended.  Is this
dangerous?  If not, should the docs be updated?

This is a potentially valuable feature.  For example, to ensure that a
double-entry accounting system leaves a financial transaction balanced
when a transaction commits.  There is no way to check that with EACH
ROW triggers, and it can be very clumsy to structure things so that
each single statement moves things from one valid state to another.
(That *is* one of the jobs of a transaction, after all.)

If we remove the above-referenced sentence from the docs, should we
include some warnings about the memory needed to track the need to fire
these triggers?

-Kevin
Re: [HACKERS] JSON for PG 9.2
On Mon, Dec 19, 2011 at 5:49 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
> My understanding is that JSON is a subset of ECMAscript, so if you get
> the latter you already have the former.  Now, someone would have to
> check if plscheme still builds with guile-2.0, and given that, how
> exactly you get pl/ecmascript (or pl/js) out of that.

I don't think so.  I checked it out (still on pgFoundry, still on CVS,
and the code hasn't been touched since 2008), and ran into some issues.

- It looks for libguile.h

    #include <libguile.h>

  which, on 2.0, has shifted around from /usr/include/libguile.h (1.8)
  to /usr/include/guile/2.0/libguile.h

- It's not doing enough indirections internally; there is a
  guile-config that is analogous to pg_config:

    postgres@cbbrowne [03:48:43] [~/PostgreSQL/plscheme]
    % guile-config compile
    -pthread -I/usr/include/guile/2.0
    postgres@cbbrowne [03:48:45] [~/PostgreSQL/plscheme]
    % guile-config link
    -lguile-2.0 -lgc

It looks like there's something PG-related as a next issue:

    % ./install.sh
    pg_config : /var/lib/postgresql/dbs/postgresql-HEAD/bin/pg_config
    module-dir: /var/lib/postgresql/dbs/postgresql-HEAD/lib
    max-cache-size: 64
    dbname: postgres
    safe-r5rs : NO
    dbacreate : NO
    PSQL : /var/lib/postgresql/dbs/postgresql-HEAD/bin/psql postgres
    CPPFLAGS : -g -Wall -fpic -c -I/var/lib/postgresql/dbs/postgresql-HEAD/include/server -I/usr/include/guile/2.0
    LDFLAGS : -shared -lguile

    Compiling... failed!

    plscheme.c: In function '_PG_init':
    plscheme.c:647:2: warning: implicit declaration of function 'DefineCustomStringVariable' [-Wimplicit-function-declaration]
    plscheme.c:650:30: error: 'PGC_BACKEND' undeclared (first use in this function)
    plscheme.c:650:30: note: each undeclared identifier is reported only once for each function it appears in
    plscheme.c:652:2: warning: implicit declaration of function 'DefineCustomIntVariable' [-Wimplicit-function-declaration]
    plscheme.c: In function 'plscheme_func_handler':
    plscheme.c:742:2: warning: implicit declaration of function 'GetTopTransactionId' [-Wimplicit-function-declaration]
    plscheme.c: In function 'parse_trig_args':
    plscheme.c:1623:44: error: dereferencing pointer to incomplete type
    plscheme.c:1628:38: error: dereferencing pointer to incomplete type
    ... (See error.log for details.)

I'm not sure to what degree this is bitrot relating to:
a) Postgres changes
b) Guile changes
but there's doubtless a bit of both.

I'd think it interesting to get this back to working order, whether
it's useful for JavaScript or not.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
Re: [HACKERS] deferrable triggers
Kevin Grittner kevin.gritt...@wicourts.gov writes:
> In a discussion on irc today, someone had a need to confirm that a
> business rule (this table has two rows for every related one row in
> another table) was true at commit time.  I innocently suggested a
> deferrable (and deferred) trigger.  It was pointed out that the docs
> say:
>
> | This can only be specified for constraint triggers.
>
> Hmm.  Somehow I had gotten hold of deferrable triggers as a way to
> solve a problem in moving our replication from our Java framework to
> PostgreSQL triggers.  So we are using a hand-written DEFERRABLE
> trigger in production, with it apparently working as intended.

What do you mean by "hand-written DEFERRABLE trigger"?  AFAICS from the
grammar, DEFERRABLE and related attributes can only be specified when
you write CREATE CONSTRAINT TRIGGER, so the documentation's statement
appears correct to me.

			regards, tom lane
Re: [HACKERS] sorting table columns
Alvaro Herrera alvhe...@alvh.no-ip.org writes:
> I've been trying to implement the holy grail of decoupling
> logical/physical column sort order representation, i.e., the feature
> that lets the server have one physical order, for storage compactness,
> and a different output order that can be tweaked by the user.  This
> has been discussed many times; most recently, I believe, here:
> http://archives.postgresql.org/pgsql-hackers/2007-02/msg01235.php
> with implementation details here:
> http://archives.postgresql.org/pgsql-hackers/2006-12/msg00983.php
>
> The idea described there by Tom, and upon which I formed a vague
> implementation plan in my head, is that I was to look for all uses of
> an attnum, and then replace it by either attlognum (i.e. the
> user-visible sort identifier) or attphysnum (i.e. the order of
> attributes as stored on disk).

I thought we'd concluded that we really need three values: attnum
should be a permanent logical ID for each column, and then the
user-visible column order would be determined by a different number,
and the on-disk column order by a third.  If we're going to do this at
all, it seems like a seriously bad idea to only go halfway, because
then we'll just have to revisit all the same code again later.

You do *not* want to store either of the latter two numbers in
parse-time Var nodes, because then you can't rearrange columns without
having to update stored rules.  But it might be useful to decree that
one thing setrefs.c does is renumber Vars in scan nodes to use the
physical column numbers instead of the permanent IDs.

I haven't looked into any of the details, but I would guess that
targetlists should always be constructed in logical (user-visible)
column order.  TupleDescs need to match the physical order, most
likely.  Note that all three orderings are always going to be the same
everywhere above the table scan level.  (And I suppose COPY will need
some hack or other.)

			regards, tom lane
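The three-number scheme under discussion can be modeled with a toy sketch like the one below.  The field names mirror the attnum/attlognum/attphysnum terminology from the thread, but the struct and lookups are illustrative only, not PostgreSQL's pg_attribute layout.

```c
/*
 * Toy model: attnum is a permanent column ID; attlognum orders columns
 * as the user sees them; attphysnum orders them on disk.  Reordering a
 * column for display only changes attlognum - attnum, and hence any
 * stored reference to the column, is untouched.
 */
typedef struct ColumnInfo
{
    int attnum;      /* permanent logical ID, never changes */
    int attlognum;   /* 1-based position in user-visible order */
    int attphysnum;  /* 1-based position in on-disk order */
} ColumnInfo;

/* Permanent ID of the column shown at user-visible position lognum,
 * or -1 if there is no such column. */
int attnum_at_logical(const ColumnInfo *cols, int ncols, int lognum)
{
    for (int i = 0; i < ncols; i++)
        if (cols[i].attlognum == lognum)
            return cols[i].attnum;
    return -1;
}

/* Permanent ID of the column stored at on-disk position physnum. */
int attnum_at_physical(const ColumnInfo *cols, int ncols, int physnum)
{
    for (int i = 0; i < ncols; i++)
        if (cols[i].attphysnum == physnum)
            return cols[i].attnum;
    return -1;
}
```

The point of the indirection is the one Tom makes: stored rules keep referring to the permanent attnum, so both the display order and the storage order can be permuted freely without rewriting them.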
Re: [HACKERS] sorting table columns
Excerpts from Tom Lane's message of Tue Dec 20 18:24:29 -0300 2011: Alvaro Herrera alvhe...@alvh.no-ip.org writes: I've been trying to implement the holy grail of decoupling logical/physical column sort order representation, i.e., the feature that lets the server have one physical order, for storage compactness, and a different output order that can be tweaked by the user. This has been discussed many times; most recently, I believe, here: http://archives.postgresql.org/pgsql-hackers/2007-02/msg01235.php with implementation details here: http://archives.postgresql.org/pgsql-hackers/2006-12/msg00983.php The idea described there by Tom, and upon which I formed a vague implementation plan in my head, is that I was to look for all uses of an attnum, and then replace it by either attlognum (i.e. the user-visible sort identifier) or attphysnum (i.e. the order of attributes as stored on disk). I thought we'd concluded that we really need three values: attnum should be a permanent logical ID for each column, and then the user-visible column order would be determined by a different number, and the on-disk column order by a third. If we're going to do this at all, it seems like a seriously bad idea to only go halfway, because then we'll just have to revisit all the same code again later. Yeah, I was unclear -- that's what I'm doing (or, rather, attempting to do). You do *not* want to store either of the latter two numbers in parse-time Var nodes, because then you can't rearrange columns without having to update stored rules. But it might be useful to decree that one thing setrefs.c does is renumber Vars in scan nodes to use the physical column numbers instead of the permanent IDs. Hmm, having the numbers in Var nodes seems a fundamental part of the way I'm attacking the problem. Hopefully after I give setrefs.c a read I will have a clearer picture of the way to do it without that.
I haven't looked into any of the details, but I would guess that targetlists should always be constructed in logical (user-visible) column order. TupleDescs need to match the physical order, most likely. Note that all three orderings are always going to be the same everywhere above the table scan level. (And I suppose COPY will need some hack or other.) Okay. AFAICS this shoots down the idea of modifying destreceivers, which is good because I was coming to that conclusion for a different reason. Thanks for the pointers. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] deferrable triggers
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: In a discussion on irc today, someone had a need to confirm that a business rule (this table has two rows for every related one row in another table) was true at commit time. I innocently suggested a deferrable (and deferred) trigger. It was pointed out that the docs: http://www.postgresql.org/docs/9.1/interactive/sql-createtrigger.html#AEN68703 say: | This can only be specified for constraint triggers. Hmm. Somehow I had gotten hold of deferrable triggers as a way to solve a problem in moving our replication from our Java framework to PostgreSQL triggers. So we are using a hand-written DEFERRABLE trigger in production, with it apparently working as intended. What do you mean by hand-written DEFERRABLE trigger? AFAICS from the grammar, DEFERRABLE and related attributes can only be specified when you write CREATE CONSTRAINT TRIGGER, so the documentation's statement appears correct to me. Ah, I had forgotten that I had to use the CONSTRAINT keyword in the trigger definition; the sentence in the docs makes more sense now. I wrote a plpgsql trigger function and created a deferrable initially deferred constraint trigger which referenced it. Is that a reasonable thing to do if you need a commit-time trigger based on some particular action against a particular table? Would it be a reasonable way for the person on irc to enforce the business rule mentioned above? If so, I think there's room for the docs to clarify that CONSTRAINT TRIGGERs are usable for things other than implementing declarative constraints, which was the (apparently contagious) interpretation of the person in irc. -Kevin
Re: [HACKERS] deferrable triggers
Kevin Grittner kevin.gritt...@wicourts.gov writes: Tom Lane t...@sss.pgh.pa.us wrote: What do you mean by hand-written DEFERRABLE trigger? Ah, I had forgotten that I had to use the CONSTRAINT keyword in the trigger definition; the sentence in the docs makes more sense now. I wrote a plpgsql trigger function and created a deferrable initially deferred constraint trigger which referenced it. Is that a reasonable thing to do if you need a commit-time trigger based on some particular action against a particular table? Would it be a reasonable way for the person on irc to enforce the business rule mentioned above? Sure. The reason we decided to document CREATE CONSTRAINT TRIGGER was exactly that people sometimes need to use it for things besides foreign keys. However, it's not a commit time trigger exactly -- keep in mind that SET CONSTRAINTS can override the trigger's own timing specification. regards, tom lane
Re: [HACKERS] [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
Marti Raudsepp ma...@juffo.org writes: On Tue, Dec 20, 2011 at 07:08, Tom Lane t...@sss.pgh.pa.us wrote: but I think I don't like this refactoring much. Will take a closer look tomorrow. I was afraid you'd say that, especially for a change that should be backpatched. But I couldn't think of alternative ways to do it that give non-bogus estimates. I've applied a revised version of this patch that factors things in a way I found nicer. The main concrete thing I didn't like about what you'd done was dropping the haveFullScan logic. If we have more than one qual triggering that, we're still going to do one full scan, not multiples of that. It seemed unreasonably hard to get that exactly right when there are multiple array quals each doing such a thing, but I didn't want to let it regress in its handling of multiple plain quals. Also, while looking at this I realized that we had the costing of nestloop cases all wrong. The idea is to scale up the number of tuples (pages) fetched, apply index_pages_fetched(), then scale down again. I think maybe somebody thought that was redundant, but it's not because index_pages_fetched() is nonlinear. regards, tom lane
Re: [HACKERS] RangeVarGetRelid()
On Mon, Dec 19, 2011 at 11:52:54PM -0500, Robert Haas wrote: After staring at this for quite a while longer, it seemed to me that the logic for renaming a relation was similar enough to the logic for changing a schema that the two callbacks could reasonably be combined using a bit of conditional logic; and that, further, the same callback could be used, with a small amount of additional modification, for ALTER TABLE. Here's a patch to do that. Nice. I also notice that cluster() - which doesn't have a callback - has exactly the same needs as ReindexRelation() - which does. So that case can certainly share code; though I'm not quite sure what to call the shared callback, or which file to put it in. RangeVarCallbackForStorageRewrite? I'd put it in tablecmds.c and name it RangeVarCallbackOwnsTable. A few things on the patch:
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2560,90 +2500,26 @@ CheckTableNotInUse(Relation rel, const char *stmt)
  * Thanks to the magic of MVCC, an error anywhere along the way rolls back
  * the whole operation; we don't have to do anything special to clean up.
  *
- * We lock the table as the first action, with an appropriate lock level
+ * The caller must lock the relation, with an appropriate lock level
  * for the subcommands requested. Any subcommand that needs to rewrite
  * tuples in the table forces the whole command to be executed with
- * AccessExclusiveLock. If all subcommands do not require rewrite table
- * then we may be able to use lower lock levels. We pass the lock level down
+ * AccessExclusiveLock (actually, that is currently required always, but
+ * we hope to relax it at some point). We pass the lock level down
  * so that we can apply it recursively to inherited tables. Note that the
- * lock level we want as we recurse may well be higher than required for
+ * lock level we want as we recurse might well be higher than required for
  * that specific subcommand. So we pass down the overall lock requirement,
  * rather than reassess it at lower levels.
  */
 void
-AlterTable(AlterTableStmt *stmt)
+AlterTable(Oid relid, LOCKMODE lockmode, AlterTableStmt *stmt)
 {
 	Relation	rel;
-	LOCKMODE	lockmode = AlterTableGetLockLevel(stmt->cmds);
-
-	/*
-	 * Acquire same level of lock as already acquired during parsing.
-	 */
-	rel = relation_openrv(stmt->relation, lockmode);
+	/* Caller is required to provide an adequate lock. */
+	rel = relation_open(relid, NoLock);

 	CheckTableNotInUse(rel, "ALTER TABLE");

-	/* Check relation type against type specified in the ALTER command */
-	switch (stmt->relkind)
-	{
-		case OBJECT_TABLE:
-
-			/*
-			 * For mostly-historical reasons, we allow ALTER TABLE to apply to
-			 * almost all relation types.
-			 */
-			if (rel->rd_rel->relkind == RELKIND_COMPOSITE_TYPE
-				|| rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
-				ereport(ERROR,
-						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-						 errmsg("\"%s\" is not a table",
-								RelationGetRelationName(rel))));
RangeVarCallbackForAlterRelation() does not preserve ALTER TABLE's refusal to operate on foreign tables.
-			break;
-
-		case OBJECT_INDEX:
-			if (rel->rd_rel->relkind != RELKIND_INDEX)
-				ereport(ERROR,
-						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-						 errmsg("\"%s\" is not an index",
								RelationGetRelationName(rel))));
-			break;
-
-		case OBJECT_SEQUENCE:
-			if (rel->rd_rel->relkind != RELKIND_SEQUENCE)
-				ereport(ERROR,
-						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-						 errmsg("\"%s\" is not a sequence",
								RelationGetRelationName(rel))));
-			break;
-
-		case OBJECT_TYPE:
-			if (rel->rd_rel->relkind != RELKIND_COMPOSITE_TYPE)
-				ereport(ERROR,
-						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-						 errmsg("\"%s\" is not a composite type",
								RelationGetRelationName(rel))));
-			break;
-
-		case OBJECT_VIEW:
-			if (rel->rd_rel->relkind
Re: [HACKERS] sorting table columns
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of Tue Dec 20 18:24:29 -0300 2011: You do *not* want to store either of the latter two numbers in parse-time Var nodes, because then you can't rearrange columns without having to update stored rules. But it might be useful to decree that one thing setrefs.c does is renumber Vars in scan nodes to use the physical column numbers instead of the permanent IDs. Hmm, having the numbers in Var nodes seems a fundamental part of the way I'm attacking the problem. Hopefully after I give setrefs.c a read I will have a clearer picture of the way to do it without that. To clarify a bit: one thing that setrefs.c already does is to renumber Var nodes above the scan level, so that their attnums refer not to original table column attnums but to column numbers in the output of the next plan level down. Vars in scan nodes currently don't need any renumbering, but it'd be easy enough to extend the logic to do something to them as well. I'm visualizing the run-time transformation from physical to logical column ordering as a sort of projection, much like the mapping that happens in a join node. regards, tom lane
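The projection Tom describes can be modeled outside the server with two per-column maps. This is an illustrative sketch only, not PostgreSQL code: the array names mirror the proposed catalog columns (attlognum, attphysnum), but the 0-based layout and the row representation are invented for the example.

```c
#include <assert.h>

#define NCOLS 4

/*
 * Hypothetical per-column maps: for the column whose permanent ID
 * (attnum) is i+1, attlognum[i] gives its logical (user-visible)
 * position and attphysnum[i] its physical (on-disk) position.
 * Positions are 0-based here for simplicity.
 */
static const int attlognum[NCOLS]  = {2, 0, 3, 1};
static const int attphysnum[NCOLS] = {1, 3, 0, 2};

/* Project a physically-ordered row into logical (display) order. */
static void
physical_to_logical(const int *phys_row, int *log_row)
{
    for (int id = 0; id < NCOLS; id++)
        log_row[attlognum[id]] = phys_row[attphysnum[id]];
}
```

The point of the sketch is that the transformation is a pure permutation, exactly the kind of projection a plan node could apply above a scan.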
Re: [HACKERS] JSON for PG 9.2
On Dec 19, 2011, at 9:00 PM, Robert Haas wrote: +1, though I think the core type will at least need some basic operators and indexing support. And I'm willing to do that, but I thought it best to submit a bare bones patch first, in the hopes of minimizing the number of objectionable things therein. For example, if you want to be able to index a JSON column, you have to decide on some collation order that is consistent with JSON's notion of equality, and it's not obvious what is most logical. Heck, equality itself isn't 100% obvious. If there's adequate support for including JSON in core, and nobody objects to my implementation, then I'll throw some ideas for those things up against the wall and see what sticks. +1 Sounds good to me. David
Re: [HACKERS] JSON for PG 9.2
On Dec 20, 2011, at 2:13 AM, Magnus Hagander wrote: Yes, that's why I said few not none. Though in my experience, most companies are a lot more restrictive about addons to their database than addons to their development environments. Yeah, we’re getting off-topic here, so I’ll just say something we can agree on: We’ll see. I do still want to see some processes for getting PGXN distributions into RPM/.deb/StackBuilder, though. Best, David
Re: [HACKERS] JSON for PG 9.2
On Dec 20, 2011, at 10:39 AM, Claes Jakobsson wrote: Are people explicitly asking for a) *JSON* datatype or b) a type that lets you store arbitrary complex semi-untyped data structures? Yes. if b) then this might get a lot more interesting JSON is the most popular/likely way to represent that, I think. David
Re: [HACKERS] why do we need create tuplestore for each fetch?
Thanks for your reply. I found a query without a cursor is faster than a query with a server-side cursor and several fetches. But I have a large result set to retrieve from the database, so I have to choose a server-side cursor to avoid out-of-memory problems. When I tried to debug the cursor and fetch, I found this unexpected behavior. I think maybe the tuplestore slows the cursor down (maybe I should do some profiling later). I want to change the code, but I am afraid there are important reasons for the tuplestore. Therefore, I post it to this list for help: why create a tuplestore for each fetch? p.s. a large fetch may cause the tuplestore to spill to a buffer file, which slows performance very much. On Mon, Dec 19, 2011 at 9:06 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Dec 15, 2011 at 8:30 AM, 高增琦 pgf...@gmail.com wrote: I found this several days ago when I try to debug a fetch of cursor. And I have sent a mail to this list, but no one reply... Maybe this is a very simple problem, please help me, thanks a lot... Here is the example: create table t (a int); insert into t values (1),(3),(5),(7),(9); insert into t select a+1 from t; begin; declare c cursor for select * from t order by a; fetch 3 in c; fetch 3 in c; fetch 3 in c; In 'PortalRun', a fetch stmt will be treated with PORTAL_UTIL_SELECT, and then a tuplestore will be created in 'FillPortalStore' in the fetch stmt's portal. In 'FillPortalStore', all result will be store at that tuplestore, Then, go back to 'PortalRun'; next, 'PortalRunSelect' will send this results to client... My problem is: why do we need create that tuplestore as an middle storeage? why do not we just send these result to clent at the first time? Good question. I wouldn't expect it to matter very much for a three-row fetch, but maybe it does for larger ones? What is your motivation for investigating this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- GaoZengqi pgf...@gmail.com zengqi...@gmail.com
[HACKERS] CLOG contention
A few weeks ago I posted some performance results showing that increasing NUM_CLOG_BUFFERS was improving pgbench performance. http://archives.postgresql.org/pgsql-hackers/2011-12/msg00095.php I spent some time today looking at this in a bit more detail. Somewhat obviously in retrospect, it turns out that the problem becomes more severe the longer you run the test. CLOG lookups are induced when we go to update a row that we've previously updated. When the test first starts, just after pgbench -i, all the rows are hinted and, even if they weren't, they all have the same XID. So no problem. But, as the fraction of rows that have been updated increases, it becomes progressively more likely that the next update will hit a row that's already been updated. Initially, that's OK, because we can keep all the CLOG pages of interest in the 8 available buffers. But once we've eaten through enough XIDs - specifically, 8 buffers * 8192 bytes/buffer * 4 xids/byte = 256k - we can't keep all the necessary pages in memory at the same time, and so we have to keep replacing CLOG pages. This effect is not difficult to see even on my 2-core laptop, although I'm not sure whether it causes any material performance degradation. If you have enough concurrent tasks, a probably-more-serious form of starvation can occur. As SlruSelectLRUPage notes: /* * We need to wait for I/O. Normal case is that it's dirty and we * must initiate a write, but it's possible that the page is already * write-busy, or in the worst case still read-busy. In those cases * we wait for the existing I/O to complete. */ On Nate Boley's 32-core box, after running pgbench for a few minutes, that worst-case scenario starts happening quite regularly, apparently because the number of people who simultaneously wish to read different CLOG pages exceeds the number of available buffers into which they can be read.
The ninth and following backends to come along have to wait until the least-recently-used page is no longer read-busy before starting their reads. So, what do we do about this? The obvious answer is increase NUM_CLOG_BUFFERS, and I'm not sure that's a bad idea. 64kB is a pretty small cache on anything other than an embedded system, these days. We could either increase the hard-coded value, or make it configurable - but it would have to be PGC_POSTMASTER, since there's no way to allocate more shared memory later on. The downsides of this approach are: 1. If we make it configurable, nobody will have a clue what value to set. 2. If we just make it bigger, people laboring under the default 32MB shared memory limit will conceivably suffer even more than they do now if they just initdb and go. A more radical approach would be to try to merge the buffer arenas for the various SLRUs either with each other or with shared_buffers, which would presumably allow a lot more flexibility to ratchet the number of CLOG buffers up or down depending on overall memory pressure. Merging the buffer arenas into shared_buffers seems like the most flexible solution, but it also seems like a big, complex, error-prone behavior change, because the SLRU machinery does things quite differently from shared_buffers: we look up buffers with a linear array search rather than a hash table probe; we have only a per-SLRU lock and a per-page lock, rather than separate mapping locks, content locks, io-in-progress locks, and pins; and while the main buffer manager is content with some loosey-goosey approximation of recency, the SLRU code makes a fervent attempt at strict LRU (slightly compromised for the sake of reduced locking in SimpleLruReadPage_Readonly). Any thoughts on what makes most sense here? I find it fairly tempting to just crank up NUM_CLOG_BUFFERS and call it good, but the siren song of refactoring is whispering in my other ear. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
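The working-set arithmetic in the message above can be written down directly. This is a back-of-the-envelope sketch, not server code; the constants match the values quoted in the email (8 default buffers, 8kB pages, 4 transaction statuses per byte because each status takes two bits).

```c
#include <assert.h>

#define NUM_CLOG_BUFFERS 8      /* default number of cached CLOG pages */
#define BLCKSZ 8192             /* bytes per CLOG page */
#define CLOG_XACTS_PER_BYTE 4   /* two status bits per transaction */

/*
 * Number of transaction IDs whose commit status fits in the CLOG
 * buffer pool at once; beyond this, pages start getting evicted.
 */
static long
clog_cached_xids(int nbuffers)
{
    return (long) nbuffers * BLCKSZ * CLOG_XACTS_PER_BYTE;
}
```

With the default 8 buffers this gives the 256k figure from the message; quadrupling the buffers only pushes the cliff out to 1M XIDs, which is why the message treats a bigger constant as a palliative rather than a fix.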
Re: [HACKERS] why do we need create tuplestore for each fetch?
Gao Zengqi pgf...@gmail.com writes: Here is the example: create table t (a int); insert into t values (1),(3),(5),(7),(9); insert into t select a+1 from t; begin; declare c cursor for select * from t order by a; fetch 3 in c; fetch 3 in c; fetch 3 in c; In 'PortalRun', a fetch stmt will be treated with PORTAL_UTIL_SELECT, and then a tuplestore will be created in 'FillPortalStore' in the fetch stmt's portal. How are you trying to do the fetches, PQexec("fetch 3 in c") ? That is an inherently inefficient way to do things, and trying to shave a few cycles off the intermediate tuplestore isn't going to fix that. The general overhead of parsing a new SQL command is probably going to swamp the costs of a tuplestore, especially if it's too small to spill to disk (and if it isn't, you really do need the tuplestore mechanism, slow or not). If you want to get a speed improvement there would probably be a lot more bang for the buck in extending libpq to support protocol-level portal access. It does already have PQdescribePortal, but for some reason not anything for fetch N rows from portal so-and-so. Not sure whether it's worth providing explicit portal open/close commands separate from PQexec'ing DECLARE CURSOR and CLOSE, but maybe at the margins those steps would be worth improving too. regards, tom lane
Re: [HACKERS] CLOG contention
Robert Haas robertmh...@gmail.com writes: So, what do we do about this? The obvious answer is increase NUM_CLOG_BUFFERS, and I'm not sure that's a bad idea. As you say, that's likely to hurt people running in small shared memory. I too have thought about merging the SLRU areas into the main shared buffer arena, and likewise have concluded that it is likely to be way more painful than it's worth. What I think might be an appropriate compromise is something similar to what we did for autotuning wal_buffers: use a fixed percentage of shared_buffers, with some minimum and maximum limits to ensure sanity. But picking the appropriate percentage would take a bit of research. regards, tom lane
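The shape Tom is suggesting, modeled on how wal_buffers is autotuned, would look roughly like the sketch below. The divisor and the 8..64-buffer clamp are invented placeholders; picking the real fraction is exactly the research he says is still needed.

```c
#include <assert.h>

#define Min(a, b) ((a) < (b) ? (a) : (b))
#define Max(a, b) ((a) > (b) ? (a) : (b))

/*
 * Hypothetical autotuning rule: size the CLOG buffer pool as a fixed
 * fraction of shared_buffers, clamped to sane minimum and maximum
 * values.  shared_buffers is expressed in 8kB pages, as in the server.
 * The 1/512 fraction and the 8/64 clamps are illustrative only.
 */
static int
choose_num_clog_buffers(int shared_buffers)
{
    return Min(Max(shared_buffers / 512, 8), 64);
}
```

Under this rule, a tiny shared_buffers setting keeps today's 8 buffers, while large installations get more without any new GUC to explain.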
Re: [HACKERS] CLOG contention
Robert Haas robertmh...@gmail.com writes: ... while the main buffer manager is content with some loosey-goosey approximation of recency, the SLRU code makes a fervent attempt at strict LRU (slightly compromised for the sake of reduced locking in SimpleLruReadPage_Readonly). Oh btw, I haven't looked at that code recently, but I have a nasty feeling that there are parts of it that assume that the number of buffers it is managing is fairly small. Cranking up the number might require more work than just changing the value. regards, tom lane
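One concrete small-N assumption: SLRU buffer lookup is a linear scan over the page-number array, roughly the simplified shape below (a model of the idea, not the actual slru.c code). Every lookup costs O(N) comparisons, which is negligible at 8 buffers but not obviously so at hundreds.

```c
#include <assert.h>

#define NUM_SLRU_BUFFERS 8

/* Page number held in each buffer slot; -1 means the slot is empty. */
static int slru_page_number[NUM_SLRU_BUFFERS] = {12, 7, -1, 3, 9, -1, 5, 1};

/*
 * Simplified model of SLRU buffer lookup: scan every slot for the
 * requested page.  Returns the slot index, or -1 on a miss (the real
 * code would then choose a victim by a similar linear LRU scan).
 */
static int
slru_lookup(int pageno)
{
    for (int slot = 0; slot < NUM_SLRU_BUFFERS; slot++)
    {
        if (slru_page_number[slot] == pageno)
            return slot;
    }
    return -1;
}
```

The main buffer manager avoids this by hashing; cranking NUM_CLOG_BUFFERS way up without changing the search strategy would make both the hit path and the victim-selection path linearly more expensive.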
Re: [HACKERS] Page Checksums
On 12/19/2011 06:14 PM, Kevin Grittner wrote: But if you need all that infrastructure just to get the feature launched, that's a bit hard to stomach. Triggering a vacuum or some hypothetical scrubbing feature? What you were suggesting doesn't require triggering just a vacuum though--it requires triggering some number of vacuums, for all impacted relations. You said yourself that "all tables", if there's no way to rule any of them out, was a possibility. I'm just pointing out that scheduling that level of work is a logistics headache, and it would be reasonable for people to expect some help with that were it to become a necessary thing falling out of the implementation. Some people think I border on the paranoid on this issue. Those people are also out to get you, just like the hardware. Are you arguing that autovacuum should be disabled after crash recovery? I guess if you are arguing that a database VACUUM might destroy recoverable data when hardware starts to fail, I can't argue. A CRC failure suggests to me a significantly higher probability of failing hardware, likely to lead to more corruption, than a normal crash does though. The main way I expect to validate this sort of thing is with an as yet unwritten function to grab information about a data block from a standby server for this purpose, something like this: Master: Computed CRC A, Stored CRC B; error raised because A!=B Standby: Computed CRC C, Stored CRC D If C==D and A==C, the corruption is probably just overwritten bits of the stored CRC B. Are you arguing we need *that* infrastructure to get the feature launched? No; just pointing out the things I'd eventually expect people to want, because they help answer questions about what to do when CRC failures occur. The most reasonable answer to "what should I do about suspected corruption on a page?" in most of the production situations I worry about is "see if it's recoverable from the standby."
I see this as being similar to how RAID-1 works: if you find garbage on one drive, and you can get a clean copy of the block from the other one, use that to recover the missing data. If you don't have that capability, you're stuck with no clear path forward when a CRC failure happens, as you noted downthread. This obviously gets troublesome if you've recently written a page out, so there's some concern about whether you are checking against the correct version of the page or not, based on where the standby's replay is at. I see that as being a case that's also possible to recover from though, because then the page you're trying to validate on the master is likely sitting in the recent WAL stream. This is already the sort of thing companies doing database recovery work (of which we are one) deal with, and I doubt any proposal will cover every possible situation. In some cases there may be no better answer than show all the known versions and ask the user to sort it out. The method I suggested would sometimes kick out an automatic fix. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)
Hi! Studying this question a little more I found that the current approach of range indexing can be dramatically inefficient in some cases. It's not because of the penalty or split implementation, but because of the approach itself. Mapping intervals to two-dimensional space produces much better results in the case of highly-overlapping ranges and <@, @> operators with low selectivity. There is a simple test case for proof of concept.

create table source as (select l, (l + s) as r from (select (random()*1)::int as l, (random()*1000 + 1)::int s from generate_series(1,100) g) x);
create table range_test as (select int4range(l,r) as x from source);
create table point_test as (select point(l,r) as x from source);
create index range_test_idx on range_test using gist (x);
create index point_test_idx on point_test using gist (x);

test=# explain (analyze, buffers) select * from range_test where x <@ int4range(5000,5010);
 Bitmap Heap Scan on range_test (cost=40.31..2585.65 rows=1000 width=32) (actual time=37.304..37.310 rows=2 loops=1)
   Recheck Cond: (x <@ '[5000,5010)'::int4range)
   Buffers: shared hit=767
   ->  Bitmap Index Scan on range_test_idx (cost=0.00..40.06 rows=1000 width=0) (actual time=37.288..37.288 rows=2 loops=1)
         Index Cond: (x <@ '[5000,5010)'::int4range)
         Buffers: shared hit=765
 Total runtime: 37.385 ms
(7 rows)

test=# explain (analyze, buffers) select * from point_test where x <@ box(point(5000,5000),point(5010,5010));
 Bitmap Heap Scan on point_test (cost=44.36..2589.69 rows=1000 width=16) (actual time=0.197..0.206 rows=2 loops=1)
   Recheck Cond: (x <@ '(5010,5010),(5000,5000)'::box)
   Buffers: shared hit=5
   ->  Bitmap Index Scan on point_test_idx (cost=0.00..44.11 rows=1000 width=0) (actual time=0.182..0.182 rows=2 loops=1)
         Index Cond: (x <@ '(5010,5010),(5000,5000)'::box)
         Buffers: shared hit=3
 Total runtime: 0.265 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where x @> int4range(5000,5990);
 Bitmap Heap Scan on range_test (cost=40.31..2585.65 rows=1000 width=32) (actual time=4.578..4.603 rows=5 loops=1)
   Recheck Cond: (x @> '[5000,5990)'::int4range)
   Buffers: shared hit=52
   ->  Bitmap Index Scan on range_test_idx (cost=0.00..40.06 rows=1000 width=0) (actual time=4.561..4.561 rows=5 loops=1)
         Index Cond: (x @> '[5000,5990)'::int4range)
         Buffers: shared hit=47
 Total runtime: 4.669 ms
(7 rows)

test=# explain (analyze, buffers) select * from point_test where x <@ box(point('-inf'::float,5990),point(5000,'+inf'::float));
 Bitmap Heap Scan on point_test (cost=44.36..2589.69 rows=1000 width=16) (actual time=0.328..0.353 rows=5 loops=1)
   Recheck Cond: (x <@ '(5000,inf),(-inf,5990)'::box)
   Buffers: shared hit=8
   ->  Bitmap Index Scan on point_test_idx (cost=0.00..44.11 rows=1000 width=0) (actual time=0.312..0.312 rows=5 loops=1)
         Index Cond: (x <@ '(5000,inf),(-inf,5990)'::box)
         Buffers: shared hit=3
 Total runtime: 0.419 ms
(7 rows)

If you'd like to learn more about such mappings, you can start from here: http://www.comsis.org/ComSIS/Vol7No4/RegularPapers/paper16.pdf Any thoughts? - With best regards, Alexander Korotkov.
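The mapping behind the test case can be modeled directly: an interval [l, r] becomes the 2-D point (l, r), and both containment operators turn into axis-aligned box tests. This is a sketch of the equivalence only, not the GiST code, and it glosses over range bound inclusivity.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int l, r; } Range;   /* interval [l, r], l <= r */

/*
 * Range x is contained by query range q  <=>  the point (x.l, x.r)
 * lies in the box [q.l, q.r] x [q.l, q.r] (intersected with l <= r).
 */
static bool
range_contained_by(Range x, Range q)
{
    return x.l >= q.l && x.r <= q.r;
}

/*
 * Range x contains query range q  <=>  the point (x.l, x.r) lies in
 * the half-open box (-inf, q.l] x [q.r, +inf).
 */
static bool
range_contains(Range x, Range q)
{
    return x.l <= q.l && x.r >= q.r;
}
```

This is why an ordinary point index performs so well in the EXPLAIN output above: each containment query selects a rectangular region of the (l, r) plane, which is exactly what a 2-D GiST index is good at.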
Re: [HACKERS] JSON for PG 9.2
On Tue, Dec 20, 2011 at 00:26, David E. Wheeler da...@kineticode.com wrote: On Dec 18, 2011, at 4:41 AM, Magnus Hagander wrote: We can hopefully get around this for the extensions in contrib (and reasonably well have already), but few large companies are going to be happy to go to pgxn and download an extension that has a single maintainer (not the team, and in most cases not even a team), usually no defined lifecycle, no support, etc. (I'm pretty sure you won't get support included for random pgxn modules when you buy a contract from EDB, or CMD, or us, or PGX, or anybody really - whereas if the datatype is in core, you *will* get this) I support having a JSON type in core, but question the assertions here. *Some* organizations won’t use PGXN, usually because they require things through a different ecosystem (RPMs, .debs, StackBuilder, etc.). But many others will. There are a *lot* of companies out there that use CPAN, easy_install, and Gem. The same sorts of places will use PGXN. Yes, that's why I said "few" not "none". Though in my experience, most companies are a lot more restrictive about addons to their database than addons to their development environments. And note that it's not PGXN that's the problem I'm pointing at, neither is it CPAN or easy_install or gem. The problem is the vulnerability of the addon, and the maintenance. Meaning if it has a single maintainer, that's a whole different thing from being maintained by the PGDG. Oh, and at PGX, we’ll happily provide support for random modules, so long as you pay for our time. We’re not picky (and happy to send improvements back upstream), though we might recommend you switch to something better. But such evaluations are based on quality, not simply on what ecosystem it came from. I think we're talking about different things here.
While we can certainly provide support on specific modules, after that is entered into the agreement with the customer, we won't support a customer who just calls up and says "hey, I'm using module xyz which you've never heard of, and it crashes my database, please come fix it now". Are you saying you do that - providing SLAs, 24/7 and similar things, on modules you didn't even know the customer was using? And FWIW, I'm talking about the quality, and not the ecosystem as well. I'm just saying it takes a lot more work to verify the quality and maintenance of an external module - if it's part of postgresql, you have *already* got a quality stamp and a maintenance promise from that. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] pgstat wait timeout
Would this be alleviated by setting stats_temp_dir to point to a ramdisk? I am not aware how to do this. I am using a Windows server OS. The conf file has the entry : #stats_temp_directory = 'pg_stat_tmp' What do I change it to? Please elucidate.
Re: [HACKERS] JSON for PG 9.2
On Tue, Dec 20, 2011 at 06:00, Robert Haas robertmh...@gmail.com wrote: On Mon, Dec 19, 2011 at 6:26 PM, David E. Wheeler da...@kineticode.com wrote: +1, though I think the core type will at least need some basic operators and indexing support. And I'm willing to do that, but I thought it best to submit a bare bones patch first, in the hopes of minimizing the number of objectionable things therein. For example, if you want to be able to index a JSON column, you have to decide on some collation order that is consistent with JSON's notion of equality, and it's not obvious what is most logical. Heck, equality itself isn't 100% obvious. If there's adequate support for including JSON in core, and nobody objects to my implementation, then I'll throw some ideas for those things up against the wall and see what sticks. +1 for getting the basics in first, and then adding more to it later. There's still a fair amount of time to do that for 9.2, but not if we get stuck bikeshedding again... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: tracking temp files in pg_stat_database
2011/12/20 Tomas Vondra t...@fuzzy.cz: Hello everybody, this patch adds two counters to pg_stat_database to track temporary files - number of temp files and number of bytes. I see this as a useful feature, as temporary files often cause a lot of IO (because of low work_mem etc.). The log_temp_files is useful, but you have to parse the log and only temp files exceeding a size limit are logged so the actual amount of I/O is unknown. Hey, cool, that was on my personal TODO list :-) The patch is rather simple: 1) two new fields in PgStat_StatDBEntry (n_temp_files, n_temp_bytes) 2) report/recv function in pgstat.c 3) FileClose modified to log stats for all temp files (see below) 4) two new fields added to pg_stat_database (temp_files, temp_bytes) I haven't reviewed the code itself yet, but that seems like a reasonable approach. I had to modify FileClose to call stat() on each temp file as this should log all temp files (not just when log_temp_files = 0). But the impact of this change should be negligible, considering that the user is already using temp files. I haven't updated the docs yet - let's see if the patch is acceptable at all first. Again, without having reviewed the code, this looks like a feature we'd want, so please add some docs, and then submit it for the next commitfest! -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
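Assuming the column names the patch proposes above (temp_files and temp_bytes in pg_stat_database), monitoring with it might look something like this sketch:

```sql
-- Per-database temp file activity since the last stats reset.
-- temp_files / temp_bytes are the columns this patch would add;
-- they do not exist in an unpatched server.
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_written
FROM pg_stat_database
ORDER BY temp_bytes DESC;
```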
Re: [HACKERS] xlog location arithmetic
On Tue, Dec 6, 2011 at 19:06, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 6, 2011 at 1:00 PM, Euler Taveira de Oliveira eu...@timbira.com wrote: On 06-12-2011 13:11, Robert Haas wrote: On Tue, Dec 6, 2011 at 5:14 AM, Magnus Hagander mag...@hagander.net wrote: I've been considering similar things, as you can find in the archives, but what I was thinking of was converting the number to just a plain bigint, then letting the user apply whatever arithmetic they wanted at the SQL level. I never got around to actually coding it, though. It could easily be extracted from your patch of course - and I think that's a more flexible approach. Is there some advantage to your method that I'm missing? I went so far as to put together an lsn data type. I didn't actually get all that far with it, which is why I haven't posted it sooner, but here's what I came up with. It's missing indexing support and stuff, but that could be added if people like the approach. It solves this problem by implementing -(lsn,lsn) = numeric (not int8, that can overflow since it is not unsigned), which allows an lsn = numeric conversion by just subtracting '0/0'::lsn. Interesting approach. I don't want to go that far. If so, you want to change all of those functions that deal with LSNs and add some implicit conversion between text and lsn data types (for backward compatibility). As of int8, I'm As long as you have the conversion, you don't really need to change them, do you? It might be nice in some ways, but this is still a pretty internal operation, so I don't see it as critical. not aware of any modern platform on which int8 is not 64 bits. I'm not against numeric use; I'm just saying that int8 is sufficient. The point isn't that int8 might not be 64 bits - of course it has to be 64 bits; that's why it's called int8 i.e. 8 bytes. The point is that a large enough LSN, represented as an int8, will come out as a negative value.
int8 can only represent 2^63 *non-negative* values, because one bit is reserved for sign. Doing it in numeric should be perfectly fine. The only real reason to pick int8 over numeric in this context would be performance, but it's not like this is something that's going to be called in really performance critical paths... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] PATCH: tracking temp files in pg_stat_database
On 20 December 2011, 11:20, Magnus Hagander wrote: 2011/12/20 Tomas Vondra t...@fuzzy.cz: I haven't updated the docs yet - let's see if the patch is acceptable at all first. Again, without having reviewed the code, this looks like a feature we'd want, so please add some docs, and then submit it for the next commitfest! Hm, I added it to the current commit fest - I should probably move it to the next one (2012-01), right? I'll add the docs in the evening. Tomas
Re: [HACKERS] PATCH: tracking temp files in pg_stat_database
On Tue, Dec 20, 2011 at 11:45, Tomas Vondra t...@fuzzy.cz wrote: On 20 Prosinec 2011, 11:20, Magnus Hagander wrote: 2011/12/20 Tomas Vondra t...@fuzzy.cz: I haven't updated the docs yet - let's see if the patch is acceptable at all first. Again, without having reviewed the code, this looks like a feature we'd want, so please add some docs, and then submit it for the next commitfest! Hm, I added it to the current commit fest - I should probably move it to the next one (2012-01), right? I'll add the docs at the evening. Yes, all new patches should always go on the one that's labeled Open. If we don't do it that way, we will never finish a CF... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Collect frequency statistics for arrays
Hi! On Wed, Nov 16, 2011 at 1:43 AM, Nathan Boley npbo...@gmail.com wrote: FYI, I've added myself as the reviewer for the current commitfest. How is the review going now? -- With best regards, Alexander Korotkov.
[HACKERS] Real-life range datasets
Hackers, For better GiST indexing of range types it's important to have real-life datasets for testing on. Real-life range datasets would help to prove (or reject) some concepts and get more realistic benchmarks. Also, it would be nice to know what queries you expect to run fast on those datasets. Ideally it should be a real-life set of queries, but it could also be your idea of what the typical queries for such datasets look like. Thanks! - With best regards, Alexander Korotkov.
Re: [HACKERS] xlog location arithmetic
On 20-12-2011 07:27, Magnus Hagander wrote: On Tue, Dec 6, 2011 at 19:06, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 6, 2011 at 1:00 PM, Euler Taveira de Oliveira eu...@timbira.com wrote: On 06-12-2011 13:11, Robert Haas wrote: On Tue, Dec 6, 2011 at 5:14 AM, Magnus Hagander mag...@hagander.net wrote: I've been considering similar things, as you can find in the archives, but what I was thinking of was converting the number to just a plain bigint, then letting the user apply whatever arithmetic wanted at the SQL level. I never got around to acutally coding it, though. It could easily be extracted from your patch of course - and I think that's a more flexible approach. Is there some advantage to your method that I'm missing? I went so far as to put together an lsn data type. I didn't actually get all that far with it, which is why I haven't posted it sooner, but here's what I came up with. It's missing indexing support and stuff, but that could be added if people like the approach. It solves this problem by implementing -(lsn,lsn) = numeric (not int8, that can overflow since it is not unsigned), which allows an lsn = numeric conversion by just subtracting '0/0'::lsn. Interesting approach. I don't want to go that far. If so, you want to change all of those functions that deal with LSNs and add some implicit conversion between text and lsn data types (for backward compatibility). As of int8, I'm As long as you have the conversion, you don't really need to change them, do you? It might be nice in some ways, but this is still a pretty internal operation, so I don't see it as critical. For correctness, yes. At this point, my question is: do we want to support the lsn data type idea or a basic function that implements the difference between LSNs? 
-- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: [HACKERS] Patch to allow users to kill their own queries
On Mon, Dec 19, 2011 at 15:50, Greg Smith g...@2ndquadrant.com wrote: On 12/18/2011 07:31 AM, Magnus Hagander wrote: * I restructured the if statements, because I had a hard time following the comments around that ;) I find this one easier - but I'm happy to change back if you think your version was more readable. That looks fine. I highlighted this because I had a feeling there was still some gain to be had here, just didn't see it myself. This works. * The error message in pg_signal_backend breaks the abstraction, because it specifically talks about terminating the other backend - when it's not supposed to know about that in that function. I think we either need to get rid of the hint completely, or we need to find a way to issue it from the caller. Or pass it as a parameter. It's fine for now since we only have two signals, but we might have more in the future.. I feel that including a hint in the pg_terminate_backend case is a UI requirement. If someone has made it as far as discovering that function exists, tries calling it, and it fails, the friendly thing to do is point them toward a direction that might work better. Little things like that make a huge difference in how friendly the software appears to its users; this is even more true in cases where version improvements actually expand what can and cannot be done. My quick and dirty side thinks that just documenting the potential future issues would be enough: Due to the limited number of callers of this function, the hint message here can be certain that pg_terminate_backend provides the only path to reach this point. If more callers to pg_signal_backend appear, a more generic hint mechanism might be needed here. If you must have this more generic mechanism available, I would accept re-factoring to provide it instead. What I wouldn't want to live with is a commit of this where the hint goes away completely. 
It's taken a long time chopping the specification to get this feature sorted out; we might as well make what's left be the best it can be now. How about something like this - passing it in as a parameter? That said - can someone who knows the translation stuff better than me comment on if this is actually going to be translatable, or if it violates too many translation rules? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e7f7fe0..cf77586 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -14244,8 +14244,8 @@ SELECT set_config('log_statement_stats', 'off', false); para The functions shown in xref linkend=functions-admin-signal-table send control signals to -other server processes. Use of these functions is restricted -to superusers. +other server processes. Use of these functions is usually restricted +to superusers, with noted exceptions. /para table id=functions-admin-signal-table @@ -14262,7 +14262,10 @@ SELECT set_config('log_statement_stats', 'off', false); literalfunctionpg_cancel_backend(parameterpid/parameter typeint/)/function/literal /entry entrytypeboolean/type/entry - entryCancel a backend's current query/entry + entryCancel a backend's current query. You can execute this against +another backend that has exactly the same role as the user calling the +function. In all other cases, you must be a superuser. +/entry /row row entry @@ -14304,6 +14307,10 @@ SELECT set_config('log_statement_stats', 'off', false); commandpostgres/command processes on the server (using applicationps/ on Unix or the applicationTask Manager/ on productnameWindows/). +For the less restrictive functionpg_cancel_backend/, the role of an +active backend can be found from +the structfieldusename/structfield column of the +structnamepg_stat_activity/structname view. 
/para para diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index 7a2e0c8..45520b6 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -30,6 +30,7 @@ #include postmaster/syslogger.h #include storage/fd.h #include storage/pmsignal.h +#include storage/proc.h #include storage/procarray.h #include tcop/tcopprot.h #include utils/builtins.h @@ -70,15 +71,45 @@ current_query(PG_FUNCTION_ARGS) } /* - * Functions to send signals to other backends. + * Send a signal to another backend. + * If allow_same_role is false, actionstr must be set to a string + * indicating what the signal does, to be inserted in the error message, and + * hint should be set to a hint to be sent along with this message. */ static bool -pg_signal_backend(int pid, int sig) +pg_signal_backend(int pid, int sig, bool allow_same_role, const char *actionstr, const char *hint) { + PGPROC *proc; + if (!superuser()) - ereport(ERROR, -
Re: [HACKERS] Review: Non-inheritable check constraints
On Tue, Dec 20, 2011 at 1:14 AM, Nikhil Sontakke nikkh...@gmail.com wrote: Agreed. I just tried out the scenarios laid out by you both with and without the committed patch and AFAICS, normal inheritance semantics have been preserved properly even after the commit. No, they haven't. I didn't expect this to break anything when you have two constraints with different names. The problem is when you have two constraints with the same name. Testing reveals that this is, in fact, broken: rhaas=# create table A(ff1 int); CREATE TABLE rhaas=# create table B () inherits (A); CREATE TABLE rhaas=# create table C () inherits (B); CREATE TABLE rhaas=# alter table only b add constraint chk check (ff1 > 0); ALTER TABLE rhaas=# alter table a add constraint chk check (ff1 > 0); NOTICE: merging constraint chk with inherited definition ALTER TABLE At this point, you'll find that a has a constraint, and b has a constraint, but *c does not have a constraint*. That's bad, because a's constraint wasn't only and should therefore have propagated all the way down the tree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] sync_seqscans in postgresql.conf
Is there any reason why the setting synchronize_seqscans is in the section version/platform compatibility in postgresql.conf? Is it just because nobody could find a better place for it? ;) It seems a bit wrong to me... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] pgstat wait timeout
On 12/20/2011 05:13 AM, pratikchirania wrote: Would this be alleviated by setting stats_temp_dir to point to a ramdisk? I am not aware how to do this. I am using a windows server OS. The conf file has the entry : #stats_temp_directory = 'pg_stat_tmp' What do I change it to? Please elucidate. On Windows it appears you need third party software for a ramdisk. Search Google for info. cheers andrew
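Once such a ramdisk exists, the change amounts to uncommenting and repointing that one line in postgresql.conf. A sketch, assuming the ramdisk software exposes a drive R: and a directory R:/pgstat_tmp has been created and made writable by the server (both the drive letter and path are made-up examples):

```
# postgresql.conf -- write the statistics temp file to the ramdisk
# instead of the default pg_stat_tmp inside the data directory
stats_temp_directory = 'R:/pgstat_tmp'
```

The parameter should take effect on a configuration reload; no full restart should be required.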
Re: [HACKERS] Page Checksums
On Mon, Dec 19, 2011 at 2:44 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: I was thinking that we would warn when such was found, set hint bits as needed, and rewrite with the new CRC. In the unlikely event that it was a torn hint-bit-only page update, it would be a warning about something which is a benign side-effect of the OS or hardware crash. But that's terrible. Surely you don't want to tell people: WARNING: Your database is corrupted, or maybe not. But don't worry, I modified the data block so that you won't get this warning again. OK, I guess I'm not sure that you don't want to tell people that. But *I* don't! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sync_seqscans in postgresql.conf
On Tue, Dec 20, 2011 at 8:35 AM, Magnus Hagander mag...@hagander.net wrote: Is there any reason why the setting synchronize_seqscans is in the section version/platform compatibility in postgresql.conf? Is it just because nobody could find a better place for it? ;) It seems a bit wrong to me... Presumably the thought was that you would turn it off to restore the behavior that existed in older versions of PostgreSQL. Doesn't seem much different from default_with_oids or lo_compat_privileges. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Replication timeout units
from postgresql.conf.sample: #replication_timeout = 60s # in milliseconds; 0 disables Seconds or milliseconds? I would suggest we just remove the in milliseconds, and instead say timeout for replication connections; 0 disables. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
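With the suggested wording, the sample line would read:

```
#replication_timeout = 60s    # timeout for replication connections; 0 disables
```

(The confusion arises because the GUC is stored internally in milliseconds, while the sample value carries an explicit `s` unit suffix.)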
[HACKERS] Pause at end of recovery
These days we have pause_at_recovery_target, which lets us pause when we reach a PITR target. Is there a particular reason we don't have a way to pause at end of recovery if we *didn't* specify a target - meaning we let it run until the end of the archived log? While it's too late to change the target, I can see a lot of use cases where you don't want it to be possible to make changes to the database again until it has been properly verified - and keeping it up in readonly mode in that case can be quite useful... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] sync_seqscans in postgresql.conf
On Tue, Dec 20, 2011 at 14:38, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 20, 2011 at 8:35 AM, Magnus Hagander mag...@hagander.net wrote: Is there any reason why the setting synchronize_seqscans is in the section version/platform compatibility in postgresql.conf? Is it just because nobody could find a better place for it? ;) It seems a bit wrong to me... Presumably the thought was that you would turn it off to restore that existed in older versions of PostgreSQL. Doesn't seem much different from default_with_oids or lo_compat_privileges. Seems very different to me - those change *what* happens when you do certain things. sync_seqscans is just a performance tuning option, no? It doesn't actually change the semantics of any operations... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sync_seqscans in postgresql.conf
On Tue, Dec 20, 2011 at 02:41:54PM +0100, Magnus Hagander wrote: On Tue, Dec 20, 2011 at 14:38, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 20, 2011 at 8:35 AM, Magnus Hagander mag...@hagander.net wrote: Is there any reason why the setting synchronize_seqscans is in the section version/platform compatibility in postgresql.conf? Is it just because nobody could find a better place for it? ;) It seems a bit wrong to me... Presumably the thought was that you would turn it off to restore that existed in older versions of PostgreSQL. Doesn't seem much different from default_with_oids or lo_compat_privileges. Seems very different to me - those change *what* happens when you do certain things. sync_seqscans is just a performance tuning option, no? It doesn't actually change the semantics of any operations... In a query without enforced orders, the returned rows will come out in a possibly different order each time the query runs. I know it is bad coding to depend on things like that, but it is out there... So in those cases it is not just semantics. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sync_seqscans in postgresql.conf
On Tue, Dec 20, 2011 at 14:47, k...@rice.edu k...@rice.edu wrote: On Tue, Dec 20, 2011 at 02:41:54PM +0100, Magnus Hagander wrote: On Tue, Dec 20, 2011 at 14:38, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 20, 2011 at 8:35 AM, Magnus Hagander mag...@hagander.net wrote: Is there any reason why the setting synchronize_seqscans is in the section version/platform compatibility in postgresql.conf? Is it just because nobody could find a better place for it? ;) It seems a bit wrong to me... Presumably the thought was that you would turn it off to restore that existed in older versions of PostgreSQL. Doesn't seem much different from default_with_oids or lo_compat_privileges. Seems very different to me - those change *what* happens when you do certain things. sync_seqscans is just a performance tuning option, no? It doesn't actually change the semantics of any operations... In a query without enforced orders, the returned rows will come out in a possibly different order each time the query runs. I know it is bad coding to depend on things like that, but it is out there... So in those cases it is not just semantics. Yes, but they may also come out in a different order if you run the same query again 5 minutes later... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sync_seqscans in postgresql.conf
On Tue, Dec 20, 2011 at 02:54:32PM +0100, Magnus Hagander wrote: On Tue, Dec 20, 2011 at 14:47, k...@rice.edu k...@rice.edu wrote: On Tue, Dec 20, 2011 at 02:41:54PM +0100, Magnus Hagander wrote: On Tue, Dec 20, 2011 at 14:38, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 20, 2011 at 8:35 AM, Magnus Hagander mag...@hagander.net wrote: Is there any reason why the setting synchronize_seqscans is in the section version/platform compatibility in postgresql.conf? Is it just because nobody could find a better place for it? ;) It seems a bit wrong to me... Presumably the thought was that you would turn it off to restore that existed in older versions of PostgreSQL. Doesn't seem much different from default_with_oids or lo_compat_privileges. Seems very different to me - those change *what* happens when you do certain things. sync_seqscans is just a performance tuning option, no? It doesn't actually change the semantics of any operations... In a query without enforced orders, the returned rows will come out in a possibly different order each time the query runs. I know it is bad coding to depend on things like that, but it is out there... So in those cases it is not just semantics. Yes, but they may also come out in a different order if you run the same query again 5 minutes later... If the sequential scans always start at the beginning of the table, which was true before the sync-ed scans ability, the order is basically fixed for a large set of queries if you do not modify the data. With sync-ed scans, every repetition of the query will depend on where the scan starts in the data set. At least that is what I remember happening during the original testing of that feature, which is a great feature. Regards, Ken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Non-inheritable check constraints
Agreed. I just tried out the scenarios laid out by you both with and without the committed patch and AFAICS, normal inheritance semantics have been preserved properly even after the commit. No, they haven't. I didn't expect this to break anything when you have two constraints with different names. The problem is when you have two constraints with the same name. Testing reveals that this is, in fact, broken: rhaas=# create table A(ff1 int); CREATE TABLE rhaas=# create table B () inherits (A); CREATE TABLE rhaas=# create table C () inherits (B); CREATE TABLE rhaas=# alter table only b add constraint chk check (ff1 > 0); ALTER TABLE rhaas=# alter table a add constraint chk check (ff1 > 0); NOTICE: merging constraint chk with inherited definition ALTER TABLE At this point, you'll find that a has a constraint, and b has a constraint, but *c does not have a constraint*. That's bad, because a's constraint wasn't only and should therefore have propagated all the way down the tree. Apologies, I did not check this particular scenario. I guess, here, we should not allow merging of the inherited constraint into an only constraint. Because that breaks the semantics for only constraints. If this sounds ok, I can whip up a patch for the same. Regards, Nikhils
[HACKERS] Extensions and 9.2
Hi, I've sent a first patch to improve extensions for 9.2, and intend on sending a few more which I'll briefly present here. The point of this email is to figure out how to branch the development, as all the patches are going to conflict somehow (change the same parts of the code). Either I develop them separately, with separate branches derived from the master one, or I develop them as a stack, one on top of the other. The difference is my ability to provide a patch for one of the features that can be applied to master directly compared to how much time I have to spend cooking one patch or the other (merge conflicts, etc). If we are going to try and commit all of those for 9.2, then I can stack them all atop of each other and have an easier development time. Here's the list: - extension features (requires / provides) as already sent, allows fine grained dependency management - SQL only extensions the goal here is to be able to install an SQL only extension without having to be granted OS shell access on the PostgreSQL server, or other arrangement allowing you to ship files (.control, .sql) in a place where usually only “root” has write access. meaning that the control file property that says “superuser = false” can be true for the distribution of extension too. - extension modules the goal here is to be able to list all the modules that are loaded by an extension — the install script will install all functions and will be loading all related .so, it's easy enough to keep track of them at creating_extension time and “register” that module list. that helps with systems auditing when you're called to understand a crash after the fact. Of course even better would be to only allow loading modules that are part of extensions, and to be able to list implemented hooks (and which extension is implementing which hook), but that would/could be some follow-up patches.
- extension whitelisting the goal here is to allow non-superusers to install extensions from a restricted list, introducing a specific “sudo” like behavior when the extension is implemented in C or some other non trusted language. that could be easily done with the current command trigger patch and a trigger procedure that is security definer, and doesn't need the parsetree at all, but that could easily drift away from 9.2, so maybe a specific implementation would be better here Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Review: Non-inheritable check constraints
rhaas=# create table A(ff1 int); CREATE TABLE rhaas=# create table B () inherits (A); CREATE TABLE rhaas=# create table C () inherits (B); CREATE TABLE rhaas=# alter table only b add constraint chk check (ff1 > 0); ALTER TABLE rhaas=# alter table a add constraint chk check (ff1 > 0); NOTICE: merging constraint chk with inherited definition ALTER TABLE At this point, you'll find that a has a constraint, and b has a constraint, but *c does not have a constraint*. That's bad, because a's constraint wasn't only and should therefore have propagated all the way down the tree. Apologies, I did not check this particular scenario. I guess, here, we should not allow merging of the inherited constraint into an only constraint. Because that breaks the semantics for only constraints. If this sounds ok, I can whip up a patch for the same. PFA, patch which does just this. postgres=# alter table a add constraint chk check (ff1 > 0); ERROR: constraint chk for relation b is an ONLY constraint. Cannot merge Regards, Nikhils only_constraint_no_merge.patch Description: Binary data
Re: [HACKERS] sync_seqscans in postgresql.conf
k...@rice.edu k...@rice.edu writes: On Tue, Dec 20, 2011 at 02:54:32PM +0100, Magnus Hagander wrote: Seems very different to me - those change *what* happens when you do certain things. sync_seqscans is just a performance tuning option, no? It doesn't actually change the semantics of any operations... In a query without enforced orders, the returned rows will come out in a possibly different order each time the query runs. I know it is bad coding to depend on things like that, but it is out there... So in those cases it is not just semantics. Right. It *is* query semantics for people who are depending on getting the same row order each time they read an unchanging table. Yeah, the SQL standard implies that that's not guaranteed, but in all PG versions before we added syncscan, it did work that way, and some people need it to continue to work that way (it's worth reflecting that syncscan would break most or all of the regression tests if it had a smaller granularity). So it is a backwards-compatibility option. Which is not to say that I like the current GUC classification in general, but this particular one isn't out of place. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Replication timeout units
On 20 December 2011 13:38, Magnus Hagander mag...@hagander.net wrote:
> from postgresql.conf.sample:
> #replication_timeout = 60s		# in milliseconds; 0 disables
>
> Seconds or milliseconds?
>
> I would suggest we just remove the "in milliseconds", and instead say
> "timeout for replication connections; 0 disables".

+1 from me. That's very confusing.

--
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
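For concreteness, the sample line with Magnus's suggested wording might look something like this (a sketch of the proposal, not a committed change):

```
#replication_timeout = 60s	# timeout for replication connections;
				# 0 disables
```

The unit suffix on the value itself (60s) already makes the unit unambiguous, which is why dropping "in milliseconds" from the comment loses nothing.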
Re: [HACKERS] [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
On Tue, Dec 20, 2011 at 07:08, Tom Lane t...@sss.pgh.pa.us wrote:
> it'd likely be better if this code ignored unrecognized qual
> expression types rather than Assert'ing they're not there.

The patch replaced that Assert with an elog(ERROR).

> Hmm. I am reminded of how utterly unreadable diff -u format is for
> anything longer than single-line changes :-( ...

Sorry, the new patch is in context (-C) diff format proper. I also moved around some code and removed an unused variable that was left over from the refactoring.

> but I think I don't like this refactoring much. Will take a closer
> look tomorrow.

I was afraid you'd say that, especially for a change that should be backpatched. But I couldn't think of alternative ways to do it that give non-bogus estimates.

While writing this patch, the largest dilemma was where to account for the multiple array scans. Given that this code is mostly a heuristic and I lack a deep understanding of GIN indexes, it's likely that I got this part wrong. Currently I'm doing this:

    partialEntriesInQuals *= array_scans;
    exactEntriesInQuals *= array_scans;
    searchEntriesInQuals *= array_scans;

This seems to be the right thing as far as random disk accesses are concerned (successive scans are more likely to hit the cache) and also works well with queries that don't touch most of the index. But it fails spectacularly when multiple full scans are performed, e.g. LIKE ANY ('{%,%,%}'), because index_pages_fetched() ends up removing all of the rescan costs.

Another approach is multiplying the total cost by the number of scans.
This overestimates random accesses from rescans, but fixes the above case:

    *indexTotalCost = (*indexStartupCost +
        dataPagesFetched * spc_random_page_cost) * array_scans;

Regards,
Marti

diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
new file mode 100644
index e7af7d4..250d853
*** a/contrib/pg_trgm/expected/pg_trgm.out
--- b/contrib/pg_trgm/expected/pg_trgm.out
*** explain (costs off) *** 3486,3491 ****
--- 3486,3501 ----
   Index Cond: (t ~~* '%BCD%'::text)
  (4 rows)
  
+ explain (costs off)
+ select * from test2 where t like any ('{%bcd%,qua%}');
+            QUERY PLAN
+ ---------------------------------------------------------
+  Bitmap Heap Scan on test2
+    Recheck Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[]))
+    ->  Bitmap Index Scan on test2_idx_gin
+          Index Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[]))
+ (4 rows)
+ 
  select * from test2 where t like '%BCD%';
   t
  ---
*** select * from test2 where t ilike 'qua%' *** 3509,3514 ****
--- 3519,3531 ----
   quark
  (1 row)
  
+ select * from test2 where t like any ('{%bcd%,qua%}');
+    t
+ --------
+  abcdef
+  quark
+ (2 rows)
+ 
  drop index test2_idx_gin;
  create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
  set enable_seqscan=off;
*** explain (costs off) *** 3528,3533 ****
--- 3545,3560 ----
   Index Cond: (t ~~* '%BCD%'::text)
  (2 rows)
  
+ explain (costs off)
+ select * from test2 where t like any ('{%bcd%,qua%}');
+            QUERY PLAN
+ ---------------------------------------------------------
+  Bitmap Heap Scan on test2
+    Recheck Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[]))
+    ->  Bitmap Index Scan on test2_idx_gist
+          Index Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[]))
+ (4 rows)
+ 
  select * from test2 where t like '%BCD%';
   t
  ---
*** select * from test2 where t ilike 'qua%' *** 3551,3553 ****
--- 3578,3587 ----
   quark
  (1 row)
  
+ select * from test2 where t like any ('{%bcd%,qua%}');
+    t
+ --------
+  abcdef
+  quark
+ (2 rows)
+ 
diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql
new file mode 100644
index ea902f6..ac969e6
*** a/contrib/pg_trgm/sql/pg_trgm.sql
--- b/contrib/pg_trgm/sql/pg_trgm.sql
*** explain (costs off) *** 47,56 ****
--- 47,59 ----
  select * from test2 where t like '%BCD%';
  explain (costs off)
  select * from test2 where t ilike '%BCD%';
+ explain (costs off)
+ select * from test2 where t like any ('{%bcd%,qua%}');
  select * from test2 where t like '%BCD%';
  select * from test2 where t like '%bcd%';
  select * from test2 where t ilike '%BCD%';
  select * from test2 where t ilike 'qua%';
+ select * from test2 where t like any ('{%bcd%,qua%}');
  drop index test2_idx_gin;
  create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
  set enable_seqscan=off;
*** explain (costs off) *** 58,64 ****
--- 61,70 ----
  select * from test2 where t like '%BCD%';
  explain (costs off)
  select * from test2 where t ilike '%BCD%';
+ explain (costs off)
+ select * from test2 where t like any ('{%bcd%,qua%}');
  select * from test2 where t like '%BCD%';
  select * from test2 where t like '%bcd%';
  select * from test2 where t ilike '%BCD%';