Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
Josh Berkus j...@agliodbs.com writes: I thought that Dimitri had already implemented this using Fincore. It's linux-only, but that should work well enough to test the general concept. Actually, Cédric did, and I have a clone of his repository where I did some debian packaging of it. http://villemain.org/projects/pgfincore http://git.postgresql.org/gitweb?p=pgfincore.git;a=summary http://git.postgresql.org/gitweb?p=pgfincore.git;a=tree Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
2011/5/4 Josh Berkus j...@agliodbs.com: All, I thought that Dimitri had already implemented this using Fincore. It's linux-only, but that should work well enough to test the general concept. Harald provided me some pointers at pgday in Stuttgart to make it work with Windows, but... hm, I don't have Windows and wasn't motivated enough to make it work there if no one needs it. I didn't search recently on the different kernels, but any kernel supporting mincore and posix_fadvise should work (so probably the same set of kernels that support our 'effective_io_concurrency'). Still waiting for (Free)BSD support. -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Pull up aggregate subquery
2011/5/5 Tom Lane t...@sss.pgh.pa.us: Hitoshi Harada umi.tan...@gmail.com writes: I sometimes wonder if we could pull up aggregate query in the optimizer. I don't have time to look at this right now, but please add to the upcoming commitfest. Done. https://commitfest.postgresql.org/action/patch_view?id=548 I'll work further if I find time. Regards, -- Hitoshi Harada
[HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching
Hi, In my opinion this is actually a bug in 9.0. As it's a (IMO) low-impact fix that's constrained to two files, it seems sensible to backpatch it now that the solution has proven itself in the field. The issue is hard to find and has come up several times in the field. And it has been slightly embarrassing more than once ;) I am happy to provide patches for the supported releases 9.0 if that helps the issue. Greetings, Andres
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
Hi, thanks for the good suggestions.

> > Postgres usually starts with ZERO buffer cache. By saving the buffer cache data structure into hibernation files just before shutdown, and loading them at startup, postgres can start operations with the saved buffer cache in the same condition as just before the last shutdown.
> This seems like a lot of complication for rather dubious gain. What happens when the DBA changes the shared_buffers setting, for instance?

It was my first concern, actually. The current implementation stops reading the hibernation file when it detects a size mismatch between shared_buffers and the hibernation file. I think that is the safe way. As Alvaro Herrera mentioned, it would be possible to adjust the copying of buffer blocks, but changing the shared_buffers setting is not so frequent, I think.

> How do you protect against the cached buffers getting out-of-sync with the actual disk files (especially during recovery scenarios)?

Saving the DB buffer cache is called at shutdown after the bgwriter's final checkpoint finishes, so dirty buffers should not exist, I believe. For recovery scenarios, I need to research more... Could you describe what needs to be considered?

> What about crash-induced corruption in the cache file itself (consider the not-unlikely possibility that init will kill the database before it's had time to dump all the buffers during a system shutdown)?

I think this is an important point. I'll implement a validation function for the hibernation file.

> Do you have any proof that writing out a few GB of buffers and then reading them back in is actually much cheaper than letting the database re-read the data from the disk files?

I think this comes down to sequential read vs. scattered read. The largest hibernation file is the one for buffer blocks, and a sequential read from it should be much faster than scattered reads from the database files via smgrread() block by block.
As Greg Stark suggested, re-reading from the database files based on buffer descriptors was one of the implementation candidates (it can reduce storage consumption for hibernation), but I chose creating a raw image file of the buffer blocks and reading that, for performance. Thanks
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
Hi, I think that PgFincore (http://pgfoundry.org/projects/pgfincore/) provides similar functionality. Are you familiar with that? If so, could you contrast your approach with that one? I'm not familiar with PgFincore at all, sorry, but I got the source code and documents and read through them just now. # and I'm a novice on postgres actually... The target of both is to reduce physical I/O, but their approaches and gains are different. My understanding is like this:

    +----------------------------+       +-----------------+
    | Postgres (backend)         |       | Postgres        |
    |                            |       |                 |
    |   DB Buffer Cache          |       |   pgfincore     |
    |   (shared buffers)         |       |                 |
    |    *my target              |       |                 |
    |        ^                   |       |                 |
    |        v                   |       |                 |
    |   buffer manager           |       |                 |
    +--------|------------|------+       +--------|--------+
             | smgrread() | read()                | posix_fadvise()
    ======== | ========== | ==================== | == userland/kernel
             v            v                      v
    +------------------------------------------------------+
    | File System                                          |
    |   FS Buffer Cache    *PgFincore target               |
    +--------------------------|---------------------------+
    ========================== | ========== kernel/hardware
    +--------------------------v---------------------------+
    | Physical Disk                                        |
    |   base/16384/24598                                   |
    |   Buffer Cache Hibernation Files                     |
    +------------------------------------------------------+

In summary, PgFincore's target is the File System Buffer Cache; Buffer Cache Hibernation's target is the DB Buffer Cache (shared buffers). PgFincore tries to preload database files via posix_fadvise() into the File System Buffer Cache, not into the DB Buffer Cache (shared buffers). On query execution, the buffer manager will get DB buffer blocks via smgrread() from the file system unless the necessary blocks already exist in the DB Buffer Cache. At this point, physical reads may not happen, because part of (or the entire) database file is already loaded into the FS Buffer Cache. The gain depends on the file system, especially the size of the File System Buffer Cache. Preloading a database file is, in short, equivalent to the following command:

    $ cat base/16384/24598 > /dev/null

I think PgFincore is good for data warehouse applications. Buffer Cache Hibernation, my approach, is more simple and straightforward.
It tries to save/load the contents of the DB Buffer Cache (shared buffers) using regular files (called Buffer Cache Hibernation Files). At startup, the buffer manager will load DB buffer blocks into the DB Buffer Cache from the Buffer Cache Hibernation Files which were saved at the last shutdown. Note that the database files will not be read, so they are not cached in the File System Buffer Cache at all. Only the contents of the DB Buffer Cache are filled. Therefore, the DB buffer cache miss penalty would be larger than PgFincore's. The gain depends on the size of shared buffers, and on how often similar queries are executed before and after restarting. Buffer Cache Hibernation is good for OLTP applications. I think that PgFincore and Buffer Cache Hibernation are not exclusive; they can work together at different caching levels. Sorry for my poor english skill, but I'm doing my best :) Thanks
Re: [HACKERS] GSoC 2011: Fast GiST index build
The gistFindCorrectParent function has a branch with the following comment: /* * awful!!, we need search tree to find parent ... , but before we * should release all old parent */ Can you provide me an example of a case where this branch is taken? See several lines above: if (parent->blkno == InvalidBlockNumber) /* * end of chain and still didn't found parent, It's very-very * rare situation when root splited */ break; -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
2011/5/5 Mitsuru IWASAKI iwas...@jp.freebsd.org:
> Hi, I think that PgFincore (http://pgfoundry.org/projects/pgfincore/) provides similar functionality. Are you familiar with that? If so, could you contrast your approach with that one? I'm not familiar with PgFincore at all, sorry, but I got the source code and documents and read through them just now. # and I'm a novice on postgres actually... The target of both is to reduce physical I/O, but their approaches and gains are different. My understanding is like this:
> [ASCII diagram of the two cache layers, as in the previous message]

Little detail: pgfincore stores its data per relation in a file, like you do. I rewrote that a bit, and it will store its data directly in postgresql tables, as well as being able to restore the cache from a raw bitstring.

> In summary, PgFincore's target is the File System Buffer Cache, Buffer Cache Hibernation's target is the DB Buffer Cache (shared buffers).

Correct. (btw I am very happy with your idea and that you got time to do it)

> PgFincore is trying to preload database files by posix_fadvise() into the File System Buffer Cache, not into the DB Buffer Cache (shared buffers). On query execution, the buffer manager will get DB buffer blocks by smgrread() from the file system unless the necessary blocks exist in the DB Buffer Cache. At this point, physical reads may not happen because part of (or the entire) database file is already loaded into the FS Buffer Cache.
> The gain depends on the file system, especially the size of the File System Buffer Cache. Preloading a database file is, in short, equivalent to the following command: $ cat base/16384/24598 > /dev/null

Not exactly. There are two calls:
* pgfadv_WILLNEED
* pgfadv_WILLNEED_snapshot
The former asks to load each segment of a relation, *but* the kernel can decide not to do that, or to load only part of each segment (so it is not as brutal as cat file > /dev/null). The latter reads *exactly* the blocks required in each segment, not all blocks (unless all were in cache while doing the snapshot). (This one is the part of the snapshot/restore combo.)

> I think PgFincore is good for data warehouse applications.

Pgfincore with bitstring storage in a table allows streaming to hot standbys, and gets better response in case of switch-over/fail-over by doing some house-keeping on the hot standby to keep it really hot ;) Even web applications have large databases today. (There is more, but it is not the subject.)

> Buffer Cache Hibernation, my approach, is more simple and straightforward. It tries to save/load the contents of the DB Buffer Cache (shared buffers) using regular files (called Buffer Cache Hibernation Files). At startup, the buffer manager will load DB buffer blocks into the DB Buffer Cache from the Buffer Cache Hibernation Files which were saved at the last shutdown. Note that the database files will not be read, so they are not cached in the File System Buffer Cache at all. Only the contents of the DB Buffer Cache are filled. Therefore, the DB buffer cache miss penalty would be larger than PgFincore's. The gain depends on the size of shared buffers, and how often the
Re: [HACKERS] GSoC 2011: Fast GiST index build
2011/5/5 Teodor Sigaev teo...@sigaev.ru: See several lines above: if (parent->blkno == InvalidBlockNumber) /* * end of chain and still didn't found parent, It's very-very * rare situation when root splited */ break; As I understood, it's because we can't move the root to another page. With best regards, Alexander Korotkov.
Re: [HACKERS] VARIANT / ANYTYPE datatype
On Wed, May 4, 2011 at 8:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@commandprompt.com writes: As a followup idea there exists the desire to store records as records and not text representations of same (given differing record types, of course), for which it'd be more worthwhile. Maybe. The conventional wisdom is that text representation of data is more compact than PG's internal representation by a significant factor --- our FAQ says up to 5x, in fact. I know that that's including row overhead and indexes and so on, but I still don't find it to be a given that you're going to win on space with this sort of trick. I've done a lot of testing of the text vs binary format on the wire...not exactly the same set of issues, but pretty close, since you have to send all the oids, lengths, etc. Conventional wisdom is correct, although overstated for this topic. Even in truly pathological cases for text, for example sending multiple levels of redundant escaping in complex structures, the text format will almost always be smaller. For 'typical' data it can be significantly smaller. Two exceptions most people will run into are bytea, obviously, and the timestamp family of types, where binary-style manipulation is a huge win both in terms of space and performance. For complex data (say 3+ levels of composites stacked in arrays), transfer is much *faster*, albeit larger, via binary, as long as you are not bandwidth constrained, and presumably it would be as well for variants. Perhaps even more so, because some of the manipulations made converting tuple storage to binary wire formats don't have to happen. That said, while there are use cases for sending highly structured data over the wire, I can't think of any for direct storage on a table in variant type scenarios, at least not yet :-).
merlin
Re: [HACKERS] adding a new column in IDENTIFY_SYSTEM
On Wed, May 4, 2011 at 22:42, Simon Riggs si...@2ndquadrant.com wrote: On Wed, May 4, 2011 at 3:47 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jaime Casanova ja...@2ndquadrant.com writes: I want to propose the addition of a new field in IDENTIFY_SYSTEM: xlogversion, which will carry XLOG_PAGE_MAGIC from primary. The idea of sending that info is to allow us to know if the xlog page version of two different major versions are compatible or not. Currently pg_upgrade requires the primary to be taken down, That's *intentional*. The notion of WAL-shipping-replication compatibility between two different major versions is insane on its face. They will not have compatible system catalog contents. You might get perfect replication of the master's catalogs, but the slave wouldn't be able to interpret them. That's exactly how hard in place upgrade was to begin with. Considering how valuable this would be, it seems worth it to pursue this. The reason we have XLOG_PAGE_MAGIC is really more the opposite: to prevent people from trying to recover across a minor version update in which we had to break XLOG compatibility. I don't recall right now if that's ever actually happened, but it definitely could. If that is true, then allowing this patch will allow us to detect that incompatibility when the standby connects to the master, and explain the issue in a useful error message. Otherwise we will just barf on the magic value. Having access to these details might make it possible to upgrade. They could be inferred, but it would be better to have the full data so we can take an informed decision about whether or not it is possible. So even if people don't believe in the rationale behind the patch, would allowing it harm anything at this point? Adding it for the sake of upgrades seems very far fetched. Adding it for the sake of giving a better error message seems like a very good idea. 
But in that case, the client side code to actually give a better error message should be included from the start, IMHO. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] adding a new column in IDENTIFY_SYSTEM
Magnus Hagander mag...@hagander.net writes: So even if people don't believe in the rationale behind the patch, would allowing it harm anything at this point? Adding it for the sake of upgrades seems very far fetched. Adding it for the sake of giving a better error message seems like a very good idea. But in that case, the client side code to actually give a better error message should be included from the start, IMHO. What's not apparent to me is how we'll even get to this check; if there's a mismatch, won't the database system identifier comparison fail first in most scenarios? I'm also wondering why send WAL version number and not, say, catalog version number, if there's some idea that we need more tests than the system identifier comparison. Given reasonable answers to these questions, I'd not object to putting in additional error testing. I concur with Magnus that the patch should actually provide those tests, and not just put in an unused field. regards, tom lane
[HACKERS] Patch to improve style of generate_history.pl perl script
I have applied the attached patch to improve the style of our generate_history.pl perl script. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

    diff --git a/doc/src/sgml/generate_history.pl b/doc/src/sgml/generate_history.pl
    new file mode 100644
    index 2b81569..a6c0bd7
    *** a/doc/src/sgml/generate_history.pl
    --- b/doc/src/sgml/generate_history.pl
    ***************
    *** 13,22 ****
      use strict;

    ! my($srcdir) = shift;
    ! defined($srcdir) || die "$0: missing required argument: srcdir\n";
    ! my($infile) = shift;
    ! defined($infile) || die "$0: missing required argument: inputfile\n";

      # Emit DOCTYPE header so that the output is a self-contained SGML document
      print "<!DOCTYPE appendix PUBLIC \"-//OASIS//DTD DocBook V4.2//EN\">\n";
    --- 13,22 ----
      use strict;

    ! my $srcdir = shift;
    ! die "$0: missing required argument: srcdir\n" if !defined($srcdir);
    ! my $infile = shift;
    ! die "$0: missing required argument: inputfile\n" if !defined($infile);

      # Emit DOCTYPE header so that the output is a self-contained SGML document
      print "<!DOCTYPE appendix PUBLIC \"-//OASIS//DTD DocBook V4.2//EN\">\n";
    *** process_file($infile);
    *** 26,36 ****
      exit 0;

      sub process_file
      {
    !     my($filename) = @_;
          local *FILE;    # need a local filehandle so we can recurse
    !     my($f) = $srcdir . '/' . $filename;
          open(FILE, $f) || die "could not read $f: $!\n";
          while (<FILE>)
          {
    --- 26,36 ----
      exit 0;

      sub process_file
      {
    !     my $filename = shift;
          local *FILE;    # need a local filehandle so we can recurse
    !     my $f = $srcdir . '/' . $filename;
          open(FILE, $f) || die "could not read $f: $!\n";
          while (<FILE>)
          {
Re: [HACKERS] alpha5
Robert Haas wrote: On Mon, Mar 28, 2011 at 9:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sat, Mar 26, 2011 at 9:41 PM, Robert Haas robertmh...@gmail.com wrote: Per previous discussion, I'm going to wrap alpha5 Monday morning Eastern time, barring objections. It seems that the 'make distcheck' build is broken. Apparently we don't have any automated testing of this? Anyone know what to fix here? Bruce keeps trying to put cross-references where they mustn't go ... Actually those are all my fault. Sorry, I'm still learning the ropes. I didn't realize xref couldn't be used in the release notes; it looks like Bruce used link rather than xref for the things I used xref for. FYI, this is documented in release.sgml in an SGML comment: For new features, add links to the documentation sections. Use </link> not just </> so that generate_history.pl can remove it, so HISTORY.html can be created without links to the main documentation. Don't use xref. xref can't be used because there is no text to keep, and </link> (not </>) has to be used so we remove the right tag. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
[HACKERS] Visibility map and hint bits
There has been a lot of recent discussion about the visibility map (for index-only scans) and hint bits (trying to avoid double-writing a table). I wonder if we could fix both of these at the same time. Once the visibility map is reliable, can we use that to avoid updating the hint bits on all rows on a page? For bulk loads, all the pages are going to have the same xid and all be visible, so instead of writing the entire table, we just write the visibility map. I think the problem is that we have the PD_ALL_VISIBLE page flag, which requires a write of the page as well. Could we get by with only the visibility bits and remove PD_ALL_VISIBLE? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] VARIANT / ANYTYPE datatype
On May 4, 2011, at 6:24 PM, Andrew Dunstan wrote: I'm far from convinced that storing deltas per column rather than per record is a win anyway. I don't have hard numbers to hand, but my vague recollection is that my tests showed it to be a design that used more space. It depends on how many fields you're changing in one go and how wide the table is. It's also a PITA to identify what fields actually changed if you're storing everything. In the case of logging, I'd say that what's really needed is a way to store a table record that has an indicator of what fields actually changed (and possibly not storing anything for fields that didn't change). That table record would need to also deal with changes to the underlying table structure. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] Patch to improve style of generate_history.pl perl script
On 05/05/2011 12:49 PM, Bruce Momjian wrote: I have applied the attached patch to improve the style of our generate_history.pl perl script. Wow, really? There's nothing wrong with the changes but they seem pretty trivial and pointless to me. Maybe I've drunk the perl TIMTOWTDI koolaid too much. cheers andrew
Re: [HACKERS] FDW table hints
On Tue, May 3, 2011 at 16:19, Tom Lane t...@sss.pgh.pa.us wrote: Dave Page dp...@pgadmin.org writes: On Tue, May 3, 2011 at 10:33 AM, Susanne Ebrecht susa...@2ndquadrant.com wrote: When we make such a hint for foreign tables then we should make a similar hint for views. A view really isn't a table, unlike a foreign table, so I don't think that argument holds. Well, from the implementation standpoint a foreign table is a lot more like a view than it is like a table. I think the real point is that a hint for this on views would be a waste of translator manpower, because we've not heard of anyone making that mistake. The *implementation* is, in this case, IMHO, irrelevant. The relevant part is what it looks like to the *user*, and to the user a foreign table looks a lot more like a table than a view does. Since I brought it up - a patch along this line? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/

    diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
    index ff84045..e3d902e 100644
    --- a/src/backend/commands/indexcmds.c
    +++ b/src/backend/commands/indexcmds.c
    @@ -183,10 +183,18 @@ DefineIndex(RangeVar *heapRelation,
         /* Note: during bootstrap may see uncataloged relation */
         if (rel->rd_rel->relkind != RELKIND_RELATION &&
             rel->rd_rel->relkind != RELKIND_UNCATALOGED)
    +    {
    +        if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
    +            ereport(ERROR,
    +                    (errcode(ERRCODE_WRONG_OBJECT_TYPE),
    +                     errmsg("cannot create index on FOREIGN TABLE \"%s\"",
    +                            heapRelation->relname)));
    +
             ereport(ERROR,
                     (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                      errmsg("\"%s\" is not a table",
                             heapRelation->relname)));
    +    }
     
         /*
          * Don't try to CREATE INDEX on temp tables of other backends.
Re: [HACKERS] VARIANT / ANYTYPE datatype
On 05/05/2011 01:00 PM, Jim Nasby wrote: On May 4, 2011, at 6:24 PM, Andrew Dunstan wrote: I'm far from convinced that storing deltas per column rather than per record is a win anyway. I don't have hard numbers to hand, but my vague recollection is that my tests showed it to be a design that used more space. It depends on how many fields you're changing in one go and how wide the table is. It's also a PITA to identify what fields actually changed if you're storing everything. No it's not. Instead of storing OLD/NEW, store a base record and a delta record (an hstore with just the changed fields) for an update. This saves space and means you only have to calculate what changed once. cheers andrew
Re: [HACKERS] FDW table hints
Magnus Hagander mag...@hagander.net writes: Since I brought it up - a patch along this line? Please don't capitalize foreign table there. Also, I think an else before the other ereport would make the code clearer, even though it's not strictly necessary. regards, tom lane
Re: [HACKERS] FDW table hints
On Thu, May 5, 2011 at 19:22, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: Since I brought it up - a patch along this line? Please don't capitalize foreign table there. Yeah, I was a bit split about that myself. Will change. Also, I think an else before the other ereport would make the code clearer, even though it's not strictly necessary. Agreed. Unless someone objects, I'll go put that in later tonight. (along with a comment on why there are two different msgs) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Patch to improve style of generate_history.pl perl script
Andrew Dunstan wrote: On 05/05/2011 12:49 PM, Bruce Momjian wrote: I have applied the attached patch to improve the style of our generate_history.pl perl script. Wow, really? There's nothing wrong with the changes but they seem pretty trivial and pointless to me. Maybe I've drunk the perl TIMTOWTDI koolaid too much. I wanted it to be consistent with our other Perl script. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] patch: fix race in SSI's CheckTargetForConflictsIn
Dan Ports d...@csail.mit.edu wrote: While running some benchmarks to test SSI performance, I found a race condition that's capable of causing a segfault. A patch is attached. The bug is in CheckTargetForConflictsIn, which scans the list of SIREAD locks on a lock target when it's modified. There's an optimization in there where the writing transaction will remove a SIREAD lock that it holds itself, because it's being replaced with a (stronger) write lock. To do that, it needs to drop its shared lwlocks and reacquire them in exclusive mode. The existing code deals with concurrent modifications in that interval by redoing checks. However, it misses the case where some other transaction removes all remaining locks on the target, and proceeds to remove the lock target itself. The attached patch fixes this by deferring the SIREAD lock removal until the end of the function. At that point, there isn't any need to worry about concurrent updates to the target's lock list. The resulting code is also simpler. I just want to confirm that this addresses a real (although very narrow) race condition. It results from code used to implement a valuable optimization described in Cahill's thesis: don't get or keep an SIREAD lock on a row which has a write lock. The write lock is stronger and will cause a write/write conflict with any overlapping transactions which would care about a read/write conflict. The pattern of reading a row and then updating or deleting it is so common that this optimization does a lot to avoid promotion of predicate locks to coarser granularity, and thereby helps avoid false positives. While the optimization is valuable, the code used to implement it was pretty horrid. (And that was me that wrote it.) It has already been the cause of several other fixes since the main patch went in. 
What Dan has done here is move the optimization out of the middle of the loop which is doing the conflict detection, and in doing so has reduced the number of lines of code needed, reduced the amount of fiddling with LW locks, and all around made the code more robust and more understandable. I've reviewed the patch and it looks good to me. Dan has beaten up on it with the same DBT-2 run which exposed the race condition, without seeing a problem. Although a much smaller patch could address the immediate problem, I strongly feel that Dan has taken the right approach by refactoring this bit into something fundamentally cleaner and less fragile. -Kevin
Re: [HACKERS] Visibility map and hint bits
On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian br...@momjian.us wrote: There has been a lot of recent discussion about the visibility map (for index-only scans) and hint bits (trying to avoid double-writing a table). I still think a small tqual.c-maintained cache of hint bits will effectively eliminate hint bit i/o issues surrounding bulk loads. Tom fired a shot across the bow regarding the general worthiness of that technique though (see: http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html) :(. I can rig up a cleaned-up version of the patch pretty easily...it's a local change and fairly simple. I don't think there is any way to remove the hint bits without suffering some other problem. merlin
Re: [HACKERS] Visibility map and hint bits
Merlin Moncure wrote: On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian br...@momjian.us wrote: There has been a lot of recent discussion about the visibility map (for index-only scans) and hint bits (trying to avoid double-writing a table). I still think a small tqual.c maintained cache of hint bits will effectively eliminate hint bit i/o issues surrounding bulk loads. Tom fired a shot across the bow regarding the general worthiness of that technique though (see: http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html) :(. I can rig up a cleaned up version of the patch pretty easily...it's a local change and fairly simple. I don't think there is any way to remove the hint bits without suffering some other problem. Was that the idea that the pages had to fit in the cache and be updated with hint bits before being written to disk? Restricting that to the size of the buffer cache seemed very limiting. One 8k visibility map page can hold bits for 1/2 gig of heap pages, so I thought that would be a better all-visible indicator and avoid many all-visible page writes in bulk load cases. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Comments on system tables and columns
Alvaro Herrera wrote: Excerpts from Thom Brown's message of lun mar 28 08:14:07 -0300 2011: Hi, I notice that none of the system tables or columns thereof bear any comments. Is this intentional, or an oversight? I would have thought comments would be useful since the column names aren't exactly always self-explanatory. Bruce has been working on changes to have catalog objects (tables, views and columns) contain comments, but he deferred it to 9.2 because it involved nontrivial pieces of infrastructure (mainly to avoid duplication with the SGML catalog documentation). Attached are diffs that change the Makefile and initdb, and a perl script to pull the system view comments out of the SGML docs. I need to do more work to pull stuff for the system tables. This does work in testing. I will work on this more for 9.2. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile new file mode 100644 index 3a83461..68ed5fe *** a/src/backend/catalog/Makefile --- b/src/backend/catalog/Makefile *** OBJS = catalog.o dependency.o heap.o ind *** 16,22 pg_operator.o pg_proc.o pg_db_role_setting.o pg_shdepend.o pg_type.o \ storage.o toasting.o ! BKIFILES = postgres.bki postgres.description postgres.shdescription include $(top_srcdir)/src/backend/common.mk --- 16,22 pg_operator.o pg_proc.o pg_db_role_setting.o pg_shdepend.o pg_type.o \ storage.o toasting.o ! 
BKIFILES = postgres.bki postgres.description postgres.shdescription system_view_comments.sql include $(top_srcdir)/src/backend/common.mk *** schemapg.h: postgres.bki ; *** 59,70 --- 59,74 postgres.bki: genbki.pl Catalog.pm $(POSTGRES_BKI_SRCS) $(PERL) -I $(catalogdir) $ $(pg_includes) --set-version=$(MAJORVERSION) $(POSTGRES_BKI_SRCS) + system_view_comments.sql: $(top_builddir)/doc/src/sgml/catalogs.sgml gen_comments.pl + $(PERL) $(srcdir)/gen_comments.pl $ $@ + .PHONY: install-data install-data: $(BKIFILES) installdirs $(INSTALL_DATA) $(call vpathsearch,postgres.bki) '$(DESTDIR)$(datadir)/postgres.bki' $(INSTALL_DATA) $(call vpathsearch,postgres.description) '$(DESTDIR)$(datadir)/postgres.description' $(INSTALL_DATA) $(call vpathsearch,postgres.shdescription) '$(DESTDIR)$(datadir)/postgres.shdescription' $(INSTALL_DATA) $(srcdir)/system_views.sql '$(DESTDIR)$(datadir)/system_views.sql' + $(INSTALL_DATA) $(srcdir)/system_view_comments.sql '$(DESTDIR)$(datadir)/system_view_comments.sql' $(INSTALL_DATA) $(srcdir)/information_schema.sql '$(DESTDIR)$(datadir)/information_schema.sql' $(INSTALL_DATA) $(srcdir)/sql_features.txt '$(DESTDIR)$(datadir)/sql_features.txt' *** installdirs: *** 73,79 .PHONY: uninstall-data uninstall-data: ! rm -f $(addprefix '$(DESTDIR)$(datadir)'/, $(BKIFILES) system_views.sql information_schema.sql sql_features.txt) # postgres.bki, postgres.description, postgres.shdescription, and schemapg.h # are in the distribution tarball, so they are not cleaned here. --- 77,83 .PHONY: uninstall-data uninstall-data: ! rm -f $(addprefix '$(DESTDIR)$(datadir)'/, $(BKIFILES) system_views.sql system_view_comments.sql information_schema.sql sql_features.txt) # postgres.bki, postgres.description, postgres.shdescription, and schemapg.h # are in the distribution tarball, so they are not cleaned here. 
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c new file mode 100644 index acd2514..f0d72e9 *** a/src/bin/initdb/initdb.c --- b/src/bin/initdb/initdb.c *** static char *dictionary_file; *** 102,107 --- 102,108 static char *info_schema_file; static char *features_file; static char *system_views_file; + static char *system_view_comments_file; static bool made_new_pgdata = false; static bool found_existing_pgdata = false; static bool made_new_xlogdir = false; *** static void setup_auth(void); *** 166,171 --- 167,173 static void get_set_pwd(void); static void setup_depend(void); static void setup_sysviews(void); + static void append_sysviews(char **lines); static void setup_description(void); static void setup_collation(void); static void setup_conversion(void); *** setup_depend(void) *** 1419,1432 static void setup_sysviews(void) { ! PG_CMD_DECL; ! char **line; ! char **sysviews_setup; fputs(_(creating system views ... ), stdout); fflush(stdout); ! sysviews_setup = readfile(system_views_file); /* * We use -j here to avoid backslashing stuff in system_views.sql --- 1421,1452 static void setup_sysviews(void) { ! char **sysview_lines; ! char **sysview_comment_lines; fputs(_(creating system views ... ), stdout);
Re: [HACKERS] Visibility map and hint bits
On Thu, May 5, 2011 at 12:59 PM, Bruce Momjian br...@momjian.us wrote: I wonder if we could fix both of these at the same time. Once the visibility map is reliable, can we use that to avoid updating the hint bits on all rows on a page? I don't think so. There are two problems: 1. If there is a long-running transaction on the system, it will not be possible to set PD_ALL_VISIBLE, but hint bits can still be set. So there could be a significant performance regression if we don't set hint bits in that case. 2. Making the visibility map crash-safe will mean making setting hint bits emit XLOG records, so it can't be done on Hot Standby servers at all, and it's much more expensive than just setting a hint bit on the master. For bulk loads, all the pages are going to have the same xid and all be visible, so instead of writing the entire table, we just write the visibility map. I think the problem is that we have the PD_ALL_VISIBLE page flag, which requires a write of the page as well. Could we get by with only the visibility bits and remove PD_ALL_VISIBLE? In some ways, that would make things much simpler. But to make that work, every insert/update/delete to a page would have to pin the visibility map page and clear PD_ALL_VISIBLE if appropriate, so it might not be good from a performance standpoint, especially in high-concurrency workloads. Right now, if PD_ALL_VISIBLE isn't set, we don't bother touching the visibility map page, which seems like a possibly important optimization. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Visibility map and hint bits
On Thu, May 5, 2011 at 1:34 PM, Bruce Momjian br...@momjian.us wrote: Merlin Moncure wrote: On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian br...@momjian.us wrote: There has been a lot of recent discussion about the visibility map (for index-only scans) and hint bits (trying to avoid double-writing a table). I still think a small tqual.c maintained cache of hint bits will effectively eliminate hint bit i/o issues surrounding bulk loads. Tom fired a shot across the bow regarding the general worthiness of that technique though (see: http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html) :(. I can rig up a cleaned up version of the patch pretty easily...it's a local change and fairly simple. I don't think there is any way to remove the hint bits without suffering some other problem. Was that the idea that the pages had to fit in the cache and be updated with hint bits before being written to disk? Restricting that to the size of the buffer cache seemed very limiting. One 8k visibility map page can hold bits for 1/2 gig of heap pages, so I thought that would be a better all-visible indicator and avoid many all-visible page writes in bulk load cases. No, that was my first idea -- check visibility when you evict. That helps a different problem but not bulk loads. One way it could help is for marking PD_ALL_VISIBLE. This might also be a winner but there is some valid skepticism that adding more work for bgwriter is really a good idea. The tqual cache idea is such that there is a small cache that remembers the commit/cancel status of recently seen transactions. If you scan a tuple and the status is known via cache, you set the bit but don't mark the page dirty. That way, if you are scanning a lot of unhinted tuples with similar xid, you don't need to jam out i/o. I think the general concept is clean, but it might need some buy-in from Tom and some performance testing for justification.
The alternate 'cleaner' approach of maintaining a larger transam.c cache had some downsides that I saw no simple workaround for. merlin
Re: [HACKERS] Visibility map and hint bits
Merlin Moncure mmonc...@gmail.com wrote: a small cache that remembers the commit/cancel status of recently seen transactions. How is that different from the head of the clog SLRU? -Kevin
Re: [HACKERS] new clang report
On mån, 2011-05-02 at 14:45 -0400, Tom Lane wrote: So issue here is actually that clang has an option -fmath-errno Indicate that math functions should be treated as updating errno. Really? It looks to me like the issue is that pow() is returning NaN instead of Inf for an out-of-range result. That's a bug: the correct result is *not* ill-defined, it's simply too large to represent. If that has anything to do with errno, it's an implementation artifact that's unrelated to the claimed meaning of the switch. But I would also note that the Single Unix Spec is unequivocal about this case: If the correct value would cause overflow, +-HUGE_VAL is returned, and errno is set to [ERANGE]. That's IS set, not may be set as in some other cases. So this behavior should not depend on any such compiler switch anyway, unless the intent of the switch is to ignore the standard and do whatever we feel like. Well, the intent of the switch actually appears to be rather "follow the standard and do what it says", with the default behavior being the questionable one. So we could easily settle on that you need to use that switch, otherwise it's not supported. (This is actually quite similar to the old days when some systems had -ffast-math as the default.) Btw., when you build a simple test program in the default mode, pow() indeed returns Inf on overflow. There appear to be some code generation or optimization problems when it builds the postgres code, because the problem goes away with either -O0 or by inserting an elog or something like that after the pow() call.
Re: [HACKERS] Visibility map and hint bits
On Thu, May 5, 2011 at 2:00 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Merlin Moncure mmonc...@gmail.com wrote: a small cache that remembers the commit/cancel status of recently seen transactions. How is that different from the head of the clog SLRU? several things: *) any slru access requires a lock (besides the lock itself, you are spending cycles in the critical path) *) cache access happens at a different stage of processing in HeapTupleSatisfiesMVCC: both TransactionIdIsCurrentTransactionId and TransactionIdIsInProgress have to be checked first. Logically, it's an extension of the hint bit check itself, not an expansion of lower levels of caching *) in tqual.c you can sneak in some small optimizations like only caching the bit if it's known good in the WAL (XlogNeedsFlush). That way you don't need to keep checking it over and over for the same transaction *) slru level accesses happen too late to give much benefit: I can't stress enough how tight HeapTupleSatisfiesMVCC is. On my workstation VM, each non-inline function call shows up measurably in profiling. I think anything you do here has to be inline, hand-rolled, and very tight (you can forget anything around dynahash). Delegating the cache management to transam or (even worse) slru level penalizes some workloads non-trivially. merlin
Re: [HACKERS] FDW table hints
On Thu, May 5, 2011 at 19:26, Magnus Hagander mag...@hagander.net wrote: On Thu, May 5, 2011 at 19:22, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: Since I brought it up - a patch along this line? Please don't capitalize foreign table there. Yeah, I was a bit split about that myself. Will change. Also, I think an else before the other ereport would make the code clearer, even though it's not strictly necessary. Agreed. Unless someone objects, I'll go put that in later tonight. (along with a comment on why there are two different msgs) Done. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] Process wakeups when idle and power consumption
There is a general need to have Postgres consume fewer CPU cycles and less power when idle. Until something is done about this, shared hosting providers, particularly those who want to deploy many VM instances with databases, will continue to choose MySQL out of hand. I have quantified the difference in the number of wake-ups when idle between Postgres and MySQL using Intel's powertop utility on my laptop, which runs Fedora 14. These figures are for a freshly initdb'd database from git master, and mysql-server 5.1.56 from my system's package manager.

*snip*
 2.7% ( 11.5) [ ] postgres
 1.1% (  4.6) [ 1663] Xorg
 0.9% (  3.7) [ 1463] wpa_supplicant
 0.6% (  2.7) [ ] [ahci] interrupt
 0.5% (  2.2) [ ] mysqld
*snip*

Postgres consistently has 11.5 wakeups per second, while MySQL consistently has 2.2 wakeups (averaged over the 5 second period that each cycle of instrumentation lasts). If I turn on archiving, the figure for Postgres naturally increases:

*snip*
 1.7% ( 12.5) [ ] postgres
 1.6% ( 12.0) [  808] phy0
 0.7% (  5.4) [ 1463] wpa_supplicant
 0.6% (  4.3) [ ] [ahci] interrupt
 0.3% (  2.2) [ ] mysqld
*snip*

It increases by exactly the amount that you'd expect after looking at pgarch.c - one wakeup per second. This is because there is a loop within the main event loop for the process that is a prime example of what unix_latch.c describes as the common pattern of using pg_usleep() or select() to wait until a signal arrives, where the signal handler sets a global variable. The loop naps for one second per iteration. Attached is the first in what I hope will become a series of patches for reducing power consumption when idle. It makes the archiver process wake far less frequently, using a latch primitive, specifically a non-shared latch. I'm not sure if I should have used a shared latch, and have SetLatch() calls replace SendPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER) calls. Would that have broken some implied notion of encapsulation?
In any case, if I apply the patch and rebuild, the difference is quite apparent:

***snip***
 3.9% ( 21.8) [ 1663] Xorg
 3.2% ( 17.9) [ ] [ath9k] interrupt
 2.1% ( 11.9) [  808] phy0
 2.1% ( 11.5) [ ] postgres
 1.0% (  5.4) [ 1463] wpa_supplicant
 0.4% (  2.2) [ ] mysqld
***snip***

The difference from not running the archiver at all appears to have been completely eliminated (in fact, we still wake up every PGARCH_AUTOWAKE_INTERVAL seconds, which is 60 seconds, but that usually isn't apparent to powertop, which measures wakeups over 5 second periods). If we could gain similar decreases in idle power consumption across all Postgres ancillary processes, perhaps we'd see Postgres available as an option for shared hosting plans more frequently. When these differences are multiplied by thousands of VM instances, they really matter. Unfortunately, there doesn't seem to be a way to get powertop to display its instrumentation per-process to quickly get a detailed overview of where those wake-ups occur across all pg processes. I hope to work on reducing wakeups for PG ancillary processes in this order (order of perceived difficulty), using shared latches to eliminate the waiting pattern in each case: * WALWriter * BgWriter * WALReceiver * Startup process I'll need to take a look at statistics, autovacuum and Logger processes too, to see if they present more subtle opportunities for reduced idle power consumption. Do constants like PGARCH_AUTOWAKE_INTERVAL need to always be set at their current, conservative levels? Perhaps these sorts of values could be collectively controlled with a single GUC that represents a trade-off between CPU cycles used when idle against safety/reliability. On the other hand, there are GUCs that control that per process in some cases already, such as wal_writer_delay, and that suggestion could well be a bit woolly. It might be an enum value that represented various levels of concern that would default to something like 'conservative' (i.e. 
the current values). Thoughts? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index b40375a..01e5350 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -44,6 +44,7 @@ #include storage/pmsignal.h #include utils/guc.h #include utils/ps_status.h +#include storage/latch.h /* -- @@ -87,6 +88,12 @@ static volatile sig_atomic_t got_SIGTERM = false; static volatile sig_atomic_t wakened = false; static volatile sig_atomic_t ready_to_stop = false; +/* + * Latch that archiver loop waits on until it is awakened by + * signals, each of which there is a handler for + */ +static volatile Latch mainloop_latch; + /* -- * Local function forward declarations * -- @@ -228,6 +235,8 @@ PgArchiverMain(int
Re: [HACKERS] new clang report
Peter Eisentraut pete...@gmx.net writes: Btw., when you build a simple test program in the default mode, pow() indeed returns Inf on overflow. There appear to be some code generation or optimization problems when it builds the postgres code, because the problem goes away with either -O0 or by inserting an elog or something like that after the pow() call. Hmm. Sounds to me like clang is trying to insert an inlined version of pow() that gets this case wrong. Any of -fmath-errno, -O0, or possibly other things discourage it from doing that, and then the non-inline code gets it right. Bug for sure. regards, tom lane
Re: [HACKERS] Process wakeups when idle and power consumption
Peter Geoghegan pe...@2ndquadrant.com writes: Attached is the first in what I hope will become a series of patches for reducing power consumption when idle. Cool. This has been on my personal to-do list for awhile, but it keeps on failing to get to the top, so I'm glad to see somebody else putting time into it. The major problem I'm aware of for getting rid of periodic wakeups is the need for child processes to notice when the postmaster has died unexpectedly. Your patch appears to degrade the archiver's response time for that really significantly, like from O(1 sec) to O(1 min), which I don't think is acceptable. We've occasionally kicked around ideas for mechanisms that would solve this problem, but nothing's gotten done. It doesn't seem to be an easy problem to solve portably... + * The caveat about signals invalidating the timeout of + * WaitLatch() on some platforms can be safely disregarded, Really? regards, tom lane
Re: [HACKERS] Process wakeups when idle and power consumption
Excerpts from Peter Geoghegan's message of jue may 05 16:49:25 -0300 2011: I'll need to take a look at statistics, autovacuum and Logger processes too, to see if they present more subtle opportunities for reduced idle power consumption. More subtle? Autovacuum wakes up once per second and it could sleep a lot longer if it weren't for the loop that checks for signals. I think that could be improved a lot. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Process wakeups when idle and power consumption
On May 5, 2011, at 4:08 PM, Alvaro Herrera wrote: Excerpts from Peter Geoghegan's message of jue may 05 16:49:25 -0300 2011: I'll need to take a look at statistics, autovacuum and Logger processes too, to see if they present more subtle opportunities for reduced idle power consumption. More subtle? Autovacuum wakes up once per second and it could sleep a lot longer if it weren't for the loop that checks for signals. I think that could be improved a lot. Could kqueue be of use here? Non-kqueue-supporting platforms could always fall back to the existing select(). Cheers, M
Re: [HACKERS] Process wakeups when idle and power consumption
On Thu, May 5, 2011 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: + * The caveat about signals invalidating the timeout of + * WaitLatch() on some platforms can be safely disregarded, Really? I'm a bit confused by the phrasing of this comment as well, but it does seem to me that if all the relevant signal handlers set the latch, then it ought not to be necessary to break the sleep down into one-second intervals. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Re: [BUGS] BUG #5957: createdb with description and md5 auth forces to provide password twice
Grzegorz Szpetkowski wrote: The following bug has been logged online: Bug reference: 5957 Logged by: Grzegorz Szpetkowski Email address: gszpetkow...@gmail.com PostgreSQL version: 9.0.3 Operating system: Ubuntu 10.10 Description: createdb with description and md5 auth forces to provide password twice Details: How to reproduce the problem: 1. Create new role with (encrypted password): createuser -SdRP user 2. In PostgreSQL 9.0.3 I found pg_hba.conf with local all all ident, so change to local all all md5 3. Restart/Reload used cluster 4. Execute createdb -U user mydb "My DB Description" Output: Password: Password: createdb command prompts password twice and I think it's improper behaviour (and documentation is silent about that). Interesting. This is happening because we are connecting to one database to create the new database, and then connecting to the new database to add the comment. Prior to PG 8.2, this was necessary to put the comment on the database, but now that we have the shared comment/description table pg_shdescription, this is not necessary. Do we need createdb to be able to create databases for pre-8.2 clusters? If not, the attached patch fixes the double-prompting. Also, why is this code used to create the new database? conn = connectDatabase(strcmp(dbname, "postgres") == 0 ? "template1" : "postgres", host, port, username, prompt_password, progname); Do we assume more users can connect to the 'postgres' database, but we want 'postgres' to connect to 'template1' in case it wants to drop the 'postgres' database? Whatever the purpose, this code certainly needs a comment. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. 
+ diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c new file mode 100644 index 9b72eac..b1c417b *** a/src/bin/scripts/createdb.c --- b/src/bin/scripts/createdb.c *** main(int argc, char *argv[]) *** 208,219 } PQclear(result); - PQfinish(conn); if (comment) { - conn = connectDatabase(dbname, host, port, username, prompt_password, progname); - printfPQExpBuffer(sql, COMMENT ON DATABASE %s IS , fmtId(dbname)); appendStringLiteralConn(sql, comment, conn); appendPQExpBuffer(sql, ;\n); --- 208,216 *** main(int argc, char *argv[]) *** 231,239 } PQclear(result); - PQfinish(conn); } exit(0); } --- 228,237 } PQclear(result); } + PQfinish(conn); + exit(0); } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Process wakeups when idle and power consumption
Robert Haas robertmh...@gmail.com writes: On Thu, May 5, 2011 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: + * The caveat about signals invalidating the timeout of + * WaitLatch() on some platforms can be safely disregarded, Really? I'm a bit confused by the phrasing of this comment as well, but it does seem to me that if all the relevant signal handlers set the latch, then it ought not to be necessary to break the sleep down into one-second intervals. [ reads code some more ... ] Yeah, I think you are probably right, which makes it just a badly phrased comment. The important point here is that the self-pipe trick in unix_latch.c fixes the problem, so long as we are relying on latch release and NOT timeout-driven wakeup. What that really means is that any WaitOnLatch call with a finite timeout ought to be viewed with a jaundiced eye. Ideally, we want them all to be waiting for latch release and nothing else. I'm concerned that we're going to be moving towards some intermediate state where we have WaitOnLatch calls with very long timeouts, because the longer the timeout, the worse the problem gets on platforms that have the problem. If you have say a 1-minute timeout, it's not difficult to believe that you'll basically never wake up because of random signals resetting the timeout. regards, tom lane
Re: [HACKERS] Re: [BUGS] BUG #5957: createdb with description and md5 auth forces to provide password twice
Bruce Momjian br...@momjian.us writes: Prior to PG 8.2, this was necessary to put the comment on the database, but now that we have the shared comment/description table pg_shdescription, this is not necessary. Do we need createdb to be able to create databases for pre-8.2 clusters? If not, the attached patch fixes the double-prompting. Well, if you're only going to change this in HEAD, that might be an acceptable limitation, but if you intend to back-patch I think not. Older versions of createdb are probably significantly more likely to be used with even-older servers. Seems like it wouldn't be that hard to test the server version and only reconnect if it's pre-8.2. regards, tom lane
Re: [HACKERS] Re: [BUGS] BUG #5957: createdb with description and md5 auth forces to provide password twice
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Prior to PG 8.2, this was necessary to put the comment on the database, but now that we have the shared comment/description table pg_shdescription, this is not necessary. Do we need createdb to be able to create databases for pre-8.2 clusters? If not, the attached patch fixes the double-prompting. Well, if you're only going to change this in HEAD, that might be an acceptable limitation, but if you intend to back-patch I think not. Older versions of createdb are probably significantly more likely to be used with even-older servers. This code has been that way since pre-8.2 so I see no need to backpatch; this is the first such complaint I have seen. Seems like it wouldn't be that hard to test the server version and only reconnect if it's pre-8.2. I am not excited about adding more code for this so I am thinking head-only. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching
Andres Freund and...@anarazel.de writes: In my opinion this is actually a bug in 9.0. As it's a (IMO) low-impact fix that's constrained to two files, it seems sensible to backpatch it now that the solution has proven itself in the field? FWIW, I still don't trust that patch a lot (and I was the one who wrote it). The question is whether you believe that every platform uses Unicode code points directly as the wchar_t representation in UTF8-based locales. Although we've not heard any complaints suggesting that that assumption doesn't hold, I don't feel that 9.0 has been out long enough to prove much. The complaints about the old code were rare enough to make me think people just don't exercise the case too much, or don't notice if the behavior is wrong. So it likely hasn't had that much exercise in 9.0 either. In short, I'm pretty wary of back-patching it. regards, tom lane
Re: [HACKERS] Process wakeups when idle and power consumption
On 5 May 2011 22:22, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, May 5, 2011 at 4:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: + * The caveat about signals invalidating the timeout of + * WaitLatch() on some platforms can be safely disregarded, Really? I'm a bit confused by the phrasing of this comment as well, but it does seem to me that if all the relevant signal handlers set the latch, then it ought not to be necessary to break the sleep down into one-second intervals. [ reads code some more ... ] Yeah, I think you are probably right, which makes it just a badly phrased comment. The important point here is that the self-pipe trick in unix_latch.c fixes the problem, so long as we are relying on latch release and NOT timeout-driven wakeup. Why do you think that my comment is badly phrased? What that really means is that any WaitOnLatch call with a finite timeout ought to be viewed with a jaundiced eye. Ideally, we want them all to be waiting for latch release and nothing else. I'm concerned that we're going to be moving towards some intermediate state where we have WaitOnLatch calls with very long timeouts, because the longer the timeout, the worse the problem gets on platforms that have the problem. If you have say a 1-minute timeout, it's not difficult to believe that you'll basically never wake up because of random signals resetting the timeout. Unless all signal handlers for signals that we expect call SetLatch() anyway, as in this case. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Some surprising precedence behavior in PG's grammar
I wrote: Andrew Dunstan and...@dunslane.net writes: If we do need a precedence setting for NULL_P, then I think it should probably be on its own and not sharing one with IS. Yeah, I was thinking that too. If we put %prec on the IS [NOT] NULL productions then there is no need for NULL_P to have exactly its current precedence; anything above POSTFIXOP would preserve the current behavior in the DEFAULT ... NULL case. (And if we decided we wanted to flip that behavior, anything below POSTFIXOP would do that.) On reflection I decided that the best quick-fix is to put NULL into the list of keywords that are already precedence-grouped with IDENT. That at least makes sure that it has precedence behavior equivalent to any plain old non-keyword. If you can find a better fix, maybe we could apply it to the other cases mentioned there as well. BTW, I wonder why NOTNULL and ISNULL have their own precedence levels, rather than being made to act exactly like IS [NOT] NULL ... Is anybody up for changing that, or should we leave well enough alone? regards, tom lane
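[Editor's note] For readers following along, the two alternatives Tom describes would look roughly like this in gram.y terms. These are illustrative fragments only, not the committed patch; the keyword list in the %nonassoc line is abbreviated.

```yacc
/* Quick fix: group NULL_P with IDENT so it behaves like any
 * plain non-keyword with respect to precedence. */
%nonassoc   IDENT NULL_P PARTITION RANGE ROWS /* ... */

/* Alternative: pin the productions' precedence with %prec, which
 * would free NULL_P to take any precedence above POSTFIXOP. */
a_expr:   a_expr IS NULL_P          %prec IS
        | a_expr IS NOT NULL_P      %prec IS
        ;
```

With %prec on the productions, bison resolves shift/reduce conflicts involving these rules using IS's precedence rather than NULL_P's, which is why NULL_P's own level stops mattering for this case.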
Re: [HACKERS] Process wakeups when idle and power consumption
Peter Geoghegan pe...@2ndquadrant.com writes: On 5 May 2011 22:22, Tom Lane t...@sss.pgh.pa.us wrote: What that really means is that any WaitOnLatch call with a finite timeout ought to be viewed with a jaundiced eye. Ideally, we want them all to be waiting for latch release and nothing else. I'm concerned that we're going to be moving towards some intermediate state where we have WaitOnLatch calls with very long timeouts, because the longer the timeout, the worse the problem gets on platforms that have the problem. If you have say a 1-minute timeout, it's not difficult to believe that you'll basically never wake up because of random signals resetting the timeout. Unless all signal handlers for signals that we expect call SetLatch() anyway, as in this case. It's signals that we don't expect that I'm a bit worried about here. In any case, the bottom line is that having a timeout on WaitOnLatch is a kludge, and we should try to avoid it. regards, tom lane
Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching
On Thu, May 5, 2011 at 5:21 AM, Andres Freund and...@anarazel.de wrote: In my opinion this is actually a bug in 9.0. As it's a (imo) low-impact fix that's constrained to two files, it seems sensible to backpatch it now that the solution has proven itself in the field? The issue is hard to find and has come up several times in the field. And it has been slightly embarrassing more than once ;) Can you share some more details about your experiences? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Why is RegisterPredicateLockingXid called while holding XidGenLock?
Inquiring minds want to know ... This seems like a pretty lousy place to do it, first because of the contention hit from holding that high-traffic lock any longer than necessary, and second because every added chance for error between ExtendCLOG() and TransactionIdAdvance(ShmemVariableCache->nextXid) gives us another way to fail in the way recently mentioned by Joe Conway: http://archives.postgresql.org/message-id/4dbe4e7d.80...@joeconway.com Even if it's actually necessary to set up that data structure while holding XidGenLock, I would *really* like the call to not be exactly where it is. regards, tom lane
Re: [HACKERS] clog_redo causing very long recovery time
Joseph Conway m...@joeconway.com writes: I'm working with a client that uses Postgres on what amounts to an appliance. The database is therefore subject to occasional torture such as, in this particular case, running out of disk space while performing a million plus queries (of mixed varieties, many using plpgsql with exception handling -- more on that later), and eventually being power-cycled. Upon restart, clog_redo was called approx 885000 times (CLOG_ZEROPAGE) during recovery, which took almost 2 hours on their hardware. I should note that this is on Postgres 8.3.x. After studying the source, I can only see one possible way that this could have occurred: In varsup.c:GetNewTransactionId(), ExtendCLOG() needs to succeed on a freshly zeroed clog page, and ExtendSUBTRANS() has to fail. Both of these calls can lead to a page being pushed out of shared buffers and to disk, so given a lack of disk space, sufficient clog buffers, but lack of subtrans buffers, this could happen. At that point the transaction id does not get advanced, so clog zeros the same page, ExtendSUBTRANS() fails again, rinse and repeat. I believe in the case above, subtrans buffers were exhausted due to the extensive use of plpgsql with exception handling. Hmm, interesting. I believe that it's not really a question of buffer space or lack of it, but whether the OS will give us disk space when we try to add a page to the current pg_subtrans file. In any case, the point is that a failure between ExtendCLOG and incrementing nextXid is bad news. The attached fix-clogredo diff is my proposal for a fix for this. That seems pretty grotty :-( I think a more elegant fix might be to just swap the order of the ExtendCLOG and ExtendSUBTRANS calls in GetNewTransactionId. The reason that would help is that pg_subtrans isn't WAL-logged, so if we succeed doing ExtendSUBTRANS and then fail in ExtendCLOG, we won't have written any XLOG entry, and thus repeated failures will not result in repeated XLOG entries.
I seem to recall having considered exactly that point when the clog WAL support was first done, but the scenario evidently wasn't considered when subtransactions were stuck in :-(. It would probably also help to put in a comment admonishing people to not add stuff right there. I see the SSI guys have fallen into the same trap. regards, tom lane
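[Editor's note] Tom's proposed reordering can be sketched as pseudocode against the 8.3-era GetNewTransactionId; this is not a tested patch, just the shape of the fix:

```
/* Pseudocode sketch: extend pg_subtrans FIRST, because it is not
 * WAL-logged.  If ExtendSUBTRANS() fails (e.g. out of disk space),
 * no clog-zero XLOG record has been emitted yet, so repeated retries
 * cannot flood the WAL with redundant CLOG_ZEROPAGE records. */
xid = ShmemVariableCache->nextXid;

ExtendSUBTRANS(xid);   /* may fail; nothing has been WAL-logged yet */
ExtendCLOG(xid);       /* zeroes a fresh clog page, which IS WAL-logged */

TransactionIdAdvance(ShmemVariableCache->nextXid);
```

The failure mode Joe hit comes precisely from the original order: ExtendCLOG succeeded and logged a CLOG_ZEROPAGE record on every retry, while nextXid never advanced.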
Re: [HACKERS] clog_redo causing very long recovery time
Excerpts from Tom Lane's message of Fri May 06 00:22:43 -0300 2011: I think a more elegant fix might be to just swap the order of the ExtendCLOG and ExtendSUBTRANS calls in GetNewTransactionId. The reason that would help is that pg_subtrans isn't WAL-logged, so if we succeed doing ExtendSUBTRANS and then fail in ExtendCLOG, we won't have written any XLOG entry, and thus repeated failures will not result in repeated XLOG entries. I seem to recall having considered exactly that point when the clog WAL support was first done, but the scenario evidently wasn't considered when subtransactions were stuck in :-(. I'm pretty sure it would have never occurred to me to consider such a scenario. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] clog_redo causing very long recovery time
On 05/05/2011 08:22 PM, Tom Lane wrote: Joseph Conway m...@joeconway.com writes: The attached fix-clogredo diff is my proposal for a fix for this. That seems pretty grotty :-( I think a more elegant fix might be to just swap the order of the ExtendCLOG and ExtendSUBTRANS calls in GetNewTransactionId. The reason that would help is that pg_subtrans isn't WAL-logged, so if we succeed doing ExtendSUBTRANS and then fail in ExtendCLOG, we won't have written any XLOG entry, and thus repeated failures will not result in repeated XLOG entries. I seem to recall having considered exactly that point when the clog WAL support was first done, but the scenario evidently wasn't considered when subtransactions were stuck in :-(. Yes, that does seem much nicer :-) It would probably also help to put in a comment admonishing people to not add stuff right there. I see the SSI guys have fallen into the same trap. Right -- I think another similar problem exists in GetNewMultiXactId where ExtendMultiXactOffset could succeed and write an XLOG entry and then ExtendMultiXactMember could fail before advancing nextMXact. The problem in this case is that they both write XLOG entries, so a simple reversal doesn't help. Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support
Re: [HACKERS] clog_redo causing very long recovery time
Joe Conway m...@joeconway.com writes: Right -- I think another similar problem exists in GetNewMultiXactId where ExtendMultiXactOffset could succeed and write an XLOG entry and then ExtendMultiXactMember could fail before advancing nextMXact. The problem in this case is that they both write XLOG entries, so a simple reversal doesn't help. Hmm. Maybe we need a real fix then. I was just sitting here speculating about whether we'd ever decide we need to WAL-log pg_subtrans --- because if we did, my solution would fail. I still think that the right fix is to avoid emitting redundant XLOG records in the first place, rather than hacking recovery to not process them. Possibly we could modify slru.c so that it could be determined whether zeroing of the current page had already happened. In a quick look, it looks like noting whether latest_page_number had already been advanced to that page might do the trick. regards, tom lane
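[Editor's note] Tom's latest_page_number idea might look roughly like the following inside an SLRU zero-page path. This is a hypothetical pseudocode sketch only; the helper name WriteZeroPageXlogRec and the exact field access are assumptions for illustration, not the actual slru.c code:

```
/* Hypothetical sketch: only emit the zero-page XLOG record the first
 * time we reach this page.  If latest_page_number already advanced to
 * pageno on an earlier (subsequently failed) attempt, the record was
 * already written, so a retry loop emits no redundant WAL. */
if (shared->latest_page_number < pageno)
{
    shared->latest_page_number = pageno;
    WriteZeroPageXlogRec(pageno);   /* WAL-log only on first attempt */
}
```

This addresses the GetNewMultiXactId case too, where simply swapping the two Extend calls cannot help because both are WAL-logged.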
[HACKERS] Re: Why is RegisterPredicateLockingXid called while holding XidGenLock?
On Thu, May 05, 2011 at 11:12:40PM -0400, Tom Lane wrote: Even if it's actually necessary to set up that data structure while holding XidGenLock, I would *really* like the call to not be exactly where it is. Good question. I don't believe it needs to be while XidGenLock is being held at all; certainly not in this particular place. All that really matters is that it happens before any backend starts seeing said xid in tuple headers. I believe the call can be moved over to AssignTransactionId. I'd probably put it at the end of that function, but it can go anywhere between there and where it is now. Do you have any preference? Dan -- Dan R. K. Ports MIT CSAIL http://drkp.net/
[HACKERS] Re: Why is RegisterPredicateLockingXid called while holding XidGenLock?
Tom Lane wrote: This seems like a pretty lousy place to do it, first because of the contention hit from holding that high-traffic lock any longer than necessary, and second because every added chance for error between ExtendCLOG() and TransactionIdAdvance(ShmemVariableCache->nextXid) gives us another way to fail in the way recently mentioned by Joe Conway: http://archives.postgresql.org/message-id/4dbe4e7d.80...@joeconway.com Even if it's actually necessary to set up that data structure while holding XidGenLock, I would *really* like the call to not be exactly where it is. On a quick scan, I can't see any reason this couldn't be moved down to just above the return. I think the reason I dropped it where I did was simply that it was where we seemed to be letting other parts of the system know about the new xid, so I was going for consistency. I want to double-check this when I'm a little more awake, but I don't think that anything will be doing anything in the predicate locking area where the xid would matter until after the return from this function. -Kevin
Re: [HACKERS] Why is RegisterPredicateLockingXid called while holding XidGenLock?
Dan Ports d...@csail.mit.edu writes: On Thu, May 05, 2011 at 11:12:40PM -0400, Tom Lane wrote: Even if it's actually necessary to set up that data structure while holding XidGenLock, I would *really* like the call to not be exactly where it is. Good question. I don't believe it needs to be while XidGenLock is being held at all; certainly not in this particular place. All that really matters is that it happens before any backend starts seeing said xid in tuple headers. I believe the call can be moved over to AssignTransactionId. I'd probably put it at the end of that function, but it can go anywhere between there and where it is now. Do you have any preference? Yeah, I was thinking that it'd be better to pull it out of GetNewTransactionId and put it in a higher level. No strong preference about where in AssignTransactionId to put it. Is there any chance that it would be significant whether we do it before or after taking the lock on the XID (XactLockTableInsert)? regards, tom lane
[HACKERS] Re: Why is RegisterPredicateLockingXid called while holding XidGenLock?
Tom Lane wrote: Yeah, I was thinking that it'd be better to pull it out of GetNewTransactionId and put it in a higher level. As long as it is always called when an xid is assigned. Since this function appears to be on the only path to that, it should be fine. No strong preference about where in AssignTransactionId to put it. Is there any chance that it would be significant whether we do it before or after taking the lock on the XID (XactLockTableInsert)? No, but since we need to do it only on a top level assignment, we could save a couple cycles by putting it on an else on line 456. -Kevin
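[Editor's note] Kevin's "else" placement amounts to something like the following shape in AssignTransactionId (a pseudocode sketch of the proposal, not the committed change; the line number he cites refers to the then-current source):

```
/* Pseudocode sketch: register with the predicate-lock machinery only
 * for top-level xid assignments, reusing the existing isSubXact branch
 * so subtransactions skip the call entirely. */
if (isSubXact)
    SubTransSetParent(s->transactionId, s->parent->transactionId);
else
    RegisterPredicateLockingXid(s->transactionId);
```

Either way, the call moves out from under XidGenLock, which removes both the contention cost and the failure window between ExtendCLOG() and advancing nextXid.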
[HACKERS] Debug contrib/cube code
Hi, I am using postgresql-8.4.6 and want to debug the contrib/cube code. Is it possible to debug the cube code? There is no separate configure file under contrib to enable debugging. Is there any way to change the makefile to enable debug? Thanks
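[Editor's note] Contrib modules inherit the compiler flags chosen when the main tree is configured, so the usual route is to configure the whole source tree with debugging enabled and rebuild the module. A sketch (the install prefix is arbitrary; --enable-debug alone adds -g, and the CFLAGS override additionally disables optimization for easier stepping):

```shell
# Configure the PostgreSQL source tree with debug symbols
./configure --prefix=$HOME/pg84 --enable-debug CFLAGS="-O0 -g"
make
make install

# Build and install contrib/cube; it picks up the same flags
cd contrib/cube
make
make install
```

After reinstalling, attach gdb to a backend session that loads the cube functions and set breakpoints in cube.c as usual.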