Re: [HACKERS] Typed hstore proposal
Christopher Browne <cbbro...@gmail.com> writes:
> On Wed, Dec 21, 2011 at 8:32 PM, Johann 'Myrkraverk' Oskarsson <joh...@2ndquadrant.com> wrote:
>> I mean to create a typed hstore, called tstore for now. I'm open to name suggestions. It'll only support a subset of core Postgres types to begin with. Keys are always text; it's the value that's typed.
> JSON is the thing of the day that it would be desirable for this to be potent enough to represent, and JSON has the following types: 1. Number (in practice, FLOAT) 2. String (UTF-8) 3. Boolean (t/f) 4. Array (not necessarily of uniform type) 5. Object (string key, JSON value pairs, unordered) 6. NULL. #4 and #5 are obviously entirely more hairy.

Not so much if I extend tstore to include itself -- unless I'm mistaking the Object type. I'm not a user of JSON myself; how are people likely to use it?

--
Johann Oskarsson                  http://www.2ndquadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Blog: http://my.opera.com/myrkraverk/blog/
Re: [HACKERS] Replication timeout units
Magnus Hagander <mag...@hagander.net> writes:
> From postgresql.conf.sample:
>   #replication_timeout = 60s   # in milliseconds; 0 disables
> Seconds or milliseconds? I would suggest we just remove the "in milliseconds" and instead say "timeout for replication connections; 0 disables".

That would take away any information about whether it can be set at sub-second resolution. I don't like the current wording either, but just removing the "in milliseconds" seems wrong.

regards, tom lane
Re: [HACKERS] Cursor behavior
It seems that the task of fetching the next n results without moving the cursor is too complicated to implement for any query that has even a little bit of complexity in it...

--- On Wed, 12/21/11, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Dec 15, 2011 at 4:15 PM, amit sehas <cu...@yahoo.com> wrote:
>> I had a question about the cursor internals implementation. When you fetch the next 'n' results without moving the cursor, is this kind of functionality implemented by first executing the whole query and then moving the cursor over the results, or are the movements done on active database lookups, moving forward and backward...
> I think it depends on the query. For example, I believe that a query involving writeable CTEs will be run to completion before returning any results, but I believe that a seqscan will not.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
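For readers following the thread, here is a minimal psql sketch of the cursor behavior under discussion; the table and cursor names are invented for illustration:

  BEGIN;
  DECLARE c SCROLL CURSOR FOR SELECT * FROM t ORDER BY id;
  FETCH FORWARD 10 FROM c;   -- returns the next (up to) 10 rows, advancing the cursor
  MOVE BACKWARD 10 IN c;     -- repositions the cursor without returning rows
  FETCH FORWARD 10 FROM c;   -- returns the same 10 rows again
  COMMIT;

Whether that backward movement re-reads a materialized result or re-executes part of the scan depends on the plan, per Robert's comment above.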
Re: [HACKERS] JSON for PG 9.2
Let me mention another lightweight data-interchange format. At http://www.janestreet.com we have developed a small C module to deal with S-expressions (sexps) as a way to store arbitrary data. As we write most of our code in OCaml, sexps are a natural way for us to store data (http://hg.ocaml.info/release/sexplib/ provides automatic ways to convert any OCaml value into a sexp). The extension is still pretty new, but we use it successfully on a daily basis. After we have upgraded to 9.x we will package it as an extension and release it as open source.

API-wise, the module at the moment offers the following:

  sexp_validate(text) returns boolean
    Validate that the passed-in text is a valid s-expression.

  create domain sexp as text check (sexp_validate(value));

(BTW: it is a PITA that arrays of domains are not valid types.)

And several functions to take apart or modify sexps using a path into the sexp (similar to what xpath does for xml), such as:

  sexp_get(sexp, text) returns sexp
    Get the sub-sexp of sexp identified by the path. Returns NULL if path is not a valid path in sexp.
    Example: path = '.a space.b.[1].x'
      ((ignore this) (a space ((b (0 ((also ignored) (x The Value)) )) )))  ->  The Value

  sexp_get_atom(sexp, text) returns text
    Get the sub-atom of sexp identified by the path. Returns NULL if path is not a valid path in sexp or does not identify an atom.

On 20/12/11 19:39, Claes Jakobsson wrote:
> On Dec 20, 2011, at 12:39 AM, David E. Wheeler wrote:
>> On Dec 19, 2011, at 2:49 AM, Dimitri Fontaine wrote:
>>> My understanding is that JSON is a subset of ECMAscript
>> Well, no, JSON is formally "a lightweight data-interchange format." It's derived from JavaScript syntax, but it is not a programming language, so I wouldn't say it was accurate to describe it as a subset of JS or ECMAScript. http://json.org/
> Are people explicitly asking for a) a *JSON* datatype, or b) a type that lets you store arbitrarily complex, semi-untyped data structures? If b), then this might get a lot more interesting.

Cheers, Bene
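A minimal usage sketch of the API described above; the table and data are invented, and the path semantics are inferred from Bene's examples rather than taken from the extension's documentation:

  create table config_tbl (id serial primary key, data sexp);
  insert into config_tbl (data) values ('((host db1) (port 5432))');
  select sexp_get(data, '.host') from config_tbl;       -- sub-sexp under "host", i.e. db1
  select sexp_get_atom(data, '.port') from config_tbl;  -- the atom '5432'; NULL for a bad path

Because sexp is a domain over text, the stored values remain ordinary text, so the usual text operators keep working alongside the path functions.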
Re: [HACKERS] Page Checksums
> I do agree with Heikki that it really ought to be the OS's problem, but then we thought that about dtrace, and we're still waiting for that or something similar to be usable on all platforms (+/- 4 years).

Agreed. My point is that it looks like this is going to take 1-2 years in PostgreSQL, so it looks like a huge job... but at the same time I understand we can't hope the filesystems will catch up! I guess this feature will be tunable (off/on)?
Re: [HACKERS] Page Checksums + Double Writes
* David Fetter:
> The issue is that double writes needs a checksum to work by itself, and page checksums, more broadly, work better when there are double writes, obviating the need to have full_page_writes on.

How desirable is it to disable full_page_writes? Doesn't it cut down recovery time significantly because it avoids read-modify-write cycles with a cold cache?

--
Florian Weimer <fwei...@bfk.de>
BFK edv-consulting GmbH        http://www.bfk.de/
Kriegsstraße 100               tel: +49-721-96201-1
D-76133 Karlsruhe              fax: +49-721-96201-99
Re: [HACKERS] Real-life range datasets
Hello,

We have a table in a postgres 8.4 database that would make use of date ranges and exclusion constraints if they were available. Sadly I cannot give you the data, as it is based on data we are paying for, and as part of the relevant licenses we are obliged not to give the data to third parties.

Basically the tables keep metadata about financial instruments on a given day. Thanks to corporate actions (companies merging, splitting up, etc...) that metadata can be different from one day to the next. One way to model such a table is:

  identifier, date, payload columns...
  unique index on (date, identifier)

(and in fact some of the payload columns are per-day unique indices as well). But because there are a large number of rows per day and most don't change, this is a very wasteful representation. Instead we use this:

  identifier, effective_from, effective_until, payload columns...

And we have some clever plpgsql functions that merge a day's snapshot into that representation. That happens only a few times per day and is currently quite slow, mostly because a lot of time is spent in validation triggers that check that there are no overlapping (effective_from, effective_until) entries for jane_symbol and a few other identifiers.

The most common operations are: get all or most rows of a given day

  select ... from instruments where :date between effective_from and effective_until

and a left join of the instruments (again in the normal case constrained to one day, but in some cases periods of a week or a few months):

  select ... from t
  left join instruments
    on t.jane_symbol = instruments.jane_symbol
   and t.date between instruments.effective_from and instruments.effective_until
  where t.date = X
    and additional constraints on the number of rows from t

with t a huge table clustered on date with roughly 500,000 to 2,000,000 entries per day. The left join would work most of the time (my guess is more than 90%), but there are entries in t where the jane_symbol would not be in instruments (sadly).

Current size (immediately after a cluster):

  table   | toast  | (all indices) | total
  1268 MB | 900 MB | 693 MB        | 2861 MB

= select min(effective_from), max(effective_from) from instruments;
    min     |    max
------------+------------
 2011-05-30 | 2011-12-21
(1 row)

b=> select count(*) from instruments where current_date - 1 between effective_from and effective_until;
 count
--------
 358741
(1 row)

I should be able to give you a table with the same characteristics as the instruments table but bogus data, by replacing all entries in the table with random strings of the same length or something like that. I can probably take a little bit of time during this or the next week to generate such fake real-world data ;-) Is there an ftp site to upload the gzipped pg_dump file to?

Cheers, Bene

On 20/12/11 16:48, Alexander Korotkov wrote:
> Hackers, for better GiST indexing of range types it's important to have real-life datasets for testing on. Real-life range datasets would help to prove (or reject) some concepts and get more realistic benchmarks. Also, it would be nice to know what queries you expect to run fast on those datasets. Ideally it should be a real-life set of queries, but it could also be your idea of what typical queries for such datasets look like. Thanks!
> - With best regards, Alexander Korotkov.
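For concreteness, a sketch of how this table could look with the 9.2 range-type work under discussion; the column names follow Bene's description, while the types and the btree_gist dependency are assumptions:

  create extension btree_gist;  -- provides the gist "=" opclass used below
  create table instruments (
    jane_symbol     text not null,
    effective_from  date not null,
    effective_until date not null,
    -- payload columns ...
    exclude using gist (
      jane_symbol with =,
      daterange(effective_from, effective_until, '[]') with &&
    )
  );

The exclusion constraint is precisely the no-overlapping-validity-periods check that the plpgsql validation triggers have to enforce by hand on 8.4.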
Re: [HACKERS] Page Checksums + Double Writes
On Thu, Dec 22, 2011 at 7:44 AM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
> On 22.12.2011 01:43, Tom Lane wrote:
>> A utility to bump the page version is equally a whole lot easier said than done, given that the new version has more overhead space and thus less payload space than the old. What does it do when the old page is too full to be converted? "Move some data somewhere else" might be workable for heap pages, but I'm less sanguine about rearranging indexes like that. At the very least it would imply that the utility has full knowledge about every index type in the system.
> Remembering back the old discussions, my favorite scheme was to have an online pre-upgrade utility that runs on the old cluster, moving things around so that there is enough spare room on every page. It would do normal heap updates to make room on heap pages (possibly causing transient serialization failures, like all updates do), and split index pages to make room on them. Yes, it would need to know about all index types. And it would set a global variable to indicate that X bytes must be kept free on all future updates, too. Once the pre-upgrade utility has scanned through the whole cluster, you can run pg_upgrade. After the upgrade, old page versions are converted to the new format as pages are read in. The conversion is straightforward, as the pre-upgrade utility ensured that there is enough spare room on every page.

That certainly works, but we're still faced with pg_upgrade rewriting every page, which will take a significant amount of time and comes with no backout plan or rollback facility. I don't like that at all, hence why I think we need an online upgrade facility if we do have to alter page headers.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Page Checksums + Double Writes
On Thu, Dec 22, 2011 at 8:42 AM, Florian Weimer <fwei...@bfk.de> wrote:
> * David Fetter:
>> The issue is that double writes needs a checksum to work by itself, and page checksums more broadly work better when there are double writes, obviating the need to have full_page_writes on.
> How desirable is it to disable full_page_writes? Doesn't it cut down recovery time significantly because it avoids read-modify-write cycles with a cold cache?

It's way too late in the cycle to suggest removing full page writes, or to code that. We're looking to add protections, not swap out existing ones.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Page Checksums + Double Writes
On 2011-12-22 09:42, Florian Weimer wrote:
> * David Fetter:
>> The issue is that double writes needs a checksum to work by itself, and page checksums more broadly work better when there are double writes, obviating the need to have full_page_writes on.
> How desirable is it to disable full_page_writes? Doesn't it cut down recovery time significantly because it avoids read-modify-write cycles with a cold cache?

What are the downsides of having full_page_writes enabled, except for log volume? The manual mentions something about speed, but it is a bit unclear where that would come from, since the full pages must be somewhere in memory when being worked on anyway.

Anyway, I have an archive_command that looks like:

  archive_command = 'test ! -f /data/wal/%f.gz && gzip --fast < %p > /data/wal/%f.gz'

It brings somewhere between a 50 and 75% reduction in log volume, with no cost on the production system (since gzip just occupies one of the many cores on the system), and can easily keep up even during quite heavy writes.

Recovery is a bit more tricky, because hooking gunzip into the command there will cause the system to run a replay-log, gunzip, read-data, replay-log cycle, whereas the gunzip of the other logfiles could easily be done while replay is being done on the current one. So a straightforward recovery will cost in recovery time, but that can be dealt with.

--
Jesper
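For reference, the straightforward (serial) recovery counterpart Jesper alludes to would be something like the following in recovery.conf; the paths mirror his archive_command, but the exact line is an assumption, not taken from his setup:

  restore_command = 'gunzip < /data/wal/%f.gz > %p'

The optimization he sketches would instead have a script decompress the next few segments into a staging area in the background while the current segment is being replayed.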
Re: [HACKERS] Page Checksums + Double Writes
Simon Riggs wrote:
> So overall, I do now think it's still possible to add an optional checksum in the 9.2 release, and am willing to pursue it unless there are technical objections.

Just to restate Simon's proposal, to make sure I'm understanding it: we would support a new page header format number and the old one in 9.2, both to be the same size and carefully engineered to minimize what code would need to be aware of the version. PageHeaderIsValid() and PageInit() certainly would, and we would need some way to set, clear (maybe), and validate a CRC. We would need a GUC to indicate whether to write the CRC, and if present we would always test it on read and treat the page as damaged if it didn't match. (Perhaps other options could be added later, to support recovery attempts, but let's not complicate a first cut.) This whole idea would depend on either (1) trusting your storage system never to tear a page on write or (2) getting the double-write feature added, too.

I see some big advantages to this over what I suggested to David. For starters, using a flag bit and putting the CRC somewhere other than the page header would require that each AM deal with the CRC, exposing some function(s) for that. Simon's idea doesn't require that. I was also a bit concerned about shifting tuple images to convert non-protected pages to protected pages. No need to do that, either. With the bit flags, I think there might be some cases where we would be unable to add a CRC to a converted page because space was too tight; that's not an issue with Simon's proposal. Heikki was talking about a pre-convert tool. Neither approach really needs that, although with Simon's approach it would be possible to have a background *post*-conversion tool to add CRCs, if desired. Things would continue to function if it wasn't run; you just wouldn't have CRC protection on pages not updated since pg_upgrade was run. Simon, does it sound like I understand your proposal?

Now, on to the separate-but-related topic of double-write. That absolutely requires some form of checksum or CRC to detect torn pages, in order for the technique to work at all. Adding a CRC without double-write would work fine if you have a storage stack which prevents torn pages in the file system or hardware driver. If you don't have that, it could create a damaged-page indication after a hardware or OS crash, although I suspect that would be the exception, not the typical case. Given all that, and the fact that it would be cleaner to deal with these as two separate patches, it seems the CRC patch should go in first. (And, if this is headed for 9.2, *very soon*, so there is time for the double-write patch to follow.)

It seems to me that the full_page_writes GUC could become an enumeration, with 'off' having the current meaning, 'wal' meaning what 'on' now does, and 'double' meaning that the new double-write technique would be used. (It doesn't seem to make any sense to do both at the same time.) I don't think we need a separate GUC to tell us *what* to protect against torn pages -- if not 'off' we should always protect the first write of a page after checkpoint, and if 'double' and write_page_crc (or whatever we call it) is on, then we protect hint-bit-only writes, I think. I can see room to argue that with CRCs on we should do a full-page write to the WAL for a hint-bit-only change, or that we should add another GUC to control when we do this.
I'm going to take a shot at writing a patch for background hinting over the holidays, which I think has benefit alone but also boosts the value of these patches, since it would reduce the double-write activity otherwise needed to prevent spurious errors when using CRCs. (A hypothetical sketch of the resulting configuration follows below.)

This whole area has some overlap with spreading writes, I think. The double-write approach seems to count on writing a bunch of pages (potentially from different disk files) sequentially to the double-write buffer, fsyncing that, and then writing the actual pages -- which must be fsynced before the related portion of the double-write buffer can be reused. The simple implementation would be to simply fsync the files just written to if they required a prior write to the double-write buffer, although fancier techniques could be used to try to optimize that. Again, setting hint bits before the write when possible would help reduce the impact of that.

-Kevin
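To make Kevin's GUC proposal concrete, the settings might look like this; note that this is entirely hypothetical -- neither the 'double' value nor write_page_crc exists, and both names are just placeholders from the proposal:

  # postgresql.conf (hypothetical)
  full_page_writes = double   # off | wal (today's 'on') | double
  write_page_crc = on         # placeholder name for the page-checksum switch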
Re: [HACKERS] Page Checksums + Double Writes
On Thu, Dec 22, 2011 at 4:00 AM, Jesper Krogh <jes...@krogh.cc> wrote:
> On 2011-12-22 09:42, Florian Weimer wrote:
>> How desirable is it to disable full_page_writes? Doesn't it cut down recovery time significantly because it avoids read-modify-write cycles with a cold cache?
> What are the downsides of having full_page_writes enabled, except for log volume? The manual mentions something about speed, but it is a bit unclear where that would come from, since the full pages must be somewhere in memory when being worked on anyway.

I thought I would share some of my perspective on this checksum + doublewrite question from a performance point of view. Currently, from what I see in our tests based on dbt2, DVDStore, etc., checksums do not impact scalability or total measured throughput. They do increase CPU cycles, depending on the algorithm used, but not by anything that really causes problems.

The doublewrite change will be the big performance win compared to full_page_writes. For example, compared to other databases our WAL traffic is one of the highest, and most of it is attributable to full_page_writes. The reason full_page_writes is necessary in production (at least without worrying about replication impact) is that if a write fails, we can recover that whole page from the WAL as-is and just put it back out there. (In fact I believe that's what recovery does.) However, during high OLTP load the runtime impact on WAL is high due to the heavy traffic, and compared to other databases the utilization is correspondingly high. This also has a huge impact on transaction response time the first time a page is changed -- which in OLTP environments is frequent, because by nature the transactions are all on random pages.

When we use doublewrite with checksums, we can safely disable full_page_writes, causing a HUGE reduction in WAL traffic without loss of reliability due to a write fault, since there are always two writes. (Implementation detail discussable.) Since the double writes themselves are sequential, bundling multiple such writes further reduces the write time. The biggest improvement is that these writes happen not at TRANSACTION COMMIT but at CHECKPOINT WRITES, which improves transaction performance drastically for OLTP applications, and you still get the reliability that is needed.

Typically, performance in terms of throughput looks like:

  tps(full_page_writes=on) < tps(full_page_writes=off)

With double write and CRC we see:

  tps(full_page_writes=on) < tps(doublewrite) < tps(full_page_writes=off)

which is a big win for production systems: the reliability of full_page_writes at close to the performance of turning it off. A side effect is that response times are more level, unlike with full_page_writes, where response time varies from, say, 0.5ms to 5ms depending on whether the transaction needs to write a full page into WAL or not. With doublewrite it can stay around 0.5ms rather than showing a huge deviation. Folks measuring 90th-percentile response times will see a huge relief in trying to meet their SLAs. Also, from a WAL perspective, I like to put the WAL on its own LUN/spindle/VMDK etc.
The net result I see is that with the reduced WAL traffic, utilization drops, which means the same hardware can now handle higher WAL traffic in terms of IOPS, so that WAL itself becomes less of a bottleneck. Typically this is observed as a reduction in transaction response times and an increase in tps, until some other bottleneck becomes the gating factor. So overall this is a big win.

Regards, Jignesh
Re: [HACKERS] Typed hstore proposal
Johann 'Myrkraverk' Oskarsson <joh...@2ndquadrant.com> writes:
> I mean to create a typed hstore, called tstore for now.

Um ... what is the point of this, exactly? From what I've seen, most applications for hstore are pretty happy with the fact that hstore is only weakly typed, and if an entry *is* an integer, or a float, or whatever else, it's not hard to cast to and from text as needed. So this idea looks like a solution in search of a problem, one which is going to need a whole lot more work before it even gets to the point of being as useful as hstore. It's not, for instance, apparent what the use is of iterating over only the entries that were supplied as integers --- there is no reason to think they're related just because of that.

regards, tom lane
Re: [HACKERS] Real-life range datasets
Bene, we have the pgfoundry project http://pgfoundry.org/projects/dbsamples/. Since your sample database is very important (for me also), I suggest using this site.

Oleg

On Thu, 22 Dec 2011, Benedikt Grundmann wrote:
> [...]
> I should be able to give you a table with the same characteristics as the instruments table but bogus data, by replacing all entries in the table with random strings of the same length or something like that. I can probably take a little bit of time during this or the next week to generate such fake real-world data ;-) Is there an ftp site to upload the gzipped pg_dump file to?
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] Typed hstore proposal
On 22/12/11 10:44, Tom Lane wrote:
> Johann 'Myrkraverk' Oskarsson <joh...@2ndquadrant.com> writes:
>> I mean to create a typed hstore, called tstore for now.
> Um ... what is the point of this, exactly? From what I've seen, most applications for hstore are pretty happy with the fact that hstore is only weakly typed, and if an entry *is* an integer, or a float, or whatever else, it's not hard to cast to and from text as needed.

Moreover, it is also easy with the current hstore to add constraints like this:

  contracts_is_an_integer CHECK ((tags -> 'contracts'::text) ~ '^[0-9]+$'::text)

to ensure that it actually is.
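In context, a minimal sketch of attaching such a check (the table and key names are invented):

  create extension hstore;
  create table contract_data (
    tags hstore,
    constraint contracts_is_an_integer
      check ((tags -> 'contracts') ~ '^[0-9]+$')
  );
  insert into contract_data values ('contracts=>42');   -- passes
  insert into contract_data values ('contracts=>abc');  -- rejected by the CHECK
  -- note: rows lacking a 'contracts' key also pass, since NULL ~ '...' yields NULL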
Re: [HACKERS] Page Checksums + Double Writes
Jignesh Shah <jks...@gmail.com> wrote:
> When we use doublewrite with checksums, we can safely disable full_page_writes, causing a HUGE reduction in WAL traffic without loss of reliability due to a write fault, since there are always two writes. (Implementation detail discussable.)

The "always" there surprised me. It seemed to me that we only need to do the double-write where we currently do full page writes or unlogged writes. In thinking about your message, it finally struck me that this might require a WAL record to be written with the checksum (or CRC; whatever we use). Still, writing a WAL record with a CRC prior to the page write would be less data than the full page. Doing double-writes instead for situations without the torn-page risk seems likely to be a net performance loss, although I have no benchmarks to back that up (not having a double-write implementation to test). And if we can get correct behavior without doing either (the checksum WAL record or the double-write), that's got to be a clear win.

-Kevin
Re: [HACKERS] CLOG contention
On Thu, Dec 22, 2011 at 1:04 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> I understand why you say that and take no offence. All I can say is that last time I had access to a good test rig and well-structured reporting and analysis, I was able to see evidence of what I described to you here. I no longer have that access, which is the main reason I've not done anything in the last few years. We both know you do have good access, and that's the main reason I'm telling you about it rather than just doing it myself.

Right. But I need more details. If I know what to test and how to test it, I can do it. Otherwise, I'm just guessing. I dislike guessing.

You mentioned latency, so this morning I ran pgbench with -l and graphed the output. There are latency spikes every few seconds. I'm attaching the overall graph as well as the graph of the last 100 seconds, where the spikes are easier to see clearly. Now, here's the problem: it seems reasonable to hypothesize that the spikes are due to CLOG page replacement, since the frequency is at least plausibly right, but this is obviously not enough to prove that conclusively. Ideas?

Also, if it is that, what do we do about it? I don't think any of the ideas proposed so far are going to help much. Increasing the number of CLOG buffers isn't going to fix the problem that once they're all dirty, you have to write and fsync one before pulling in the next one. Striping might actually make it worse -- everyone will move to the next buffer right around the same time, and instead of everybody waiting for one fsync, they'll each be waiting for their own. Maybe the solution is to have the background writer keep an eye on how many CLOG buffers are dirty and start writing them out if the number gets too big.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

[attachments: latency.png, latency-end.png]
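For anyone wanting to reproduce the measurement, a sketch of the kind of run Robert describes; the scale factor, client count, and duration here are guesses, not taken from his mail:

  pgbench -i -s 100 bench        # initialize the test tables
  pgbench -c 32 -T 300 -l bench  # timed run with per-transaction latency logging

With -l, each line of the pgbench_log.* output files carries the transaction's latency in microseconds plus a timestamp, which is what gets binned and graphed to expose the periodic spikes.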
Re: [HACKERS] Typed hstore proposal
On 12/22/2011 10:44 AM, Tom Lane wrote:
> Johann 'Myrkraverk' Oskarsson <joh...@2ndquadrant.com> writes:
>> I mean to create a typed hstore, called tstore for now.
> Um ... what is the point of this, exactly? [...] It's not, for instance, apparent what the use is of iterating over only the entries that were supplied as integers --- there is no reason to think they're related just because of that.

Yeah, the thing that's annoying in some cases about hstore is not that it's untyped but that it's flat. That's what a JSON type would buy us: a lightweight tree-structured type, and moreover one that is widely and increasingly used and well understood.

cheers, andrew
Re: [HACKERS] Wishlist: parameterizable types
On Thu, Dec 22, 2011 at 1:04 AM, Joey Adams <joeyadams3.14...@gmail.com> wrote:
> What I'm wondering is: how complex would it be to add such a feature to PostgreSQL's type system?

Very. It's been discussed before, although I can't tell you the subject lines of the threads off the top of my head. The basic problem is that there is a ton of code that assumes that a type is fully identified by an OID, and all of that code would need to be changed.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Enable min/max optimization for bool_and/bool_or/every
On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <ma...@juffo.org> wrote:
> PS: It seems that the min/max optimization isn't documented in the manual (apart from the release notes), so I didn't include any doc changes in this patch.

I don't see a patch attached to this email, so either you forgot to attach it, or the list ate it somehow.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_upgrade with plpython is broken
On Mon, Dec 19, 2011 at 10:16 AM, Peter Eisentraut <pete...@gmx.net> wrote:
> Upgrading an instance containing plpython from <= 8.4 to >= 9.0 is broken because the module plpython.so was renamed to plpython2.so, and so the pg_upgrade check for loadable libraries fails thus: "Your installation references loadable libraries that are missing from the new installation." etc. Installing a symlink fixes the issue. Should we teach pg_upgrade about this renaming, or should we install the symlink as part of the standard installation?

I feel like this is a pg_upgrade bug, not so much a PL/python problem.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] patch: bytea_agg
On Wed, Dec 21, 2011 at 5:04 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> this patch adds a bytea_agg aggregation. It allows fast bytea concatenation.

Looks fine to me. I'll commit this, barring objections.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Enable min/max optimization for bool_and/bool_or/every
On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Dec 19, 2011 at 5:16 AM, Marti Raudsepp <ma...@juffo.org> wrote:
>> PS: It seems that the min/max optimization isn't documented in the manual (apart from the release notes), so I didn't include any doc changes in this patch.
> I don't see a patch attached to this email, so either you forgot to attach it, or the list ate it somehow.

I forgot to attach it, sorry. Here it is.

Regards, Marti

diff --git a/src/include/catalog/pg_aggregate.h b/src/include/catalog/pg_aggregate.h
index 26966d2..29e262f 100644
--- a/src/include/catalog/pg_aggregate.h
+++ b/src/include/catalog/pg_aggregate.h
@@ -203,9 +203,9 @@
 DATA(insert ( 2828	float8_regr_accum	float8_covar_samp	0	1022	"{0,0,0,0,0,0}" ));
 DATA(insert ( 2829	float8_regr_accum	float8_corr	0	1022	"{0,0,0,0,0,0}" ));
 
 /* boolean-and and boolean-or */
-DATA(insert ( 2517	booland_statefunc	-	0	16	_null_ ));
-DATA(insert ( 2518	boolor_statefunc	-	0	16	_null_ ));
-DATA(insert ( 2519	booland_statefunc	-	0	16	_null_ ));
+DATA(insert ( 2517	booland_statefunc	-	58	16	_null_ ));
+DATA(insert ( 2518	boolor_statefunc	-	59	16	_null_ ));
+DATA(insert ( 2519	booland_statefunc	-	58	16	_null_ ));
 
 /* bitwise integer */
 DATA(insert ( 2236	int2and	-	0	21	_null_ ));
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 51ab6e5..7d245d2 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -774,16 +774,19 @@ WHERE a.aggfnoid = p.oid AND
 (0 rows)
 
 -- Cross-check aggsortop (if present) against pg_operator.
--- We expect to find only "<" for min and ">" for max.
+-- We expect to find "<" for min/bool_and/every and ">" for max/bool_or
 SELECT DISTINCT proname, oprname
 FROM pg_operator AS o, pg_aggregate AS a, pg_proc AS p
 WHERE a.aggfnoid = p.oid AND a.aggsortop = o.oid
 ORDER BY 1;
- proname | oprname
----------+---------
- max     | >
- min     | <
-(2 rows)
+ proname  | oprname
+----------+---------
+ bool_and | <
+ bool_or  | >
+ every    | <
+ max      | >
+ min      | <
+(5 rows)
 
 -- Check datatypes match
 SELECT a.aggfnoid::oid, o.oid
@@ -816,11 +819,14 @@ WHERE a.aggfnoid = p.oid AND a.aggsortop = o.oid AND
     amopopr = o.oid AND
     amopmethod = (SELECT oid FROM pg_am WHERE amname = 'btree')
 ORDER BY 1, 2;
- proname | oprname | amopstrategy
----------+---------+--------------
- max     | >       |            5
- min     | <       |            1
-(2 rows)
+ proname  | oprname | amopstrategy
+----------+---------+--------------
+ bool_and | <       |            1
+ bool_or  | >       |            5
+ every    | <       |            1
+ max      | >       |            5
+ min      | <       |            1
+(5 rows)
 
 -- Check that there are not aggregates with the same name and different
 -- numbers of arguments.  While not technically wrong, we have a project policy
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
index e29148f..bcd8544 100644
--- a/src/test/regress/sql/opr_sanity.sql
+++ b/src/test/regress/sql/opr_sanity.sql
@@ -626,7 +626,7 @@ WHERE a.aggfnoid = p.oid AND
     NOT binary_coercible(p.proargtypes[0], a.aggtranstype);
 
 -- Cross-check aggsortop (if present) against pg_operator.
--- We expect to find only "<" for min and ">" for max.
+-- We expect to find "<" for min/bool_and/every and ">" for max/bool_or
 SELECT DISTINCT proname, oprname
 FROM pg_operator AS o, pg_aggregate AS a, pg_proc AS p
Re: [HACKERS] archive_keepalive_command
On Mon, Dec 19, 2011 at 1:02 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Dec 12, you said:
>> It also strikes me that anything that is based on augmenting the walsender/walreceiver protocol leaves anyone who is using WAL shipping out in the cold. I'm not clear from the comments you or Simon have made how important you think that use case still is.
> Not wanting to leave anyone out in the cold, I proposed something to enhance file-based replication also.

Fair enough. I am still of the opinion that we ought to commit some version of the pg_last_xact_insert_timestamp patch. I accept that patch isn't going to solve every problem, but I still think it's worth having. If one of these other solutions comes along and turns out to work great, that's fine, too; but I don't think any of them are so compelling that we can credibly say that pg_last_xact_insert_timestamp is useless or obsolete.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Wishlist: parameterizable types
On 12/22/2011 11:34 AM, Robert Haas wrote:
> On Thu, Dec 22, 2011 at 1:04 AM, Joey Adams <joeyadams3.14...@gmail.com> wrote:
>> What I'm wondering is: how complex would it be to add such a feature to PostgreSQL's type system?
> Very. It's been discussed before, although I can't tell you the subject lines of the threads off the top of my head. The basic problem is that there is a ton of code that assumes that a type is fully identified by an OID, and all of that code would need to be changed.

And frankly I'd expect there to be serious resistance to such a change. It's not just a question of how hard it would be, but how advisable. We have an extremely rich and uniquely extensible type system, and we've added a number of somewhat generic features over the years. I'd need a lot of convincing that we need fully generic types (and I came from the Ada world many years ago, which was the inspiration for many such features).

cheers, andrew
Re: [HACKERS] [PATCH] Enable min/max optimization for bool_and/bool_or/every
On Thu, Dec 22, 2011 at 11:52 AM, Marti Raudsepp <ma...@juffo.org> wrote:
> On Thu, Dec 22, 2011 at 18:41, Robert Haas <robertmh...@gmail.com> wrote:
>> I don't see a patch attached to this email, so either you forgot to attach it, or the list ate it somehow.
> I forgot to attach it, sorry. Here it is.

Nice. It doesn't get much simpler than that.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Wishlist: parameterizable types
* Joey Adams (joeyadams3.14...@gmail.com) wrote:
> This may be ambitious, but it'd be neat if PostgreSQL supported parameterizable types. For example, suppose a contrib module defines a pair type. It could be used as follows:

Have you looked at what the PostGIS folks are doing..? We do have support for custom typmods... I'm not sure if that helps you at all, but figured I'd mention it.

Thanks, Stephen
[HACKERS] Indexes with duplicate columns
In bug #6351 it's pointed out that this fails unexpectedly:

  CREATE TABLE tab (id SERIAL, a INTEGER, b INTEGER);
  CREATE INDEX tab123 ON tab (a, b, a);
  SELECT a, b FROM tab WHERE a = 0 AND b = 1;
  ERROR:  btree index keys must be ordered by attribute

I looked into this a bit and find that indxpath.c is producing a correct list of index quals, "a = 0 AND b = 1 AND a = 0", but then createplan.c messes it up because it matches both copies of "a" to the first possible match in the index's column list. So what the executor gets looks like

  {INDEX_VAR 1} = 0 AND {INDEX_VAR 2} = 1 AND {INDEX_VAR 1} = 0

and there's a btree implementation restriction that makes it spit up on that.

Now, what the planner did here is more wrong than just tripping over a btree limitation. The actual use-case for an index mentioning the same table column more than once, IMO, would be if the index columns had different operator classes and thus supported different sets of indexable operators. Matching the second instance of "a" to the first index column could then be asking the index to implement an operator that that column doesn't support. So we need to fix the planner, not btree.

The representation that indxpath.c actually emits is correct and unambiguous, because it produces a list of sublists of indexquals, one sublist per index column. So in that format it's clear which index column each clause is meant for. But then we flatten the list in create_index_path, and so createplan.c has to reverse-engineer the matching, and it really can't get it right unless we're willing to make it recheck operator matching, not only column matching.

The obvious solution seems to be to preserve the list-of-sublists representation through to createplan.c. Then that code can just verify the expected column match instead of searching, so it might actually be a tad faster. However, such a change is going to affect cost_index and all the amcostestimate functions, and likely break any planner plugins that do anything with IndexPaths. So it's going to be a bit invasive, and I'm inclined to think it's not back-patchable. My inclination is to fix it that way in HEAD and just leave it as a known unsupported case in the back branches. This case does not seem important enough to justify trying to find a solution that would work without a path representation change.

Comments, other ideas?

regards, tom lane
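For context, the legitimate duplicate-column use case Tom describes -- the same column indexed under two operator classes -- would look something like this (a hypothetical example, not from the bug report):

  CREATE TABLE docs (title text);
  -- text_pattern_ops supports LIKE-prefix scans; the default opclass supports
  -- ordinary <, <=, >, >= comparisons under the collation
  CREATE INDEX docs_title ON docs (title text_pattern_ops, title);

A qual on title must then be matched to whichever index column's opclass actually supports its operator, which is exactly the information the flattened indexqual list loses.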
Re: [HACKERS] reprise: pretty print viewdefs
On Mon, Dec 19, 2011 at 1:51 PM, Andrew Dunstan <and...@dunslane.net> wrote:
> The simple solution I originally proposed was to put a line feed and some space before every target field in pretty print mode. This is a two-line patch. The downsides are a) maybe not everyone will like the change and b) it will produce superfluous newlines, e.g. before CASE expressions.

With regard to (a), specifically, you won't like this change if your column names are things like "bob" and "sam", because you'll burn through an inordinate amount of vertical space. On the other hand, I agree that you'll probably find it a big improvement if they are things like nc.nspname::information_schema.sql_identifier AS udt_schema.

It has always seemed to me that a sensible strategy here would be to try to produce output that looks good in 80 columns, on the assumption that (1) everyone has at least that much horizontal space to play with and (2) people who do won't necessarily mind it if we don't use all of that horizontal space when pretty-printing SQL. However, it is possible that I am living in the dark ages.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] reprise: pretty print viewdefs
On 12/22/2011 12:18 PM, Robert Haas wrote:
> On Mon, Dec 19, 2011 at 1:51 PM, Andrew Dunstan <and...@dunslane.net> wrote:
>> The simple solution I originally proposed was to put a line feed and some space before every target field in pretty print mode. This is a two-line patch. The downsides are a) maybe not everyone will like the change and b) it will produce superfluous newlines, e.g. before CASE expressions.
> With regard to (a), specifically, you won't like this change if your column names are things like "bob" and "sam", because you'll burn through an inordinate amount of vertical space. On the other hand, I agree that you'll probably find it a big improvement if they are things like nc.nspname::information_schema.sql_identifier AS udt_schema. It has always seemed to me that a sensible strategy here would be to try to produce output that looks good in 80 columns, on the assumption that (1) everyone has at least that much horizontal space to play with and (2) people who do won't necessarily mind it if we don't use all of that horizontal space when pretty-printing SQL. However, it is possible that I am living in the dark ages.

I've looked at that, and it was discussed a bit previously. It's more complex because it requires that we keep track of (or calculate) where we are on the line, and where we would be after adding the new text. But I could live with it if others could. It would certainly be an improvement. But that's still going to leave things that will look odd, such as a CASE expression's final line being filled up to 80 columns with more fields. My preference would be for a newline as a visual clue to where each column spec starts.

I used to try to be conservative about vertical space, but in these days of scrollbars and screens not limited to 24 or 25 lines (yes, kids, that's what some of us grew up with) that seems a bit old-fashioned. One of the arguments for K&R-style braces used to be that it used one less line than BSD style. Maybe that made some sense 20 or so years ago, although I didn't really buy it even then, but it makes very little sense to me now.

cheers, andrew
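As a reference point for readers, the pretty-print mode under discussion is the boolean flavor of pg_get_viewdef, e.g. (the view here is just an arbitrary illustration):

  SELECT pg_get_viewdef('information_schema.columns'::regclass, true);

The second argument enables pretty-printing; the debate is over where that output should insert line breaks.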
Re: [HACKERS] Real-life range datasets
On Dec 22, 2011, at 7:48 AM, Oleg Bartunov wrote:
> we have the pgfoundry project http://pgfoundry.org/projects/dbsamples/. Since your sample database is very important (for me also), I suggest using this site.

Or PGXN. http://pgxn.org/

You can register an account to upload extensions like you describe here: http://manager.pgxn.org/account/register

Details on what's required to distribute on PGXN are here: http://manager.pgxn.org/howto

Best, David
Re: [HACKERS] reprise: pretty print viewdefs
On 12/22/2011 12:52 PM, Andrew Dunstan wrote:
> [...]

Wow. My editor or my fingers seem to have screwed up here. Something got cut-and-pasted into the middle of the quoted text that shouldn't have been. *sigh*

cheers, andrew
Re: [HACKERS] Escaping : in .pgpass - code or docs bug?
On Mon, Dec 19, 2011 at 2:42 PM, Alvaro Herrera <alvhe...@commandprompt.com> wrote:
> Yeah, it is a pretty old bug -- this code was clearly written by some rookie that didn't know very well what he was doing.

Hey, I got that joke.

I fixed this in master. I'm not going to bother with anything else unless someone else feels motivated to make it happen.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] reprise: pretty print viewdefs
On Thu, Dec 22, 2011 at 12:52 PM, Andrew Dunstan <and...@dunslane.net> wrote:
> I used to try to be conservative about vertical space, but in these days of scrollbars and screens not limited to 24 or 25 lines (yes, kids, that's what some of us grew up with) that seems a bit old-fashioned. One of the arguments for K&R-style braces used to be that it used one less line than BSD style. Maybe that made some sense 20 or so years ago, although I didn't really buy it even then, but it makes very little sense to me now.

I *still* spend a lot of time editing in a 25x80 window.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] reprise: pretty print viewdefs
Robert Haas <robertmh...@gmail.com> writes:
> On Mon, Dec 19, 2011 at 1:51 PM, Andrew Dunstan <and...@dunslane.net> wrote:
>> The simple solution I originally proposed was to put a line feed and some space before every target field in pretty print mode. This is a two-line patch. The downsides are a) maybe not everyone will like the change and b) it will produce superfluous newlines, e.g. before CASE expressions.
> With regard to (a), specifically, you won't like this change if your column names are things like "bob" and "sam", because you'll burn through an inordinate amount of vertical space.

Yeah. I'm not exactly thrilled with (b), either, if it's a consequence of a change whose only excuse for living is to make the output look nicer. Random extra newlines don't look nicer to me.

> It has always seemed to me that a sensible strategy here would be to try to produce output that looks good in 80 columns,

Maybe, though I fear it might complicate the ruleutils code a bit. You'd probably have to build the output for a column first and then see how long it is before deciding whether to insert a newline. In short, I don't mind trying to make this work better, but I think it will take more work than a two-line patch.

regards, tom lane
Re: [HACKERS] WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation
On Sat, Dec 17, 2011 at 3:52 PM, Phil Sorber <p...@omniti.com> wrote:
> Is SnapshotAny the snapshot I should be using? It seems to get the correct results. I can drop a table and I get NULL. Then after a vacuumdb it returns an error.

The suggestion on the original thread was to use SnapshotDirty or the caller's MVCC snapshot. SnapshotAny doesn't seem like a good idea. We don't want to include rows from, say, transactions that have rolled back. And SnapshotAny rows can stick around for a long time -- weeks, months. If a table is only occasionally updated, autovacuum may not process it for a long time.

On the other hand, I don't think using SnapshotDirty is going to work out so well either: isn't there a race condition? The caller's MVCC snapshot seems better, but I still see pitfalls. Who is to say that the immediate caller has the same snapshot as the scan? I'm thinking of cases involving things like CTEs and nested function calls.

I'm wondering if we oughta just return NULL and be done with it. If SELECT 1241241241::regclass doesn't choke, I'm not sure select pg_relation_size(1241241241) ought to either. It's not like the user is going to see NULL and have no clue that the relation wasn't found. At the very worst they might think it's empty.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation
Robert Haas robertmh...@gmail.com writes: I'm wondering if we oughta just return NULL and be done with it. +1. There are multiple precedents for that sort of response, which we introduced exactly so that SELECT some_function(oid) FROM some_catalog wouldn't fail just because one of the rows had gotten deleted by the time the scan got to it. I don't think it's necessary for the relation-size functions to be any smarter. Indeed, I'd assumed that's all that Phil's patch did, since I'd not looked closer till just now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
On Thu, Dec 22, 2011 at 11:16 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Jignesh Shah jks...@gmail.com wrote: When we use Doublewrite with checksums, we can safely disable full_page_writes, causing a HUGE reduction in WAL traffic without loss of reliability due to a write fault, since there are always two writes. (Implementation detail discussable.) The always there surprised me. It seemed to me that we only need to do the double-write where we currently do full page writes or unlogged writes.

Currently PG only does a full page write for the first change that makes a page dirty after a checkpoint. This scheme works because all subsequent changes are relative to that first full-page image, so when a checkpoint write fails, the page can be recreated using the full page write plus all the delta changes from WAL. In the doublewrite implementation, every checkpoint write is double-written: if the first write, to the doublewrite area, fails, then the original page is not corrupted, and if the second write, to the actual data page, fails, then the page can be recovered from the earlier write. Now, while it is true that this means 2X writes during a checkpoint, I can argue that there are the same 2X writes right now, except that 1X of them goes to the WAL during transaction commit. Also, since the doublewrite area is generally written in its own file, it is essentially sequential, so it doesn't have the same write latencies as the actual checkpoint writes. So if you look at the net amount of writes, it is the same. For unlogged tables, doing the doublewrite is not much of a penalty either, even though those writes were never logged in WAL before; it is still safe and gives those tables resilience even though that is not required. The net result is that the underlying page is never irrecoverable due to failed writes.

In thinking about your message, it finally struck me that this might require a WAL record to be written with the checksum (or CRC; whatever we use). Still, writing a WAL record with a CRC prior to the page write would be less data than the full page. Doing double-writes instead for situations without the torn page risk seems likely to be a net performance loss, although I have no benchmarks to back that up (not having a double-write implementation to test). And if we can get correct behavior without doing either (the checksum WAL record or the double-write), that's got to be a clear win.

I am not sure why one would want to write the checksum to WAL. As for the double writes, in fact there is no net loss, because (a) the writes to the doublewrite area are sequential, so the write calls are relatively fast and in fact do not cause the latency increase to transactions that full_page_writes does, and (b) the doublewrite area can be moved to a different location to take the stress off the default tablespace, if you are worried about that spindle handling 2X writes (which full_page_writes mitigates only if you move pg_xlog to a different spindle). My own tests support that the net result is almost as fast as full_page_writes=off - not quite the same, due to the extra write (which buys you the desired reliability), but way better than full_page_writes=on.

Regards, Jignesh

-Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
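To make the write ordering concrete, the scheme described above amounts to something like the following C sketch (illustrative only; the helper names are made up, not from any posted patch):

    /* Made-up helpers standing in for real file I/O routines. */
    extern void write_to_doublewrite_file(const char *page);
    extern void fsync_doublewrite_file(void);
    extern void write_page_in_place(const char *page);

    /* Double-write a batch of dirty pages at checkpoint time. */
    void
    checkpoint_write_batch(const char **pages, int npages)
    {
        int     i;

        /* 1. Append all pages sequentially to the doublewrite file ... */
        for (i = 0; i < npages; i++)
            write_to_doublewrite_file(pages[i]);
        /* ... and sync the whole batch with one mostly-sequential fsync. */
        fsync_doublewrite_file();

        /*
         * 2. Only now write each page to its home location.  A torn write
         * here can be repaired from the doublewrite copy; a failure in
         * step 1 leaves the on-disk page untouched.
         */
        for (i = 0; i < npages; i++)
            write_page_in_place(pages[i]);
    }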
Re: [HACKERS] WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation
On Thu, Dec 22, 2011 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I'm wondering if we oughta just return NULL and be done with it. +1. There are multiple precedents for that sort of response, which we introduced exactly so that SELECT some_function(oid) FROM some_catalog wouldn't fail just because one of the rows had gotten deleted by the time the scan got to it. I don't think it's necessary for the relation-size functions to be any smarter. Indeed, I'd assumed that's all that Phil's patch did, since I'd not looked closer till just now. regards, tom lane

Here it is without the checking for recently dead. If it can't open the relation it simply returns NULL.

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
new file mode 100644
index 2ee5966..134bc03
*** a/src/backend/utils/adt/dbsize.c
--- b/src/backend/utils/adt/dbsize.c
*************** pg_relation_size(PG_FUNCTION_ARGS)
*** 289,295 ****
  	Relation	rel;
  	int64		size;

! 	rel = relation_open(relOid, AccessShareLock);

  	size = calculate_relation_size(&(rel->rd_node), rel->rd_backend,
  			forkname_to_number(text_to_cstring(forkName)));
--- 289,298 ----
  	Relation	rel;
  	int64		size;

! 	rel = try_relation_open(relOid, AccessShareLock);
!
! 	if (rel == NULL)
! 		PG_RETURN_NULL();

  	size = calculate_relation_size(&(rel->rd_node), rel->rd_backend,
  			forkname_to_number(text_to_cstring(forkName)));
*************** calculate_toast_table_size(Oid toastreli
*** 339,352 ****
  * those won't have attached toast tables, but they can have multiple forks.
  */
 static int64
! calculate_table_size(Oid relOid)
 {
  	int64		size = 0;
- 	Relation	rel;
  	ForkNumber	forkNum;

- 	rel = relation_open(relOid, AccessShareLock);
-
  	/*
  	 * heap size, including FSM and VM
  	 */
--- 342,352 ----
  * those won't have attached toast tables, but they can have multiple forks.
  */
 static int64
! calculate_table_size(Relation rel)
 {
  	int64		size = 0;
  	ForkNumber	forkNum;

  	/*
  	 * heap size, including FSM and VM
  	 */
*************** calculate_table_size(Oid relOid)
*** 360,367 ****
  	if (OidIsValid(rel->rd_rel->reltoastrelid))
  		size += calculate_toast_table_size(rel->rd_rel->reltoastrelid);

- 	relation_close(rel, AccessShareLock);
-
  	return size;
 }
--- 360,365 ----
*************** calculate_table_size(Oid relOid)
*** 371,382 ****
  * Can be applied safely to an index, but you'll just get zero.
  */
 static int64
! calculate_indexes_size(Oid relOid)
 {
  	int64		size = 0;
- 	Relation	rel;
-
- 	rel = relation_open(relOid, AccessShareLock);

  	/*
  	 * Aggregate all indexes on the given relation
--- 369,377 ----
  * Can be applied safely to an index, but you'll just get zero.
  */
 static int64
! calculate_indexes_size(Relation rel)
 {
  	int64		size = 0;

  	/*
  	 * Aggregate all indexes on the given relation
*************** calculate_indexes_size(Oid relOid)
*** 405,412 ****
  		list_free(index_oids);
  	}

- 	relation_close(rel, AccessShareLock);
-
  	return size;
 }
--- 400,405 ----
*************** Datum
*** 414,429 ****
 pg_table_size(PG_FUNCTION_ARGS)
 {
  	Oid			relOid = PG_GETARG_OID(0);

! 	PG_RETURN_INT64(calculate_table_size(relOid));
 }

 Datum
 pg_indexes_size(PG_FUNCTION_ARGS)
 {
  	Oid			relOid = PG_GETARG_OID(0);

! 	PG_RETURN_INT64(calculate_indexes_size(relOid));
 }

 /*
--- 407,444 ----
 pg_table_size(PG_FUNCTION_ARGS)
 {
  	Oid			relOid = PG_GETARG_OID(0);
+ 	Relation	rel;
+ 	int64		size;

! 	rel = try_relation_open(relOid, AccessShareLock);
!
! 	if (rel == NULL)
! 		PG_RETURN_NULL();
!
! 	size = calculate_table_size(rel);
!
! 	relation_close(rel, AccessShareLock);
!
! 	PG_RETURN_INT64(size);
 }

 Datum
 pg_indexes_size(PG_FUNCTION_ARGS)
 {
  	Oid			relOid = PG_GETARG_OID(0);
+ 	Relation	rel;
+ 	int64		size;

! 	rel = try_relation_open(relOid, AccessShareLock);
!
! 	if (rel == NULL)
! 		PG_RETURN_NULL();
!
! 	size = calculate_indexes_size(rel);
!
! 	relation_close(rel, AccessShareLock);
!
! 	PG_RETURN_INT64(size);
 }

 /*
*************** pg_indexes_size(PG_FUNCTION_ARGS)
*** 431,437 ****
  * including heap data, index data, toast data, FSM, VM.
  */
 static int64
! calculate_total_relation_size(Oid Relid)
 {
  	int64		size;
--- 446,452 ----
  * including heap data, index data, toast data, FSM, VM.
  */
 static int64
! calculate_total_relation_size(Relation Rel)
 {
  	int64		size;
*************** calculate_total_relation_size(Oid Relid)
*** 439,450 ****
  	/*
  	 * Aggregate the table size, this includes size of the heap, toast and
  	 * toast index with free space and visibility map
  	 */
! 	size = calculate_table_size(Relid);

  	/*
  	 * Add size of all attached indexes as well
  	 */
! 	size += calculate_indexes_size(Relid);

  	return size;
 }
--- 454,465 ----
  	/*
  	 * Aggregate the table size, this includes size of the
Re: [HACKERS] reprise: pretty print viewdefs
On 12/22/2011 01:05 PM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Dec 19, 2011 at 1:51 PM, Andrew Dunstan and...@dunslane.net wrote: The simple solution I originally proposed was to put a line feed and some space before every target field in pretty print mode. This is a two-line patch. The downsides are a) maybe not everyone will like the change and b) it will produce superfluous newlines, e.g. before CASE expressions. With regard to (a), specifically, you won't like this change if your column names are things like bob and sam, because you'll burn through an inordinate amount of vertical space. Yeah. I'm not exactly thrilled with (b), either, if it's a consequence of a change whose only excuse for living is to make the output look nicer. Random extra newlines don't look nicer to me. It has always seemed to me that a sensible strategy here would be to try to produce output that looks good in 80 columns. Maybe, though I fear it might complicate the ruleutils code a bit. You'd probably have to build the output for a column first and then see how long it is before deciding whether to insert a newline. In short, I don't mind trying to make this work better, but I think it will take more work than a two-line patch. OK. Let me whip something up. I had already come to the conclusion you did about how best to do this. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] disable prompting by default in createuser
On lör, 2011-11-26 at 01:28 +0200, Peter Eisentraut wrote: I propose that we change createuser so that it does not prompt for anything by default. We can arrange options so that you can get prompts for whatever is missing, but by default, a call to createuser should just run CREATE USER with default options. The fact that createuser issues prompts is always annoying when you create setup scripts, because you have to be careful to specify all the necessary options, and they are inconsistent and different between versions (although the last change about that was a long time ago), and the whole behavior seems contrary to the behavior of all other utilities. I don't think there'd be a real loss by not prompting by default; after all, we don't really want to encourage users to create more superusers, do we? Comments?

Patch attached. I'll add it to the next commitfest.

diff --git i/doc/src/sgml/ref/createuser.sgml w/doc/src/sgml/ref/createuser.sgml
index 4cbfd69..90b0fd4 100644
--- i/doc/src/sgml/ref/createuser.sgml
+++ w/doc/src/sgml/ref/createuser.sgml
@@ -102,7 +102,8 @@ PostgreSQL documentation
       <term><option>--no-createdb</option></term>
       <listitem>
        <para>
-        The new user will not be allowed to create databases.
+        The new user will not be allowed to create databases.  This is the
+        default.
        </para>
       </listitem>
      </varlistentry>
@@ -153,6 +154,19 @@ PostgreSQL documentation
      </varlistentry>

      <varlistentry>
+      <term><option>--interactive</option></term>
+      <listitem>
+       <para>
+        Prompt for the user name if none is specified on the command line, and
+        also prompt for whichever of the options
+        <option>-d</option>/<option>-D</option>,
+        <option>-r</option>/<option>-R</option>,
+        <option>-s</option>/<option>-S</option>
+        is not specified on the command line.  (This was the default behavior
+        up to PostgreSQL 9.1.)
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
       <term><option>-l</option></term>
       <term><option>--login</option></term>
       <listitem>
@@ -215,7 +229,8 @@ PostgreSQL documentation
       <term><option>--no-createrole</option></term>
       <listitem>
        <para>
-        The new user will not be allowed to create new roles.
+        The new user will not be allowed to create new roles.  This is the
+        default.
        </para>
       </listitem>
      </varlistentry>
@@ -235,7 +250,7 @@ PostgreSQL documentation
       <term><option>--no-superuser</option></term>
       <listitem>
        <para>
-        The new user will not be a superuser.
+        The new user will not be a superuser.  This is the default.
        </para>
       </listitem>
      </varlistentry>
@@ -287,11 +302,6 @@ PostgreSQL documentation
   </para>

   <para>
-   You will be prompted for a name and other missing information if it
-   is not specified on the command line.
-  </para>
-
-  <para>
    <application>createuser</application> also accepts the following
    command-line arguments for connection parameters:
@@ -422,6 +432,14 @@ PostgreSQL documentation
    server:
 <screen>
 <prompt>$ </prompt><userinput>createuser joe</userinput>
+</screen>
+  </para>
+
+  <para>
+   To create a user <literal>joe</literal> on the default database
+   server with prompting for some additional attributes:
+<screen>
+<prompt>$ </prompt><userinput>createuser --interactive joe</userinput>
 <computeroutput>Shall the new role be a superuser? (y/n) </computeroutput><userinput>n</userinput>
 <computeroutput>Shall the new role be allowed to create databases? (y/n) </computeroutput><userinput>n</userinput>
 <computeroutput>Shall the new role be allowed to create more new roles? (y/n) </computeroutput><userinput>n</userinput>
@@ -430,7 +448,7 @@ PostgreSQL documentation
   <para>
    To create the same user <literal>joe</literal> using the
-   server on host <literal>eden</>, port 5000, avoiding the prompts and
+   server on host <literal>eden</>, port 5000, with attributes explicitly specified,
    taking a look at the underlying command:
 <screen>
 <prompt>$ </prompt><userinput>createuser -h eden -p 5000 -S -D -R -e joe</userinput>
diff --git i/doc/src/sgml/ref/dropuser.sgml w/doc/src/sgml/ref/dropuser.sgml
index 724fe40..bc6feaf 100644
--- i/doc/src/sgml/ref/dropuser.sgml
+++ w/doc/src/sgml/ref/dropuser.sgml
@@ -62,7 +62,9 @@ PostgreSQL documentation
      <listitem>
       <para>
        Specifies the name of the <productname>PostgreSQL</productname> user to be removed.
-       You will be prompted for a name if none is specified on the command line.
+       You will be prompted for a name if none is specified on the command
+       line and the <option>-i</option>/<option>--interactive</option> option
+       is used.
       </para>
      </listitem>
     </varlistentry>
@@ -83,7 +85,8 @@ PostgreSQL documentation
      <term><option>--interactive</option></term>
      <listitem>
       <para>
-       Prompt for confirmation before actually removing the user.
+       Prompt for confirmation before actually removing the user, and prompt
+       for the user name if none is specified on the command line.
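The docs diff above is only part of the patch; the behavior change in src/bin/scripts/createuser.c would look roughly like this (an illustrative C sketch, not Peter's actual code; it assumes the existing trivalue enum and yesno_prompt() helper from the scripts framework):

    /* Prompt for a role attribute only under --interactive;
     * otherwise fall back to the unprivileged default. */
    if (createdb == TRI_DEFAULT)
    {
        if (interactive)
            createdb = yesno_prompt("Shall the new role be allowed to create databases?")
                ? TRI_YES : TRI_NO;
        else
            createdb = TRI_NO;      /* new default: no prompting */
    }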
Re: [HACKERS] Escaping : in .pgpass - code or docs bug?
Excerpts from Robert Haas's message of Thu Dec 22 15:03:36 -0300 2011: On Mon, Dec 19, 2011 at 2:42 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Yeah, it is a pretty old bug -- this code was clearly written by some rookie that didn't know very well what he was doing. Hey, I got that joke. I fixed this in master. I'm not going to bother with anything else unless someone else feels motivated to make it happen. Thanks! :-) -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Non-inheritable check constraints
Excerpts from Nikhil Sontakke's message of Tue Dec 20 12:03:33 -0300 2011: Apologies, I did not check this particular scenario. I guess, here, we should not allow merging of the inherited constraint into an only constraint, because that breaks the semantics for only constraints. If this sounds ok, I can whip up a patch for the same. PFA, a patch which does just this.

postgres=# alter table a add constraint chk check (ff1 > 0);
ERROR: constraint "chk" for relation "b" is an ONLY constraint. Cannot merge

I think the basic idea is fine -- the constraint certainly cannot be merged, and we can't continue without merging it because of the inconsistency it would create. The error message is wrong though. I suggest

ERROR: constraint "%s" on relation "%s" conflicts with non-inherited constraint on relation "%s"
HINT: Specify a different constraint name.

The errmsg seems a bit long though -- does anybody have a better suggestion?

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Non-inheritable check constraints
On Thu, Dec 22, 2011 at 2:43 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Nikhil Sontakke's message of Tue Dec 20 12:03:33 -0300 2011: Apologies, I did not check this particular scenario. I guess, here, we should not allow merging of the inherited constraint into an only constraint, because that breaks the semantics for only constraints. If this sounds ok, I can whip up a patch for the same. PFA, a patch which does just this. postgres=# alter table a add constraint chk check (ff1 > 0); ERROR: constraint "chk" for relation "b" is an ONLY constraint. Cannot merge I think the basic idea is fine -- the constraint certainly cannot be merged, and we can't continue without merging it because of the inconsistency it would create.

I was just looking at this as well. There is at least one other problem. Consider:

rhaas=# create table a (ff1 int, constraint chk check (ff1 > 0));
CREATE TABLE
rhaas=# create table b (ff1 int, constraint chk check (ff1 > 0));
CREATE TABLE
rhaas=# alter table b inherit a;
ALTER TABLE

This needs to fail if chk is an only constraint, but it doesn't, even with this patch.

I think that part of the problem here is fuzzy thinking about the meaning of the word ONLY in ALTER TABLE ONLY b. The word ONLY there is really supposed to mean that the operation is performed on b but not on its descendents; but that's not what you're doing: you're really performing a different operation. In theory, for a table with no current descendents, ALTER TABLE ONLY b and ALTER TABLE b ought to be identical, but here they are not. I think that's probably bad.

Another manifestation of this problem is that there's no way to add an ONLY constraint in a CREATE TABLE command. I think that's bad, too: it should be possible to declare any constraint at table creation time, and if the ONLY were part of the constraint definition rather than part of the target-table specification, that would work fine. As it is, it doesn't.

I am tempted to say we should revert this and rethink. I don't believe we are only a small patch away from finding all the bugs here.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
On Thu, Dec 22, 2011 at 1:50 PM, Jignesh Shah jks...@gmail.com wrote: In the double write implementation, every checkpoint write is double-written, Unless I'm quite thoroughly confused, which is possible, the double write will need to happen the first time a buffer is written following each checkpoint. Which might mean the next checkpoint, but it could also be sooner if the background writer kicks in, or in the worst case a buffer has to do its own write. Furthermore, we can't *actually* write any pages until they are written *and fsync'd* to the double-write buffer. So the penalty for the background writer failing to do the right thing is going to go up enormously. Think about VACUUM or COPY IN, using a ring buffer and kicking out its own pages. Every time it evicts a page, it is going to have to doublewrite the buffer, fsync it, and then write it for real. That is going to make PostgreSQL 6.5 look like a speed demon. The background writer or checkpointer can conceivably dump a bunch of pages into the doublewrite area and then fsync the whole thing in bulk, but a backend that needs to evict a page only wants one page, so it's pretty much screwed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
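Concretely, the per-page eviction path Robert is objecting to would look like this (reusing the made-up helper names from the earlier sketch; a backend evicting a buffer for its own use cannot batch):

    /* One backend evicting one buffer: the full doublewrite cost per page. */
    static void
    evict_one_buffer(const char *page)
    {
        write_to_doublewrite_file(page);    /* extra write ... */
        fsync_doublewrite_file();           /* ... plus a synchronous fsync ... */
        write_page_in_place(page);          /* ... before the one write it wanted */
    }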
Re: [HACKERS] WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation
On Thu, Dec 22, 2011 at 2:02 PM, Phil Sorber p...@omniti.com wrote: On Thu, Dec 22, 2011 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I'm wondering if we oughta just return NULL and be done with it. +1. There are multiple precedents for that sort of response, which we introduced exactly so that SELECT some_function(oid) FROM some_catalog wouldn't fail just because one of the rows had gotten deleted by the time the scan got to it. I don't think it's necessary for the relation-size functions to be any smarter. Indeed, I'd assumed that's all that Phil's patch did, since I'd not looked closer till just now. Here it is without the checking for recently dead. If it can't open the relation it simply returns NULL. I think we probably ought to make pg_database_size() and pg_tablespace_size() behave similarly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
On Thu, Dec 22, 2011 at 3:04 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Dec 22, 2011 at 1:50 PM, Jignesh Shah jks...@gmail.com wrote: In the double write implementation, every checkpoint write is double-written, Unless I'm quite thoroughly confused, which is possible, the double write will need to happen the first time a buffer is written following each checkpoint. Which might mean the next checkpoint, but it could also be sooner if the background writer kicks in, or in the worst case a buffer has to do its own write.

Logically the double write happens for every checkpoint write, and it gets fsynced. Implementation-wise, you can do a chunk of those pages at a time and sync them once, and yes, it still performs better than full_page_writes. As long as you compare with full_page_writes=on, the scheme is always much better. If you compare it with the performance of full_page_writes=off it is slightly slower, but then you lose the reliability. So performance testers like me, who always turn off full_page_writes during benchmark runs, will not see any impact. However, folks in production who are rightly scared to turn off full_page_writes will have the ability to increase performance without being scared of failed writes.

Furthermore, we can't *actually* write any pages until they are written *and fsync'd* to the double-write buffer. So the penalty for the background writer failing to do the right thing is going to go up enormously. Think about VACUUM or COPY IN, using a ring buffer and kicking out its own pages. Every time it evicts a page, it is going to have to doublewrite the buffer, fsync it, and then write it for real. That is going to make PostgreSQL 6.5 look like a speed demon.

Like I said, implementation-wise it depends on how many such pages you sync simultaneously, and real tests show that it is actually much faster than one expects.

The background writer or checkpointer can conceivably dump a bunch of pages into the doublewrite area and then fsync the whole thing in bulk, but a backend that needs to evict a page only wants one page, so it's pretty much screwed.

Generally, at what point you pay the penalty is a trade-off. I would argue that you are making me pay the full-page-write cost on my first transaction commit that changes the page, which I can never avoid, and the result is a transaction response time that is unacceptable, since the deviation from a similar transaction that modifies an already-dirty page is a lot less. However, I can avoid page evictions if I select a bigger buffer pool (not that I necessarily want to do that, but I have a choice, without losing reliability).

Regards, Jignesh

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] atexit vs. on_exit
Are there any supported platforms that have only on_exit() but not atexit()? It would be good in some cases to rewrite custom arrangements such as exit_nicely() or die_horribly() using those exit hooks, but supporting both through ifdefs makes the code more ugly than before. I dug around the buildfarm logs and documentation around the internet, and it appears that all non-ancient platforms support atexit(). The use of on_exit() in PostgreSQL source code appears to come from the original Postgres95 code. I would like to get rid of on_exit(). Alternatively, atexit() could be implemented in libpgport as a wrapper around on_exit(). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
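The libpgport wrapper alternative could be as small as the following sketch (it assumes on_exit() has the SunOS/glibc signature; the round-trip of a function pointer through void * is not strictly portable C, but is fine on the platforms where this fallback would apply):

    #include <stdlib.h>

    /*
     * on_exit() callbacks receive the exit status plus an argument;
     * atexit() callbacks take nothing, so adapt one to the other.
     */
    static void
    call_atexit_callback(int exitstatus, void *arg)
    {
        ((void (*) (void)) arg) ();
    }

    int
    atexit(void (*func) (void))
    {
        return on_exit(call_atexit_callback, (void *) func);
    }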
Re: [HACKERS] LibreOffice driver 3: pg_config and linking statically to libpq
On Tue, Dec 13, 2011 at 6:05 AM, Lionel Elie Mamane lio...@mamane.lu wrote: * On the one hand, it gives too much, since LIBS is filtered to only a subset in src/interfaces/libpq/Makefile. What is it excluding that it ought to include? I am not quite clear on why that code is like that, but it appears to be intended that the filter-list include everything that might be needed. * On the other hand, it does not give enough, since it does not give the value of LDAP_LIBS_FE anywhere, nor say if it is necessary to add PTHREAD_LIBS. This is not an immediate problem for LibreOffice: I export the value of SHLIB_EXPORTS from src/interfaces/libpq/Makefile as a Makefile snippet that gets imported into our build system, or (on Microsoft Windows) we just proceeded by trial and error until the link succeeds. However, I suggest it would be cleaner to give that kind of information in pg_config, so that one can basically do something like: $LINK_COMMAND -lpq $(pg_config --libpq-dep-libs) and have it work automatically. You could also provide a pq.pc file for pkgconfig, which would give nice nearly-automatic integration for projects using e.g. autoconf and friends. Care to propose a patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.2] Fix Leaky View Problem
On Mon, Dec 12, 2011 at 12:00 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: The v8.option-2 add checks around examine_simple_variable, and prevent to reference statistical data, if Var node tries to reference relation with security-barrier attribute. I adopted this approach, and committed this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
On Thu, Dec 22, 2011 at 9:50 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Simon, does it sound like I understand your proposal? Yes, thanks for restating. Now, on to the separate-but-related topic of double-write. That absolutely requires some form of checksum or CRC to detect torn pages, in order for the technique to work at all. Adding a CRC without double-write would work fine if you have a storage stack which prevents torn pages in the file system or hardware driver. If you don't have that, it could create a damaged page indication after a hardware or OS crash, although I suspect that would be the exception, not the typical case. Given all that, and the fact that it would be cleaner to deal with these as two separate patches, it seems the CRC patch should go in first. (And, if this is headed for 9.2, *very soon*, so there is time for the double-write patch to follow.) It could work that way, but I seriously doubt that a technique only mentioned in dispatches one month before the last CF is likely to become trustable code within one month. We've been discussing CRCs for years, so assembling the puzzle seems much easier when all the parts are available. It seems to me that the full_page_writes GUC could become an enumeration, with off having the current meaning, wal meaning what on now does, and double meaning that the new double-write technique would be used. (It doesn't seem to make any sense to do both at the same time.) I don't think we need a separate GUC to tell us *what* to protect against torn pages -- if not off, we should always protect the first write of a page after a checkpoint, and if double and write_page_crc (or whatever we call it) is on, then we protect hint-bit-only writes. I think. I can see room to argue that with CRCs on we should do a full-page write to the WAL for a hint-bit-only change, or that we should add another GUC to control when we do this. I'm going to take a shot at writing a patch for background hinting over the holidays, which I think has benefit alone but also boosts the value of these patches, since it would reduce the double-write activity otherwise needed to prevent spurious errors when using CRCs. I would suggest you examine how to have an array of N bgwriters, then just slot the code for hinting into the bgwriter. That way a bgwriter can set hints, calc CRCs and write pages in sequence on a particular block. The hinting needs to be synchronised with the writing to give good benefit. If we want page checksums in 9.2, I'll need your help, so the hinting may be a sidetrack. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
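The enumeration Kevin suggests might look like this in C (a hypothetical sketch; the names are invented, nothing like this exists yet):

    /* Proposed values for a three-way full_page_writes GUC. */
    typedef enum
    {
        FULL_PAGE_WRITES_OFF,       /* no torn-page protection at all */
        FULL_PAGE_WRITES_WAL,       /* today's "on": full page images in WAL */
        FULL_PAGE_WRITES_DOUBLE     /* double-write instead of full page images */
    } FullPageWritesMode;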
Re: [HACKERS] atexit vs. on_exit
Peter Eisentraut pete...@gmx.net writes: Are there any supported platforms that have only on_exit() but not atexit()? Trolling the git logs shows that configure's support for on_exit was added here: commit df247b821d811abcfc0ac707e1a3af9dfce474c9 Author: Tatsuo Ishii is...@postgresql.org Date: Tue Feb 27 08:13:31 2001 + Massive commits for SunOS4 port. SunOS4 is definitely pretty dead. According to the info I have, atexit() is required by ANSI C (that would be C89 not C99), so it certainly seems unlikely that we'd ever see platforms without it anymore. A slightly different issue is that atexit is less functional than on_exit (the former does not tell the callbacks the exit() code). There is one place in our code where it would be nice to have that. But since we're preferring atexit where available, we seem to be getting along fine without it anyway. +1 for simplifying. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums + Double Writes
Simon Riggs si...@2ndquadrant.com wrote: It could work that way, but I seriously doubt that a technique only mentioned in dispatches one month before the last CF is likely to become trustable code within one month. We've been discussing CRCs for years, so assembling the puzzle seems much easier, when all the parts are available. Well, double-write has been mentioned on the lists for years, sometimes in conjunction with CRCs, and I get the impression this is one of those things which has been worked on out of the community's view for a while and is just being posted now. That's often not viewed as the ideal way for development to proceed from a community standpoint, but it's been done before with some degree of success -- particularly when a feature has been bikeshedded to a standstill. ;-) I would suggest you examine how to have an array of N bgwriters, then just slot the code for hinting into the bgwriter. That way a bgwriter can set hints, calc CRC and write pages in sequence on a particular block. The hinting needs to be synchronised with the writing to give good benefit. I'll think about that. I see pros and cons, and I'll have to see how those balance out after I mull them over. If we want page checksums in 9.2, I'll need your help, so the hinting may be a sidetrack. Well, VMware posted the initial patch, and that was the first I heard of it. I just had some off-line discussions with them after they posted it. Perhaps the engineers who wrote it should take your comments as a review and post a modified patch? It didn't seem like that pot of broth needed any more cooks, so I was going to go work on a nice dessert; but I agree that any way I can help along either of the $Subject patches should take priority. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Typed hstore proposal
Andrew Dunstan and...@dunslane.net writes: On 12/22/2011 10:44 AM, Tom Lane wrote: Johann 'Myrkraverk' Oskarsson joh...@2ndquadrant.com writes: I mean to create a typed hstore, called tstore for now. Um ... what is the point of this, exactly? From what I've seen, most applications for hstore are pretty happy with the fact that hstore is only weakly typed, and if an entry *is* an integer, or a float, or whatever else, it's not hard to cast to and from text as needed. So this idea looks like a solution in search of a problem, which is going to need a whole lot more work before it even gets to the point of being as useful as hstore. It's not for instance apparent what is the use of iterating over only entries that were supplied as integers --- there is no reason to think that they're related just because of that.

No, which is why each( tstore ) returns the whole thing, as tvalues. Also, it can be quite helpful in some cases to ask what is the type of this key rather than cast to an integer and hope it works. Typed in this case means each value is typed. There is (as yet) no reason nor facility to add type constraints for a given key within the implementation itself.

Yeah, the thing that's annoying in some cases about hstore is not that it's untyped but that it's flat.

As I already pointed out (well, implied), it's trivial to allow tstore to be recursive.

That's what a JSON type would buy us, a lightweight tree structured type, and moreover one that is widely and increasingly used and well understood.

While I have not meant tstore to be a JSON type, it's not hard to make it fully compatible with JSON by providing such input/output functions. Here it's noteworthy that I mean tstore to be richer than JSON. Some type ideas:

* boolean
* bytea
* float4
* float8
* int2
* int4
* int8
* null (or some provision to have unvalued keys)

Not all of the above may be supported by the first implementation. Notably, bytea may be skipped. And later on, possibly some subset or all of the time types:

* timestamp with time zone
* timestamp without time zone
* interval
* date
* time with time zone
* time without time zone

For JSON compatibility and tree structures:

* tstore (nested)
* tvalue arrays (or another way to have JSON compatible arrays)

It might also be worthwhile to have a specific JSON type, possibly using the same underlying structure just with different input/output functions.

-- Johann Oskarsson http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services | Blog: http://my.opera.com/myrkraverk/blog/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
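To picture what "each value is typed" could mean at the representation level, here is a purely hypothetical C sketch (nothing here is settled design; all names are invented):

    /* One typed value inside a tstore, tagged with its type. */
    typedef struct TValue
    {
        uint8   type;       /* e.g. TSTORE_BOOL, TSTORE_INT4, TSTORE_TEXT,
                             * TSTORE_NULL, or TSTORE_TSTORE for nesting */
        int32   len;        /* length of data in bytes */
        char    data[];     /* the value bytes; when type == TSTORE_TSTORE,
                             * this holds a nested tstore, giving the
                             * JSON-like tree structure discussed above */
    } TValue;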
Re: [HACKERS] reprise: pretty print viewdefs
On 12/22/2011 02:17 PM, Andrew Dunstan wrote: On 12/22/2011 01:05 PM, Tom Lane wrote: Maybe, though I fear it might complicate the ruleutils code a bit. You'd probably have to build the output for a column first and then see how long it is before deciding whether to insert a newline. In short, I don't mind trying to make this work better, but I think it will take more work than a two-line patch. OK. Let me whip something up. I had already come to the conclusion you did about how best to do this.

Here's a WIP patch. At least it's fairly localized, but as expected it's rather more than 2 lines :-). Sample output:

regression=# \pset format unaligned
Output format is unaligned.
regression=# select pg_get_viewdef('shoelace',true);
pg_get_viewdef
 SELECT s.sl_name, s.sl_avail, s.sl_color, s.sl_len, s.sl_unit,
    s.sl_len * u.un_fact AS sl_len_cm
   FROM shoelace_data s, unit u
  WHERE s.sl_unit = u.un_name;
(1 row)

regression=#

I just had an idea. We could invent a flavor of pg_get_viewdef() that took an int as the second param, that would be the wrap width. For the boolean case as above, 80 would be implied. 0 would mean always wrap. psql could be made to default to the window width, or maybe window width - 1, but we could provide a psql setting that would override it.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reprise: pretty print viewdefs
On 12/22/2011 06:11 PM, Andrew Dunstan wrote: On 12/22/2011 02:17 PM, Andrew Dunstan wrote: On 12/22/2011 01:05 PM, Tom Lane wrote: Maybe, though I fear it might complicate the ruleutils code a bit. You'd probably have to build the output for a column first and then see how long it is before deciding whether to insert a newline. In short, I don't mind trying to make this work better, but I think it will take more work than a two-line patch. OK. Let me whip something up. I had already come to the conclusion you did about how best to do this. Here's a WIP patch. At least it's fairly localized, but as expected it's rather more than 2 lines :-). Sample output:

regression=# \pset format unaligned
Output format is unaligned.
regression=# select pg_get_viewdef('shoelace',true);
pg_get_viewdef
 SELECT s.sl_name, s.sl_avail, s.sl_color, s.sl_len, s.sl_unit,
    s.sl_len * u.un_fact AS sl_len_cm
   FROM shoelace_data s, unit u
  WHERE s.sl_unit = u.un_name;
(1 row)

regression=#

I just had an idea. We could invent a flavor of pg_get_viewdef() that took an int as the second param, that would be the wrap width. For the boolean case as above, 80 would be implied. 0 would mean always wrap. psql could be made to default to the window width, or maybe window width - 1, but we could provide a psql setting that would override it.

This time with patch.

cheers

andrew

*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
*************** get_target_list(List *targetList, deparse_context *context,
*** 3013,3018 ****
--- 3013,3019 ----
  	char	   *sep;
  	int		colno;
  	ListCell   *l;
+ 	bool		last_was_multiline = false;

  	sep = " ";
  	colno = 0;
*************** get_target_list(List *targetList, deparse_context *context,
*** 3021,3028 ****
  	TargetEntry *tle = (TargetEntry *) lfirst(l);
  	char	   *colname;
  	char	   *attname;

! 	if (tle->resjunk)
  		continue;			/* ignore junk entries */

  	appendStringInfoString(buf, sep);
--- 3022,3033 ----
  	TargetEntry *tle = (TargetEntry *) lfirst(l);
  	char	   *colname;
  	char	   *attname;
+ 	StringInfoData targetbuf;
+ 	int		leading_nl_pos = -1;
+ 	char	   *trailing_nl;
+ 	int		i;

! 	if (tle->resjunk)
  		continue;			/* ignore junk entries */

  	appendStringInfoString(buf, sep);
*************** get_target_list(List *targetList, deparse_context *context,
*** 3030,3035 ****
--- 3035,3049 ----
  	colno++;

  	/*
+ 	 * Put the new field spec into targetbuf so we can
+ 	 * decide after we've got it whether or not it needs
+ 	 * to go on a new line.
+ 	 */
+
+ 	initStringInfo(&targetbuf);
+ 	context->buf = &targetbuf;
+
+ 	/*
  	 * We special-case Var nodes rather than using get_rule_expr. This is
  	 * needed because get_rule_expr will display a whole-row Var as
  	 * foo.*, which is the preferred notation in most contexts, but at
*************** get_target_list(List *targetList, deparse_context *context,
*** 3063,3070 ****
  	if (colname)			/* resname could be NULL */
  	{
  		if (attname == NULL || strcmp(attname, colname) != 0)
! 			appendStringInfo(buf, " AS %s", quote_identifier(colname));
  	}
  	}
 }
--- 3077,3139 ----
  	if (colname)			/* resname could be NULL */
  	{
  		if (attname == NULL || strcmp(attname, colname) != 0)
! 			appendStringInfo(&targetbuf, " AS %s", quote_identifier(colname));
! 	}
!
! 	/* Does the new field start with whitespace plus a new line? */
!
! 	for (i = 0; i < targetbuf.len; i++)
! 	{
! 		if (targetbuf.data[i] == '\n')
! 		{
! 			leading_nl_pos = i;
! 			break;
! 		}
! 		if (targetbuf.data[i] > ' ')
! 			break;
! 	}
!
! 	context->buf = buf;
!
! 	/* Locate the start of the current line in the buffer */
!
! 	trailing_nl = (strrchr(buf->data, '\n'));
! 	if (trailing_nl == NULL)
! 		trailing_nl = buf->data;
! 	else
! 		trailing_nl++;
!
! 	/*
! 	 * If the field we're adding already has a newline
! 	 * don't add one. Otherwise, add one, plus some
! 	 * indentation, if either the new field would
! 	 * cause an 80 column overflow or the last field
! 	 * had a multiline spec.
! 	 */
!
! 	if (leading_nl_pos == -1 &&
! 		((strlen(trailing_nl) + strlen(targetbuf.data) > 79) ||
! 		 last_was_multiline))
! 	{
! 		appendContextKeyword(context, "", -PRETTYINDENT_STD,
! 							 PRETTYINDENT_STD, PRETTYINDENT_VAR);
  	}
+
+ 	/* Add the new field */
+
+ 	appendStringInfoString(buf, targetbuf.data);
+
+ 	/* Keep track of this field's status for the next iteration */
+
+ 	if (strchr(targetbuf.data + leading_nl_pos + 1, '\n') != NULL)
+ 		last_was_multiline = true;
+ 	else
+ 		last_was_multiline = false;
+
+ 	/* cleanup */
+
+ 	pfree(targetbuf.data);
  	}
 }

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation
On Thu, Dec 22, 2011 at 3:19 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Dec 22, 2011 at 2:02 PM, Phil Sorber p...@omniti.com wrote: On Thu, Dec 22, 2011 at 1:33 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I'm wondering if we oughta just return NULL and be done with it. +1. There are multiple precedents for that sort of response, which we introduced exactly so that SELECT some_function(oid) FROM some_catalog wouldn't fail just because one of the rows had gotten deleted by the time the scan got to it. I don't think it's necessary for the relation-size functions to be any smarter. Indeed, I'd assumed that's all that Phil's patch did, since I'd not looked closer till just now. Here it is without the checking for recently dead. If it can't open the relation it simply returns NULL. I think we probably ought to make pg_database_size() and pg_tablespace_size() behave similarly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

Changes added.

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
new file mode 100644
index 2ee5966..8c30dc4
*** a/src/backend/utils/adt/dbsize.c
--- b/src/backend/utils/adt/dbsize.c
*************** calculate_database_size(Oid dbOid)
*** 120,131 ****
  	FreeDir(dirdesc);

- 	/* Complain if we found no trace of the DB at all */
- 	if (!totalsize)
- 		ereport(ERROR,
- 				(errcode(ERRCODE_UNDEFINED_DATABASE),
- 				 errmsg("database with OID %u does not exist", dbOid)));
-
  	return totalsize;
 }
--- 120,125 ----
*************** Datum
*** 133,140 ****
 pg_database_size_oid(PG_FUNCTION_ARGS)
 {
  	Oid			dbOid = PG_GETARG_OID(0);

! 	PG_RETURN_INT64(calculate_database_size(dbOid));
 }

 Datum
--- 127,140 ----
 pg_database_size_oid(PG_FUNCTION_ARGS)
 {
  	Oid			dbOid = PG_GETARG_OID(0);
+ 	int64		size;

! 	size = calculate_database_size(dbOid);
!
! 	if (!size)
! 		PG_RETURN_NULL();
!
! 	PG_RETURN_INT64(size);
 }

 Datum
*************** pg_database_size_name(PG_FUNCTION_ARGS)
*** 142,149 ****
 {
  	Name		dbName = PG_GETARG_NAME(0);
  	Oid			dbOid = get_database_oid(NameStr(*dbName), false);

! 	PG_RETURN_INT64(calculate_database_size(dbOid));
 }
--- 142,155 ----
 {
  	Name		dbName = PG_GETARG_NAME(0);
  	Oid			dbOid = get_database_oid(NameStr(*dbName), false);
+ 	int64		size;

! 	size = calculate_database_size(dbOid);
!
! 	if (!size)
! 		PG_RETURN_NULL();
!
! 	PG_RETURN_INT64(size);
 }
*************** calculate_tablespace_size(Oid tblspcOid)
*** 184,193 ****
  	dirdesc = AllocateDir(tblspcPath);

  	if (!dirdesc)
! 		ereport(ERROR,
! 				(errcode_for_file_access(),
! 				 errmsg("could not open tablespace directory \"%s\": %m",
! 						tblspcPath)));

  	while ((direntry = ReadDir(dirdesc, tblspcPath)) != NULL)
  	{
--- 190,196 ----
  	dirdesc = AllocateDir(tblspcPath);

  	if (!dirdesc)
! 		return -1;

  	while ((direntry = ReadDir(dirdesc, tblspcPath)) != NULL)
  	{
*************** Datum
*** 226,233 ****
 pg_tablespace_size_oid(PG_FUNCTION_ARGS)
 {
  	Oid			tblspcOid = PG_GETARG_OID(0);

! 	PG_RETURN_INT64(calculate_tablespace_size(tblspcOid));
 }

 Datum
--- 229,242 ----
 pg_tablespace_size_oid(PG_FUNCTION_ARGS)
 {
  	Oid			tblspcOid = PG_GETARG_OID(0);
+ 	int64		size;

! 	size = calculate_tablespace_size(tblspcOid);
!
! 	if (size < 0)
! 		PG_RETURN_NULL();
!
! 	PG_RETURN_INT64(size);
 }

 Datum
*************** pg_tablespace_size_name(PG_FUNCTION_ARGS
*** 235,242 ****
 {
  	Name		tblspcName = PG_GETARG_NAME(0);
  	Oid			tblspcOid = get_tablespace_oid(NameStr(*tblspcName), false);

! 	PG_RETURN_INT64(calculate_tablespace_size(tblspcOid));
 }
--- 244,257 ----
 {
  	Name		tblspcName = PG_GETARG_NAME(0);
  	Oid			tblspcOid = get_tablespace_oid(NameStr(*tblspcName), false);
+ 	int64		size;

! 	size = calculate_tablespace_size(tblspcOid);
!
! 	if (size < 0)
! 		PG_RETURN_NULL();
!
! 	PG_RETURN_INT64(size);
 }
*************** pg_relation_size(PG_FUNCTION_ARGS)
*** 289,295 ****
  	Relation	rel;
  	int64		size;

! 	rel = relation_open(relOid, AccessShareLock);

  	size = calculate_relation_size(&(rel->rd_node), rel->rd_backend,
  			forkname_to_number(text_to_cstring(forkName)));
--- 304,313 ----
  	Relation	rel;
  	int64		size;

! 	rel = try_relation_open(relOid, AccessShareLock);
!
! 	if (rel == NULL)
! 		PG_RETURN_NULL();

  	size = calculate_relation_size(&(rel->rd_node), rel->rd_backend,
  			forkname_to_number(text_to_cstring(forkName)));
*************** calculate_toast_table_size(Oid toastreli
*** 339,352 ****
  * those won't have attached toast tables, but they can have multiple forks.
  */
 static int64
! calculate_table_size(Oid relOid)
 {
  	int64		size = 0;
- 	Relation	rel;
  	ForkNumber	forkNum;

- 	rel = relation_open(relOid, AccessShareLock);
-
  	/*
  	 * heap size, including FSM and VM
  	 */
--- 357,367 ----
  * those won't have attached toast
[HACKERS] WIP: explain analyze with 'rows' but not timing
Hi all, most of the time I use auto_explain, all I need is the duration of the query and the plan with estimates and actual row counts. And it would be handy to be able to catch long running queries with estimates that are significantly off (say 100x lower or higher compared to the actual row numbers). The gettimeofday() calls are not exactly cheap in some cases, so why pay that price when all you need is the number of rows?

The patch attached does this:

1) adds INSTRUMENT_ROWS, a new InstrumentOption
   - counts rows without timing (no gettimeofday() calls)
   - if you want timing info, use INSTRUMENT_TIMER

2) adds a new option TIMING to EXPLAIN, i.e. EXPLAIN (ANALYZE ON, TIMING ON) SELECT ...

3) adds auto_explain.log_rows_only (false by default)
   - if you set this to 'true', then the instrumentation will just count rows, without calling gettimeofday()

It works quite well, except one tiny issue - when log_rows_only is set to false (so that auto_explain requires timing), it silently overrides the EXPLAIN option. So even when the user explicitly disables timing (TIMING OFF), it's overridden and the explain collects the timing data. I could probably hide the timing info, but that'd make the issue even worse (the user would not notice that the timing was actually enabled). Maybe the right thing would be to explicitly disable timing for queries executed with EXPLAIN (TIMING OFF). Any other ideas how to make this work reasonably?

The patch does not implement any checks (how far the estimate is from reality) yet; that'll be round two.

regards Tomas

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index b320698..34015b8 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -23,6 +23,7 @@
 static int	auto_explain_log_min_duration = -1; /* msec or -1 */
 static bool auto_explain_log_analyze = false;
 static bool auto_explain_log_verbose = false;
 static bool auto_explain_log_buffers = false;
+static bool auto_explain_log_rows_only = false;
 static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static bool auto_explain_log_nested_statements = false;
@@ -133,6 +134,17 @@ _PG_init(void)
 							 NULL,
 							 NULL);

+	DefineCustomBoolVariable("auto_explain.log_rows_only",
+							 "Do not collect timing data, just row counts.",
+							 NULL,
+							 &auto_explain_log_rows_only,
+							 false,
+							 PGC_SUSET,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
 	EmitWarningsOnPlaceholders("auto_explain");

 	/* Install hooks. */
@@ -170,7 +182,11 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	/* Enable per-node instrumentation iff log_analyze is required. */
 	if (auto_explain_log_analyze && (eflags & EXEC_FLAG_EXPLAIN_ONLY) == 0)
 	{
-		queryDesc->instrument_options |= INSTRUMENT_TIMER;
+		if (auto_explain_log_rows_only)
+			queryDesc->instrument_options |= INSTRUMENT_ROWS;
+		else
+			queryDesc->instrument_options |= (INSTRUMENT_TIMER | INSTRUMENT_ROWS);
+
 		if (auto_explain_log_buffers)
 			queryDesc->instrument_options |= INSTRUMENT_BUFFERS;
 	}
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e38de5c..4e3f343 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -133,6 +133,8 @@ ExplainQuery(ExplainStmt *stmt, const char *queryString,
 			es.costs = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "buffers") == 0)
 			es.buffers = defGetBoolean(opt);
+		else if (strcmp(opt->defname, "timing") == 0)
+			es.timing = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "format") == 0)
 		{
 			char	   *p = defGetString(opt);
@@ -229,6 +231,7 @@ ExplainInitState(ExplainState *es)
 	/* Set default options. */
 	memset(es, 0, sizeof(ExplainState));
 	es->costs = true;
+	es->timing = true;
 	/* Prepare output buffer. */
 	es->str = makeStringInfo();
 }
@@ -355,8 +358,11 @@ ExplainOnePlan(PlannedStmt *plannedstmt, ExplainState *es,
 	int			eflags;
 	int
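The heart of the change is simply making the timing calls conditional; schematically (a simplified sketch of the idea, assuming a need_timer flag derived from INSTRUMENT_TIMER - not the actual instrument.c code):

    void
    InstrStartNode(Instrumentation *instr)
    {
        /* With only INSTRUMENT_ROWS set, no gettimeofday() is issued. */
        if (instr->need_timer)
            INSTR_TIME_SET_CURRENT(instr->starttime);
    }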
Re: [HACKERS] Review: Non-inheritable check constraints
Hi,

There is at least one other problem. Consider:

rhaas=# create table a (ff1 int, constraint chk check (ff1 > 0));
CREATE TABLE
rhaas=# create table b (ff1 int, constraint chk check (ff1 > 0));
CREATE TABLE
rhaas=# alter table b inherit a;
ALTER TABLE

This needs to fail if chk is an only constraint, but it doesn't, even with this patch.

As you rightly point out, this is because we cannot specify ONLY constraints inside a CREATE TABLE as of now.

I think that part of the problem here is fuzzy thinking about the meaning of the word ONLY in ALTER TABLE ONLY b. The word ONLY there is really supposed to mean that the operation is performed on b but not on its descendents; but that's not what you're doing: you're really performing a different operation. In theory, for a table with no current descendents, ALTER TABLE ONLY b and ALTER TABLE b ought to be identical, but here they are not. I think that's probably bad.

ONLY has inheritance-based connotations for present/future children. ALTER ONLY combined with ADD is honored better now with this patch IMO (at least for constraints, I think).

Another manifestation of this problem is that there's no way to add an ONLY constraint in a CREATE TABLE command. I think that's bad, too: it should be possible to declare any constraint at table creation time, and if the ONLY were part of the constraint definition rather than part of the target-table specification, that would work fine. As it is, it doesn't.

Well, the above was thought about during the original discussion, and eventually we felt that CREATE TABLE already has other issues as well, so not having this done as part of creating a table was considered acceptable: http://postgresql.1045698.n5.nabble.com/Check-constraints-on-partition-parents-only-tt464.html#a4647144

I am tempted to say we should revert this and rethink. I don't believe we are only a small patch away from finding all the bugs here.

Sure, if we all think that CREATE TABLE should support ONLY CONSTRAINT type of syntax, then +1 for reverting this and a subsequent revised submission.

Regards, Nikhils
Re: [HACKERS] Review: Non-inheritable check constraints
And yeah, certainly there's a bug as you point out.

postgres=# create table a1 (ff1 int, constraint chk check (ff1 > 0));
postgres=# create table b1 (ff1 int);
postgres=# alter table only b1 add constraint chk check (ff1 > 0);
postgres=# alter table b1 inherit a1;

The last command should have refused to inherit.

Regards, Nikhils

On Fri, Dec 23, 2011 at 8:55 AM, Nikhil Sontakke nikkh...@gmail.com wrote: Hi, There is at least one other problem. Consider: rhaas=# create table a (ff1 int, constraint chk check (ff1 > 0)); CREATE TABLE rhaas=# create table b (ff1 int, constraint chk check (ff1 > 0)); CREATE TABLE rhaas=# alter table b inherit a; ALTER TABLE This needs to fail if chk is an only constraint, but it doesn't, even with this patch. As you rightly point out, this is because we cannot specify ONLY constraints inside a CREATE TABLE as of now. I think that part of the problem here is fuzzy thinking about the meaning of the word ONLY in ALTER TABLE ONLY b. The word ONLY there is really supposed to mean that the operation is performed on b but not on its descendents; but that's not what you're doing: you're really performing a different operation. In theory, for a table with no current descendents, ALTER TABLE ONLY b and ALTER TABLE b ought to be identical, but here they are not. I think that's probably bad. ONLY has inheritance-based connotations for present/future children. ALTER ONLY combined with ADD is honored better now with this patch IMO (at least for constraints, I think). Another manifestation of this problem is that there's no way to add an ONLY constraint in a CREATE TABLE command. I think that's bad, too: it should be possible to declare any constraint at table creation time, and if the ONLY were part of the constraint definition rather than part of the target-table specification, that would work fine. As it is, it doesn't. Well, the above was thought about during the original discussion, and eventually we felt that CREATE TABLE already has other issues as well, so not having this done as part of creating a table was considered acceptable: http://postgresql.1045698.n5.nabble.com/Check-constraints-on-partition-parents-only-tt464.html#a4647144 I am tempted to say we should revert this and rethink. I don't believe we are only a small patch away from finding all the bugs here. Sure, if we all think that CREATE TABLE should support ONLY CONSTRAINT type of syntax, then +1 for reverting this and a subsequent revised submission. Regards, Nikhils
Re: [HACKERS] Review: Non-inheritable check constraints
Excerpts from Nikhil Sontakke's message of Fri Dec 23 00:25:26 -0300 2011: Hi, There is at least one other problem. Consider: rhaas=# create table a (ff1 int, constraint chk check (ff1 > 0)); CREATE TABLE rhaas=# create table b (ff1 int, constraint chk check (ff1 > 0)); CREATE TABLE rhaas=# alter table b inherit a; ALTER TABLE This needs to fail if chk is an only constraint, but it doesn't, even with this patch. As you rightly point out, this is because we cannot specify ONLY constraints inside a CREATE TABLE as of now.

No, it's not related to that. You could do it even without that, by creating a table, then adding an ONLY constraint, and only later doing the ALTER TABLE / INHERIT. The problem we need to fix here is that this command needs to check that there are no unmergeable constraints; and if there are any, error out.

I think that part of the problem here is fuzzy thinking about the meaning of the word ONLY in ALTER TABLE ONLY b. The word ONLY there is really supposed to mean that the operation is performed on b but not on its descendents; but that's not what you're doing: you're really performing a different operation. In theory, for a table with no current descendents, ALTER TABLE ONLY b and ALTER TABLE b ought to be identical, but here they are not. I think that's probably bad. ONLY has inheritance-based connotations for present/future children. ALTER ONLY combined with ADD is honored better now with this patch IMO (at least for constraints, I think).

I agree with Robert that this usage of ALTER TABLE ONLY is slightly different from other usages of the same command, but I disagree that this means that we need another command to do what we want to do here. IOW, I prefer to keep the syntax we have.

I am tempted to say we should revert this and rethink. I don't believe we are only a small patch away from finding all the bugs here. Sure, if we all think that CREATE TABLE should support ONLY CONSTRAINT type of syntax, then +1 for reverting this and a subsequent revised submission.

I don't think this is a given ... In fact, IMO if we're only two or three fixes away from having it all nice and consistent, I think reverting is not necessary.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Non-inheritable check constraints
I don't think this is a given ... In fact, IMO, if we're only two or three fixes away from having it all nice and consistent, I think reverting is not necessary.

FWIW, here's a quick fix for the issue that Robert pointed out. Again, it's a variation of the first issue that he reported. We have two functions that try to merge existing constraints:

MergeWithExistingConstraint() in heap.c
MergeConstraintsIntoExisting() in tablecmds.c

Nice names indeed :)

I have also tried to change the error message as per Alvaro's suggestions. I will try hard to find any other issues. We really cannot have Robert reporting all the bugs and getting annoyed, but there are lots of possible variations with inheritance...

Regards, Nikhils

only_constraint_no_merge_v2.0.patch Description: Binary data
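Whichever of the two merge functions handles a given case, the outcome can be checked from SQL by inspecting pg_constraint; a minimal sketch, using the a1/b1 tables from upthread (the conislocal and coninhcount columns have existed since PostgreSQL 8.4):

-- Shows whether the child's constraint is locally defined and how
-- many parents it was merged with; an ONLY constraint that got
-- merged would show up with a nonzero coninhcount, which is the bug.
select conrelid::regclass, conname, conislocal, coninhcount
from pg_constraint
where conname = 'chk';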
Re: [HACKERS] WIP: explain analyze with 'rows' but not timing
Hello

2011/12/23 Tomas Vondra t...@fuzzy.cz:

Hi all, most of the time I use auto_explain, all I need is the duration of the query and the plan with estimates and actual row counts. And it would be handy to be able to catch long-running queries whose estimates are significantly off (say 100x lower or higher than the actual row counts). The gettimeofday() calls are not exactly cheap in some cases, so why pay that price when all you need is the number of rows?

The attached patch does this:

1) adds INSTRUMENT_ROWS, a new InstrumentOption - counts rows without timing (no gettimeofday() calls); if you want timing info, use INSTRUMENT_TIMER

2) adds a new option TIMING to EXPLAIN, i.e. EXPLAIN (ANALYZE ON, TIMING ON) SELECT ...

3) adds auto_explain.log_rows_only (false by default) - if you set this to 'true', the instrumentation will just count rows, without calling gettimeofday()

It works quite well, except for one tiny issue - when log_rows_only is set to false (so that auto_explain requires timing), it silently overrides the EXPLAIN option. So even when the user explicitly disables timing (TIMING OFF), the setting is overridden and the explain collects timing data anyway. I could probably hide the timing info, but that would make the issue even worse (the user would not notice that timing was actually enabled). Maybe the right thing would be to explicitly disable timing for queries executed with EXPLAIN (TIMING OFF). Any other ideas on how to make this work reasonably?

The patch does not implement any checks (how far the estimate is from reality) yet; that will be round two.

It is an interesting idea - but maybe we could use a different metric than time, which is a very unstable quantity, especially on overloaded production servers. It is a good idea - we need a tool for finding bad statistics that is relatively cheap.

Regards Pavel

regards Tomas
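For concreteness, this is how the two proposed knobs would be used; the syntax is exactly as described in the patch above, neither option existed in released PostgreSQL at the time of this thread, and the table name is illustrative:

-- Proposed EXPLAIN form: actual row counts, but no per-node timing:
EXPLAIN (ANALYZE ON, TIMING OFF) SELECT count(*) FROM some_big_table;

-- Proposed auto_explain counterpart: count rows only, skipping the
-- gettimeofday() calls (assumes the auto_explain module is loaded):
SET auto_explain.log_rows_only = true;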
Re: [HACKERS] Allow substitute allocators for PGresult.
Hello, thank you for taking the time to comment.

At Wed, 21 Dec 2011 11:09:59 -0500, Robert Haas robertmh...@gmail.com wrote: I find the names of the functions added here to be quite confusing and would suggest renaming them. I expected PQgetAsCstring to do something similar to PQgetvalue, but the code is completely different.

To be honest, I have also felt that kind of perplexity. If the problem is simply one of naming, I can propose the alternative name PQreadAttValue... though that is not very good either. But...

and even after reading the documentation I still don't understand what that function is supposed to be used for. Why "as cstring"? What would the other option be?

Is this a problem of poor description, or about the raison d'être of the function? The immediate reason the function exists is that getAnotherTuple internally stores the field values of the tuples sent from the server in the form of PGresAttValue; the only route I have found to store a tuple into a TupleStore is BuildTupleFromCStrings() followed by tuplestore_puttuple(), which is what dblink does in materializeResult(); C strings are of course the most natural format in a C program; I hesitated to modify execTuples.c; and I wanted to hide the details of PGresAttValue.

Assuming the values are passed as PGresAttValue* (for reasons of performance and of limiting the extent of the modification), the tuple-adding functions must get the value out of that struct. From the viewpoint of encapsulation and convenience, this can be done in two ways: with knowledge of the structure, or without it. PQgetAsCstring takes the latter approach. (It is, however, inconsistent with the fact that the definition of PGresAttValue was moved into libpq-fe.h from libpq-int.h; in this approach the details of the structure should be hidden, like PGresult's.) But it is not obvious that this choice is better than the other. If we consider PGresAttValue too simple and stable to be worth hiding, PQgetAsCstring can be removed and the problem goes away; PGresAttValue then needs documentation. I would prefer to handle PGresAttValue directly if that poses no problem.

I also don't think the "add tuple" terminology is particularly good. It's not obvious from the name that what you're doing is overriding the way memory is allocated and results are stored.

This phrase was originally taken from pqAddTuple() in fe-exec.c and was not changed after that function was integrated with others. I propose "tuple storage handler" as an alternative:

- typedef void *(*addTupleFunction)(...);
+ typedef void *(*tupleStorageHandler)(...);

- typedef enum { ADDTUP_*, } AddTupFunc;
+ typedef enum { TSHANDLER_*, } TSHandlerCommand;

- void *PQgetAddTupleParam(...);
+ void *PQgetTupleStorageHandlerContext(...);

- void PQregisterTupleAdder(...);
+ void PQregisterTupleStoreHandler(...);

- addTupleFunction PGresult.addTupleFunc;
+ tupleStorageHandler PGresult.tupleStorageHandlerFunc;

- void *PGresult.addTupleFuncParam;
+ void *PGresult.tupleStorageHandlerContext;

- char *PGresult.addTupleFuncErrMes;
+ void *PGresult.tupleStorageHandlerErrMes;

Also, what about the problem Tom mentioned here? http://archives.postgresql.org/message-id/1042.1321123...@sss.pgh.pa.us

The plan to simply replace malloc's with something like palloc's was abandoned because of its narrow scope. dblink-plus copies the whole PGresult into a TupleStore in order to avoid leaving orphaned memory behind on SIGINT.
The resource owner mechanism is in principle applicable to that, but hard in practice, because the current implementation could not accommodate it without radical modification. In addition, dblink does the same thing, probably for the same reason as dblink-plus, plus another reason, as far as I have heard. Whatever the reason, both dblink and dblink-plus do the same thing, which can lower performance below what one would expect.

If TupleStore (TupleDesc) is preferable to PGresult for in-backend use, and ordinary (client-side) libpq users can handle only PGresult, then a mechanism like this patch would be required to maintain compatibility, I think. Conversely, if there is no significant reason to use TupleStore in the backend - that is, if existing mechanisms like the resource owner can cheaply protect the backend from the possible problems of using PGresult storage there - then PGresult should be used as it is. I think a TupleStore prepared for backend use is preferable for this usage, and I don't want the data to take a detour through PGresult.

regards,
-- Kyotaro Horiguchi NTT Open Source Software Center
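The double copy under discussion happens inside dblink's C code, but the kind of call it affects is plain SQL; a minimal sketch (the connection string and remote table are illustrative):

-- Today each row returned here is materialized twice: once in the
-- PGresult that libpq builds while receiving the remote result, and
-- once in the backend tuplestore that dblink fills row by row via
-- BuildTupleFromCStrings() / tuplestore_puttuple().
CREATE EXTENSION dblink;
SELECT *
  FROM dblink('dbname=remote_db', 'SELECT id, name FROM items')
       AS t(id int, name text);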