Re: [HACKERS] patch for implementing SPI_gettypemod()
On Wed, Feb 8, 2012 at 12:19 PM, Chetan Suttraway
<chetan.suttra...@enterprisedb.com> wrote:
>
> On Wed, Feb 8, 2012 at 8:15 AM, Robert Haas wrote:
>> On Tue, Feb 7, 2012 at 4:25 PM, Peter Eisentraut wrote:
>> > On ons, 2012-02-01 at 15:53 +0530, Chetan Suttraway wrote:
>> >> This is regarding the TODO item :
>> >> "Add SPI_gettypmod() to return a field's typemod from a TupleDesc"
>> >
>> > My first thought was, this should be spelled SPI_gettypmod(). Not sure
>> > what others think.
>>
>> +1.
>
> The reason for using the SPI_gettypemod() name was that I did see
> SPI_gettypeid().
>
> Anyways, will update the patch with the recommended name.
>
> Regards,
> Chetan

Please refer to the attached patch, which now uses the SPI_gettypmod() name.

Regards,
Chetan

--
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com
Follow us on Twitter: http://www.twitter.com/enterprisedb

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 81f284c..1f4632e 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -955,6 +955,24 @@ SPI_gettypeid(TupleDesc tupdesc, int fnumber)
 	return (SystemAttributeDefinition(fnumber, true))->atttypid;
 }
 
+int4
+SPI_gettypmod(TupleDesc tupdesc, int fnumber)
+{
+	SPI_result = 0;
+
+	if (fnumber > tupdesc->natts || fnumber == 0 ||
+		fnumber <= FirstLowInvalidHeapAttributeNumber)
+	{
+		SPI_result = SPI_ERROR_NOATTRIBUTE;
+		return -1;
+	}
+
+	if (fnumber > 0)
+		return tupdesc->attrs[fnumber - 1]->atttypmod;
+	else
+		return (SystemAttributeDefinition(fnumber, true))->atttypmod;
+}
+
 char *
 SPI_getrelname(Relation rel)
 {
diff --git a/src/include/executor/spi.h b/src/include/executor/spi.h
index cfbaa14..dedb1c7 100644
--- a/src/include/executor/spi.h
+++ b/src/include/executor/spi.h
@@ -113,6 +113,7 @@ extern char *SPI_getvalue(HeapTuple tuple, TupleDesc tupdesc, int fnumber);
 extern Datum SPI_getbinval(HeapTuple tuple, TupleDesc tupdesc, int fnumber,
 						   bool *isnull);
 extern char *SPI_gettype(TupleDesc tupdesc, int fnumber);
 extern Oid	SPI_gettypeid(TupleDesc tupdesc, int fnumber);
+extern int4 SPI_gettypmod(TupleDesc tupdesc, int fnumber);
 extern char *SPI_getrelname(Relation rel);
 extern char *SPI_getnspname(Relation rel);
 extern void *SPI_palloc(Size size);

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
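[For context, a minimal sketch of how the new function might be called from a C extension, alongside the existing SPI accessors. This assumes the patch above has been applied; it is not part of the patch, and the query and function name are illustrative only.]

```c
/*
 * Illustration only: walk the columns of an SPI result and report each
 * column's type OID and typmod.  Assumes SPI_connect() has already been
 * called and the SPI_gettypmod() patch above is applied.
 */
#include "postgres.h"
#include "executor/spi.h"

static void
describe_result_columns(void)
{
	int			col;

	if (SPI_execute("SELECT * FROM pg_class LIMIT 1", true, 0) != SPI_OK_SELECT)
		elog(ERROR, "SPI_execute failed");

	for (col = 1; col <= SPI_tuptable->tupdesc->natts; col++)
	{
		Oid			typid = SPI_gettypeid(SPI_tuptable->tupdesc, col);
		int4		typmod = SPI_gettypmod(SPI_tuptable->tupdesc, col);

		elog(NOTICE, "column %d: type %u, typmod %d", col, typid, typmod);
	}
}
```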
Re: [HACKERS] patch for implementing SPI_gettypemod()
On Wed, Feb 8, 2012 at 8:15 AM, Robert Haas wrote: > On Tue, Feb 7, 2012 at 4:25 PM, Peter Eisentraut wrote: > > On ons, 2012-02-01 at 15:53 +0530, Chetan Suttraway wrote: > >> This is regarding the TODO item : > >> "Add SPI_gettypmod() to return a field's typemod from a TupleDesc" > > > > My first thought was, this should be spelled SPI_gettypmod(). Not sure > > what others think. > > +1. > > The reason for using SPI_gettypemod() name was that I did see SPI_gettypeid(). Anyways, will update patch with recommended name. Regards, Chetan -- EnterpriseDB Corporation The Enterprise PostgreSQL Company Website: www.enterprisedb.com EnterpriseDB Blog : http://blogs.enterprisedb.com Follow us on Twitter : http://www.twitter.com/enterprisedb
Re: [HACKERS] Can PQstatus() be used by Application to check connection to postgres periodically?
Hello, sujayr06.

You wrote:

s> Hello All,
s>        My application has to do a real time data upload to PostgreSQL
s> server.
s>        Every time i have to do a real time upload, i do not wish to open
s> new connection.
s>        I want to open a connection once [when my application comes up]
s> and periodically check if the connection is active.
s>        Can PQstatus() be used by application to check the status of the
s> connection already established?
s>        If PQstatus() cannot be used, does PostgreSQL provide alternate
s> interface to check the status of the connection.
s> Note : I do not wish to open connection on every real time upload
s> as its an overhead for the application.
s>        Appreciate any help!

You may use PQtransactionStatus for this purpose
(http://www.postgresql.org/docs/9.1/static/libpq-status.html)

s> WBR,
s> Sujay
s>
s> --
s> View this message in context:
s> http://postgresql.1045698.n5.nabble.com/Can-PQstatus-be-used-by-Application-to-check-connection-to-postgres-periodically-tp5462315p5462315.html
s> Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
With best wishes,
 Pavel                          mailto:pa...@gf.microolap.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
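[A minimal libpq sketch of the kind of periodic health check being asked about, using PQstatus() and falling back to PQreset() when the connection has gone bad. The probe query and retry policy are illustrative assumptions, not a recommendation from this thread.]

```c
#include <stdio.h>
#include <libpq-fe.h>

/*
 * Illustration only: check whether an existing connection is still usable
 * before an upload, and try to re-establish it if not.  PQstatus() reports
 * the client-side state; it does not probe the server, so a cheap query
 * (or PQping() on newer releases) is needed to detect a dead connection.
 */
static int
connection_is_healthy(PGconn *conn)
{
	PGresult   *res;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		PQreset(conn);			/* try to re-establish the same connection */
		if (PQstatus(conn) != CONNECTION_OK)
			return 0;
	}

	/* round trip to the server to make sure it is really alive */
	res = PQexec(conn, "SELECT 1");
	if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		PQclear(res);
		return 0;
	}
	PQclear(res);
	return 1;
}
```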
Re: [HACKERS] patch for preventing the specification of conflicting transaction read/write options
On Tue, Feb 7, 2012 at 8:44 PM, Kevin Grittner wrote: > Chetan Suttraway wrote: > > > This is regarding the TODO item: > > "Prevent the specification of conflicting transaction read/write > > options" > > > > listed at: > > http://wiki.postgresql.org/wiki/Todo > > Thanks for chipping away at items on the list. > > > Please pass on any inputs on the patch. > > Right now we want people focusing on fixing bugs and reviewing > patches submitted by the start of the current CommitFest. Please > add this to the open CF so we don't lose track of it. > > -Kevin > Sure. Thanks! -Chetan -- EnterpriseDB Corporation The Enterprise PostgreSQL Company Website: www.enterprisedb.com EnterpriseDB Blog : http://blogs.enterprisedb.com Follow us on Twitter : http://www.twitter.com/enterprisedb
Re: [HACKERS] pgstat documentation tables
On Feb 8, 2012 5:32 AM, "Tom Lane" wrote: > > Bruce Momjian writes: > > How do people feel about pulling text out of the SGML docs and loading > > it into the database as table and column comments? > > I'm not thrilled by that proposal. The length limits on comments are > very much shorter than what is sensible to use in catalogs.sgml (at > least if you want the comments to look passable in \d displays). And > I don't want people trying to serve two different use-cases when they > write those. Yes, I'd rather see longer than shorter descriptions in the docs, and this would turn that backwards... > > Perhaps it'd work to pull column comments from the C header files > --- any same-line comment in catalog/pg_*.h is probably short enough to > serve this purpose usefully. That could work a lot better.. /Magnus
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
(2012/02/07 23:44), Marko Kreen wrote:
> On Tue, Feb 07, 2012 at 07:25:14PM +0900, Shigeru Hanada wrote:
>> - What is the right (or recommended) way to prevent throwing an
>> exception in a row-processor callback function?  When should an author
>> use a PG_TRY block to catch exceptions in the callback function?
>
> When it calls backend functions that can throw exceptions?
> As all handlers running in backend will want to convert data
> to Datums, that means "always wrap handler code in PG_TRY"?

ISTM that telling a) what happens to PGresult and PGconn when a row
processor ends without returning, and b) how to recover them, would be
necessary.  We can't assume much about the caller because libpq is a
client library.  IMHO, it's most important to tell authors of row
processors clearly what should be done in the error case.  In such a
case, must we call PQfinish, or is calling PQgetResult until it returns
NULL enough to reuse the connection?  IIUC calling pqClearAsyncResult
seems sufficient, but it's not exported for clients...

Regards,
--
Shigeru Hanada

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
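[The proposed row-processor API itself is not shown in this thread, so the handler below is purely hypothetical; the only point of the sketch is the "always wrap handler code in PG_TRY" pattern Marko describes for backend code that converts incoming values to Datums. Memory-context cleanup and the surrounding API are omitted.]

```c
#include "postgres.h"

/*
 * Hypothetical handler body.  convert_row_to_datums() stands in for
 * whatever backend calls the handler makes that can elog(ERROR).
 */
static bool
my_row_handler_guts(void *arg)
{
	bool		ok = true;

	PG_TRY();
	{
		convert_row_to_datums(arg);		/* hypothetical helper */
	}
	PG_CATCH();
	{
		/*
		 * Either clean up libpq state, swallow the error and report
		 * failure to the caller ...
		 */
		ok = false;
		FlushErrorState();
		/* ... or release resources and PG_RE_THROW() instead. */
	}
	PG_END_TRY();

	return ok;
}
```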
Re: [HACKERS] pgstat documentation tables
Bruce Momjian writes: > How do people feel about pulling text out of the SGML docs and loading > it into the database as table and column comments? I'm not thrilled by that proposal. The length limits on comments are very much shorter than what is sensible to use in catalogs.sgml (at least if you want the comments to look passable in \d displays). And I don't want people trying to serve two different use-cases when they write those. Perhaps it'd work to pull column comments from the C header files --- any same-line comment in catalog/pg_*.h is probably short enough to serve this purpose usefully. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Text-any concatenation volatility acting as optimization barrier
Marti Raudsepp writes:
> Case #1 uses the normal textcat(text, text) operator by automatically
> coercing 'x' as text.
> However, case #2 uses the anytextcat(anynonarray, text), which is
> marked as volatile thus acts as an optimization barrier.

Hmm ... since those operators were invented (in 8.3), we have adopted
a policy that I/O functions are presumed to be no worse than stable:
http://archives.postgresql.org/pgsql-committers/2010-07/msg00307.php
ISTM that would justify relabeling anytextcat/textanycat as stable,
which should fix this.

> One way would be doing preprocess_expression() before
> pull_up_subqueries() so function inlining happens earlier, but I can't
> imagine what unintended consequences that might have.

I'm pretty sure that would be a bad idea.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgstat documentation tables
On Mon, Jan 16, 2012 at 01:01:50PM -0300, Alvaro Herrera wrote: > > > Bruce had a patch to turn SGML descriptions of system view into comments > via some Perl program or something. He posted it many moons ago and I > haven't seen an updated version. Bruce, do you have something to say on > this topic? Yes, I was waiting to get feedback on that but it seems the 9.2 release slipped right by me on that. I guess I will try to pick it up for 9.3. How do people feel about pulling text out of the SGML docs and loading it into the database as table and column comments? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum rate limit in KBps
On Thu, Jan 19, 2012 at 09:42:52PM -0500, Robert Haas wrote: > I certainly didn't intend to come across as disparaging your work on > this topic. I understand that there are big problems with the way > things work now; I'm just cautious about trying to replace them too > hastily with something that may not turn out to be any better. Of > course, if we can replace it with something that we're sure is > actually an improvement, I'm all in favor of that. But, IMHO, the > problems in this area are too serious to be solved by renaming the > knobs. At most, we're going to buy ourselves a little time to come up > with a better solution. > > IMHO, and at the risk of repeating myself, one of the big problems in > this area is that we're making the user guess something that we really > ought to be figuring out for them. Just as users want checkpoints to > run as slowly as possible while still not bumping into the next > checkpoint, they'd presumably like vacuum to run as slowly as possible > without bumping into the next vacuum. Instead, we make them tell us > how fast they'd like it tor run, which requires them to guess a value > high enough to finish soon enough but low enough to minimize the > impact on the rest of the system. > > Another problem is that the vacuum algorithm itself could, I think, be > made much smarter. We could teach HOT to prune pages that contain no > HOT chains but do contain dead tuples. That would leave dead line > pointers behind, but that's not nearly as bad as leaving the entire > tuple behind. We could, as Simon and others have suggested, have one > threshold for vacuuming the heap (i.e. reclaiming dead tuples) and > another for vacuuming the indexes (i.e. reclaiming dead line > pointers). That would open the door to partial vacuuming: just vacuum > half a gigabyte or so of the heap, and then move on; the next vacuum > can pick up where that one left off, at least up to the point where we > decide we need to make an index pass; it would possibly also allow us > to permit more than one vacuum on the same table at the same time, > which is probably needed for very large tables. We could have > backends that see dead tuples on a page throw them over to the fence > to the background writer for immediate pruning. I blather, but I > guess my point is that I really hope we're going to do something > deeper here at some point in the near future, whatever becomes of the > proposals now on the table. As much as I hate to poo-poo a patch addition, I have to agree with Robert Haas on this one. Renaming settings really isn't moving us forward. It introduces a migration problem and really doesn't move us forward in solving the underlying problem. Additional monitoring, while helpful, also is only a stop-gap. Only a small number of sites are going to monitor auto-vacuum/analyze. Let's not start writing Postgres for those super-busy sites with a team of administrators --- while they are important, they are not the majority of our user-base, and we can pride ourselves that Postgres runs pretty well without a team of admins. We don't want to get into a case where our super-visible, high-volume folks are overly setting the project direction. If we look at checkpoint smoothing, that was solved the right way with a setting that worked automatically for everyone. 
Now, I don't know if the solution is to time read/write duration to see how busy the system is, or to look at the statistics to see how backlogged the autovacuum system is when it gets time to actually process a table, but those are the questions we should be asking here. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum rate limit in KBps
On Thu, Jan 19, 2012 at 05:39:41PM -0500, Greg Smith wrote: > On 1/19/12 1:10 PM, Robert Haas wrote: > >I have to say that I find that intensely counterintuitive. The > >current settings are not entirely easy to tune correctly, but at least > >they're easy to explain. > > I attempt to explain those settings to people in training classes > about once a month. It's never been anything but a complete > disaster. I tell students you will not have problems with these settings unless you change them. ;-) -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
On Tue, Feb 07, 2012 at 08:58:59PM +0000, Simon Riggs wrote:
> On Thu, Jan 26, 2012 at 8:20 PM, Noah Misch wrote:
> > On Wed, Jan 11, 2012 at 10:12:31PM +0000, Simon Riggs wrote:
> > This patch uses FPIs to guard against torn hint writes, even when the
> > checksums are disabled.  One could not simply condition them on the
> > page_checksums setting, though.  Suppose page_checksums=off and we're
> > hinting a page previously written with page_checksums=on.  If that write
> > tears, leaving the checksum intact, that block will now fail verification.
> > A couple of ideas come to mind.  (a) Read the on-disk page and emit an FPI
> > only if the old page had a checksum.  (b) Add a checksumEnableLSN field to
> > pg_control.  Whenever the cluster starts with checksums disabled, set the
> > field to InvalidXLogRecPtr.  Whenever the cluster starts with checksums
> > enabled and the field is InvalidXLogRecPtr, set the field to the next LSN.
> > When a checksum failure occurs in a page with LSN older than the stored
> > one, emit either a softer warning or no message at all.
>
> We can only change page_checksums at restart (now) so the above seems moot.
>
> If we crash with FPWs enabled we repair any torn pages.

There's no live bug, but that comes at a high cost: the patch has us emit
full-page images for hint bit writes regardless of the page_checksums
setting.

> > PageSetLSN() is not atomic, so the shared buffer content lock we'll be
> > holding is insufficient.
>
> Am serialising this by only writing PageLSN while holding buf hdr lock.

That means also taking the buffer header spinlock in every PageGetLSN()
caller holding only a shared buffer content lock.  Do you think that will
pay off, versus the settled pattern of trading here your shared buffer
content lock for an exclusive one?

> > I can see value in an option to exclude local buffers, since corruption
> > there may be less exciting.  It doesn't seem important for an initial
> > patch, though.
>
> I'm continuing to exclude local buffers. Let me know if that should change.

Seems reasonable.

Thanks,
nm

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for parallel pg_dump
On Tue, Feb 7, 2012 at 4:59 PM, Robert Haas wrote: > It turns out that (as you anticipated) there are some problems with my > previous proposal. I assume you're talking to Tom, as my powers of anticipation are actually quite limited... :-) > This is not > quite enough to get rid of g_conn, but it's close: the major stumbling > block at this point is probably exit_nicely(). The gyrations we're > going through to make sure that AH->connection gets closed before > exiting are fairly annoying; maybe we should invent something in > dumputils.c along the line of the backend's on_shmem_exit(). Yeah, this becomes even more important with parallel jobs where you want all worker processes die once the parent exits. Otherwise some 10 already started processes would continue to dump your 10 largest tables for the next few hours with the master process long dead... All while you're about to start up the next master process... In my patch I dealt with exactly the same problem for the error handler by creating a separate function that has a static variable (a pointer to the ParallelState). The value is set and retrieved through the same function, so yes, it's kinda global but then again it can only be accessed from this function, which is only called from the error handler. > I'm starting to think it might make sense to press on with this > refactoring just a bit further and eliminate the distinction between > Archive and ArchiveHandle. How about doing more refactoring after applying the patch, you'd then see what is really needed and then we'd also have an actual use case for more than one connection (You might have already guessed that this proposal is heavily influenced by my self-interest of avoiding too much work to make my patch match your refactoring)... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
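[A rough sketch of what an on_shmem_exit()-style cleanup registry for pg_dump could look like, as floated above. All names and details here are invented for illustration; they are not taken from any posted patch.]

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Sketch of an exit-callback registry in the spirit of the backend's
 * on_shmem_exit(): callers register cleanup functions (e.g. closing the
 * archive's connection), and exit_nicely() runs them in reverse order
 * before exiting.
 */
#define MAX_ON_EXIT_NICELY 20

typedef void (*on_exit_nicely_callback) (int code, void *arg);

static struct
{
	on_exit_nicely_callback func;
	void	   *arg;
}			on_exit_nicely_list[MAX_ON_EXIT_NICELY];

static int	on_exit_nicely_index = 0;

void
on_exit_nicely(on_exit_nicely_callback func, void *arg)
{
	if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
	{
		fprintf(stderr, "out of on_exit_nicely slots\n");
		exit(1);
	}
	on_exit_nicely_list[on_exit_nicely_index].func = func;
	on_exit_nicely_list[on_exit_nicely_index].arg = arg;
	on_exit_nicely_index++;
}

void
exit_nicely(int code)
{
	while (--on_exit_nicely_index >= 0)
		on_exit_nicely_list[on_exit_nicely_index].func(code,
							on_exit_nicely_list[on_exit_nicely_index].arg);
	exit(code);
}
```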
Re: [HACKERS] Progress on fast path sorting, btree index creation time
On Tue, Feb 07, 2012 at 09:38:39PM -0500, Robert Haas wrote: > So we need some principled way of deciding how much inlining is > reasonable, because I am 100% certain this is not going to be the last > time someone discovers that a massive exercise in inlining can yield a > nifty performance benefit in some case or another: index builds and > COPY have already been mentioned on this thread, and I expect that > people will find other cases as well. I'm not really sure what the > "budget" is - i.e. how much binary bloat we can afford to add - or how > many cases there are that can benefit, but the first isn't infinite > and the second is more than the first. > > Having said all that, I am inclined to commit at least some portion of > this, but I wanted to knock off a few other patches that have been > lingering for a while first. One approach would be to only do a few types now, e.g. integers and strings, and perhaps leave the others for later. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql case preserving completion
On Tue, Feb 7, 2012 at 8:02 PM, Bruce Momjian wrote: > On Wed, Feb 01, 2012 at 08:19:24PM +0200, Peter Eisentraut wrote: >> On tis, 2012-01-17 at 16:46 +0900, Fujii Masao wrote: >> > When I tested the patch, "create ta" was converted unexpectedly to >> > "create TABLE" >> > though "alter ta" was successfully converted to "alter table". As far >> > as I read the patch, >> > you seems to have forgotten to change create_or_drop_command_generator() or >> > something. >> >> Thanks, fixed and committed. > > I have to admit I like the capitalized keywords, but don't normally type > them, but it must be just me because no one else said anything. Yeah, I liked the old behavior, too. But I figured it was a sign that I'm an old fuddy-duddy. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for implementing SPI_gettypemod()
On Tue, Feb 7, 2012 at 4:25 PM, Peter Eisentraut wrote:
> On ons, 2012-02-01 at 15:53 +0530, Chetan Suttraway wrote:
>> This is regarding the TODO item :
>> "Add SPI_gettypmod() to return a field's typemod from a TupleDesc"
>
> My first thought was, this should be spelled SPI_gettypmod(). Not sure
> what others think.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add protransform for numeric, varbit, and temporal types
On Tue, Feb 7, 2012 at 8:45 PM, Noah Misch wrote: > On Tue, Feb 07, 2012 at 12:43:11PM -0500, Robert Haas wrote: >> I've committed the numeric and varbit patches and will look at the >> temporal one next. > > Thanks. The comment you added to numeric_transform() has a few typos, > "constained" -> "constrained" and "nodes" -> "notes". Yuck, that's pretty bad. Thanks, fixed (I hope). > When we rebuild the table at this point, it has a small maximum tuple size. > Therefore, we omit the toast table entirely. Ah, OK. The debug messages might be more clear if they mentioned whether or not we were omitting the TOAST table, I suppose. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Progress on fast path sorting, btree index creation time
On Tue, Feb 7, 2012 at 2:39 PM, Jay Levitt wrote:
> Jim "Decibel!" Nasby wrote:
>>
>> I agree that it's probably pretty unusual to index floats.
>
> FWIW: Cubes and points are floats, right? So would spatial indexes benefit
> from this optimization, or is it only raw floats?

Cubes are not floats, nor are points.

In general, what's being proposed here is to make a separate copy of
quicksort for each of N datatypes, with the comparator function inlined.
In order for this to benefit multiple types, they'd have to use the same
set of machine instructions to compare values on disk.  So in general you
can make this apply to as many datatypes as you want: it's just that each
one will require another copy of the quicksort code in the binary.  The
more complex the comparator is, the less you'll save, because the savings
presumably come largely from things like saving and restoring registers
and pushing/popping stack frames, and that's really only going to be
material if the underlying comparator is pretty darn cheap.

Actually, to be more precise, the patch is proposing to create TWO copies
of the quicksort code for each of N datatypes, one for the case where
there is but a single sort key and the other for the case where there are
multiple sort keys.  Having separate code for the single sort key case
saves more because it lets you get rid of a control loop - you don't have
to iterate down the list of keys, because there's only one.

I've been skeptical of this patch for a number of reasons, the major one
of which is that most workloads spend only a very small amount of time
doing quicksorts, and most quicksorts are of very small amounts of data
and therefore fast anyway.  It is easy to construct an artificial test
case that does lots and lots of in-memory sorting, but in real life I
think that's not the great part of what people use databases for.  The
time gets spent on things like groveling through big tables or doing
joins, and then maybe we sort the rows at the end, but that's not where
most of the time goes.

It may be rightly pointed out, of course, that a penny saved is a penny
earned: the fact that most people don't spend much time quicksorting
doesn't mean that we shouldn't try to make quicksorting as fast as
possible.  But there are a couple of additional considerations.

First, the code is, at present, uglier than I would like.  I mean to
spend some time trying to clean that up, but there are 102 patches in
this CommitFest (the number seems to keep slowly growing, despite the
deadline being three weeks gone) and this isn't the only one I care
about, so I haven't quite gotten there yet.  But hopefully soon.

Second, there's a concern about binary bloat: duplicating lots of code
with different comparators inlined generates, well, a lot of machine
code.  Of course, an 0.8% increase in the size of the resulting binary is
very unlikely to produce any measurable performance regression on its
own, but that's not really the issue.  The issue is rather that we could
easily go nuts applying this technique in lots of different places - or
even just in one place to lots of different types.  Let's say that we
find that even for types with very complex comparators, we can get a 5%
speedup on quick-sorting speed using this technique.  Should we, then,
apply the technique to every type we have?  Perhaps the binary would grow
by, I don't know, 10%.
Maybe that's still not enough to start hurting performance (or making packagers complain), but surely it's getting there, and what happens when we discover that similar speedups are possible using the same trick in five other scenarios? I think it's a bit like going out to dinner and spending $50. It's nicer than eating at home, and many people can afford to do it occasionally, but if you do it every night, it gets expensive (and possibly fattening). So we need some principled way of deciding how much inlining is reasonable, because I am 100% certain this is not going to be the last time someone discovers that a massive exercise in inlining can yield a nifty performance benefit in some case or another: index builds and COPY have already been mentioned on this thread, and I expect that people will find other cases as well. I'm not really sure what the "budget" is - i.e. how much binary bloat we can afford to add - or how many cases there are that can benefit, but the first isn't infinite and the second is more than the first. Having said all that, I am inclined to commit at least some portion of this, but I wanted to knock off a few other patches that have been lingering for a while first. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
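[The technique being debated, stamping out one copy of the sort routine per comparator so the comparison can be inlined, can be illustrated outside PostgreSQL with a small macro-generated sort. This is only a toy sketch of the idea; the actual patch specializes quicksort inside tuplesort.c, and insertion sort is used here purely to keep the sketch short.]

```c
#include <stddef.h>

/*
 * SPECIALIZED_SORT expands to a complete sort routine with the comparison
 * expression baked in, so the compiler can inline it and skip the indirect
 * call a qsort()-style interface would need.
 */
#define SPECIALIZED_SORT(NAME, TYPE, LT)					\
static void													\
NAME(TYPE *a, size_t n)										\
{															\
	for (size_t i = 1; i < n; i++)							\
	{														\
		TYPE	key = a[i];									\
		size_t	j = i;										\
		while (j > 0 && LT(key, a[j - 1]))					\
		{													\
			a[j] = a[j - 1];								\
			j--;											\
		}													\
		a[j] = key;											\
	}														\
}

#define INT_LT(x, y)	((x) < (y))
#define DBL_LT(x, y)	((x) < (y))

/*
 * Each expansion adds another copy of the routine to the binary,
 * which is the bloat trade-off discussed above.
 */
SPECIALIZED_SORT(sort_int32, int, INT_LT)
SPECIALIZED_SORT(sort_double, double, DBL_LT)
```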
Re: [HACKERS] Add protransform for numeric, varbit, and temporal types
On Tue, Feb 07, 2012 at 12:43:11PM -0500, Robert Haas wrote:
> I've committed the numeric and varbit patches and will look at the
> temporal one next.

Thanks.  The comment you added to numeric_transform() has a few typos,
"constained" -> "constrained" and "nodes" -> "notes".

> I did notice one odd thing, though: sometimes we
> seem to get a rebuild on the toast index for no obvious reason:
>
> rhaas=# set client_min_messages=debug1;
> SET
> rhaas=# create table foo (a serial primary key, b varbit);
> NOTICE: CREATE TABLE will create implicit sequence "foo_a_seq" for
> serial column "foo.a"
> DEBUG: building index "pg_toast_16430_index" on table "pg_toast_16430"
> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
> "foo_pkey" for table "foo"
> DEBUG: building index "foo_pkey" on table "foo"
> CREATE TABLE
> rhaas=# alter table foo alter column b set data type varbit(4);
> DEBUG: rewriting table "foo"
> DEBUG: building index "foo_pkey" on table "foo"
> ALTER TABLE

When we rebuild the table at this point, it has a small maximum tuple
size.  Therefore, we omit the toast table entirely.

> rhaas=# alter table foo alter column b set data type varbit;
> DEBUG: building index "pg_toast_16430_index" on table "pg_toast_16430"
> ALTER TABLE

This command makes the tuple size unbounded again, so we create a toast
table.

> Strangely, it doesn't happen if I add another column to the table:
>
> rhaas=# set client_min_messages=debug1;
> SET
> rhaas=# create table foo (a serial primary key, b varbit, c varbit);

With the extra unconstrained varbit column, the tuple size is
continuously unbounded and the table has a toast table at all stages.
So, the difference in behavior is correct, albeit unintuitive.

> NOTICE: CREATE TABLE will create implicit sequence "foo_a_seq" for
> serial column "foo.a"
> DEBUG: building index "pg_toast_16481_index" on table "pg_toast_16481"
> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
> "foo_pkey" for table "foo"
> DEBUG: building index "foo_pkey" on table "foo"
> CREATE TABLE
> rhaas=# alter table foo alter column b set data type varbit(4);
> DEBUG: building index "pg_toast_16490_index" on table "pg_toast_16490"
> DEBUG: rewriting table "foo"
> DEBUG: building index "foo_pkey" on table "foo"
> ALTER TABLE
> rhaas=# alter table foo alter column b set data type varbit;
> ALTER TABLE

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql case preserving completion
On Wed, Feb 01, 2012 at 08:19:24PM +0200, Peter Eisentraut wrote: > On tis, 2012-01-17 at 16:46 +0900, Fujii Masao wrote: > > When I tested the patch, "create ta" was converted unexpectedly to > > "create TABLE" > > though "alter ta" was successfully converted to "alter table". As far > > as I read the patch, > > you seems to have forgotten to change create_or_drop_command_generator() or > > something. > > Thanks, fixed and committed. I have to admit I like the capitalized keywords, but don't normally type them, but it must be just me because no one else said anything. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random_page_cost vs seq_page_cost
On Tue, Feb 07, 2012 at 05:06:18PM -0500, Greg Smith wrote:
> On 02/07/2012 03:23 PM, Bruce Momjian wrote:
> > Where did you see that there will be an improvement in the 9.2
> > documentation?  I don't see an improvement.
>
> I commented that I'm hoping for an improvement in the documentation
> of how much timing overhead impacts attempts to measure this area
> better.  That's from the "add timing of buffer I/O requests" feature
> submission.  I'm not sure if Bene read too much into that or not; I
> didn't mean to imply that the docs around random_page_cost have
> gotten better.
>
> This particular complaint is extremely common though, seems to pop
> up on one of the lists a few times each year.  Your suggested doc
> fix is fine as a quick one, but I think it might be worth expanding
> further on this topic.  Something discussing SSDs seems due here
> too.  Here's a first draft of a longer discussion, to be inserted
> just after where it states the default value is 4.0:

I was initially concerned that tuning advice in this part of the docs
would look out of place, but now see the 25% shared_buffers
recommendation, and it looks fine, so we are OK.  (Should we caution
against more than 8GB of shared buffers?  I don't see that in the docs.)

I agree we are overdue for a better explanation of random_page_cost, so
I agree with your direction.  I did a little word-smithing to tighten up
your text; feel free to discard what you don't like:

	Random access to mechanical disk storage is normally much more
	expensive than four times sequential access.  However, a lower
	default is used (4.0) because the majority of random accesses to
	disk, such as indexed reads, are assumed to be in cache.  The
	default value can be thought of as modeling random access as 40
	times slower than sequential, while expecting 90% of random reads
	to be cached.

	If you believe a 90% cache rate is an incorrect assumption for
	your workload, you can increase random_page_cost to better
	reflect the true cost of random storage reads.  Correspondingly,
	if your data is likely to be completely in cache, such as when
	the database is smaller than the total server memory, decreasing
	random_page_cost can be appropriate.  Storage that has a low
	random read cost relative to sequential, e.g. solid-state drives,
	might also be better modeled with a lower value for
	random_page_cost.

--
  Bruce Momjian                            http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgindent README correction
On Mon, Jan 09, 2012 at 01:32:49PM -0500, Robert Haas wrote: > > The one other issue I ran into in following the latest pgindent > > instructions was that I had to add #include to the > > parse.c file (as included in the pg_bsd_indent-1.1.tar.gz file at > > ftp://ftp.postgresql.org/pub/dev ). Without it I got this: > > > > parse.c: In function *parse*: > > parse.c:236:6: warning: implicit declaration of function *exit* > > parse.c:236:6: warning: incompatible implicit declaration of built-in > > function *exit* > > > > Can someone fix that and put up a 1.2 version? > > Sounds like a job for Mr. Momjian. What server do I log into to update the ftp pgindent tgz file? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random_page_cost vs seq_page_cost
On 02/07/2012 03:23 PM, Bruce Momjian wrote:
> Where did you see that there will be an improvement in the 9.2
> documentation?  I don't see an improvement.

I commented that I'm hoping for an improvement in the documentation of
how much timing overhead impacts attempts to measure this area better.
That's from the "add timing of buffer I/O requests" feature submission.
I'm not sure if Bene read too much into that or not; I didn't mean to
imply that the docs around random_page_cost have gotten better.

This particular complaint is extremely common though, seems to pop up on
one of the lists a few times each year.  Your suggested doc fix is fine
as a quick one, but I think it might be worth expanding further on this
topic.  Something discussing SSDs seems due here too.  Here's a first
draft of a longer discussion, to be inserted just after where it states
the default value is 4.0:

True random access to mechanical disk storage will normally be more
expensive than this default suggests.  The value used is lower to
reflect caching effects.  Some common random accesses to disk, such as
indexed reads, are considered likely to be in cache.  The default value
can be thought of as modeling random access as 40 times as expensive as
sequential, while expecting that 90% of random reads will actually be
cached.

If you believe a high cache rate is an incorrect assumption for your
workload, you might increase random_page_cost to closer reflect the true
cost of random reads against your storage.  Correspondingly, if your
data is likely to be completely cached, such as when the database is
smaller than the total memory in the server, decreasing random_page_cost
can be appropriate.  Storage where the true cost of random reads is low,
such as solid-state drives and similar memory-based devices, might also
find lower values of random_page_cost better reflect the real-world cost
of that operation.

===

I think of the value as being more like 80 times as expensive and a 95%
hit rate, but the above seems more likely to turn into understandable
math to a first-time reader of this section.  I stopped just short of
recommending a value for the completely cached case.  I normally use
1.01 there; I know others prefer going fully to 1.0 instead.  That
argument seems like it could rage on for some time.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
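[For reference, the arithmetic behind either pair of numbers above is the same once cached random reads are counted as essentially free:]

\[
\texttt{random\_page\_cost} \approx (1 - p_{\text{cached}}) \times \frac{c_{\text{random}}}{c_{\text{sequential}}}
 = (1 - 0.90) \times 40 = (1 - 0.95) \times 80 = 4.0
\]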
Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!
On 02/07/2012 02:56 PM, Tom Lane wrote: Andrew Dunstan writes: On 01/31/2012 11:10 PM, Andrew Dunstan wrote: Here's a possible patch for the exclude-table-data problem along the lines you suggest. Should I apply this? I'm not happy with this yet. My core complaint is that pg_dump used to consider that creation of a TableDataInfo object for a table happens if and only if we're going to dump the table's data. And the comments (eg in pg_dump.h) still say that. But the previous patch left us in a halfway zone where sometimes we'd create a TableDataInfo object and then choose not to dump the data, and this patch doesn't get us out of that. I think we should either revert to the previous definition, or go over to a design wherein we always create TableDataInfo objects for all tables (but probably still excluding data-less relations such as views) and the whether-to-dump decision is expressed only by setting or not setting the object's dump flag. I worked a little bit on a patch to do the latter but found that it was more invasive than I'd hoped. Given the lack of any immediate payoff I think it'd probably make more sense to do the former. We could still centralize the decision making into makeTableDataInfo a bit more than now, but it should take the form of not creating the object at all, rather than creating it and then clearing its dump flag. OK, in this version we simply suppress creation of the TableDataInfo object if it's not wanted. I actually removed the code from makeTableDataInfo - there are only two places it gets called and doing it in those two spots seemed a bit cleaner. cheers andrew *** a/src/bin/pg_dump/pg_dump.c --- b/src/bin/pg_dump/pg_dump.c *** *** 1141,1154 selectDumpableTable(TableInfo *tbinfo) tbinfo->dobj.catId.oid)) tbinfo->dobj.dump = false; - /* If table is to be dumped, check that the data is not excluded */ - if (tbinfo->dobj.dump && ! - simple_oid_list_member(&tabledata_exclude_oids, - tbinfo->dobj.catId.oid)) - tbinfo->dobj.dumpdata = true; - else - tbinfo->dobj.dumpdata = false; - } /* --- 1141,1146 *** *** 1599,1608 dumpTableData(Archive *fout, TableDataInfo *tdinfo) DataDumperPtr dumpFn; char *copyStmt; - /* don't do anything if the data isn't wanted */ - if (!tbinfo->dobj.dumpdata) - return; - if (!dump_inserts) { /* Dump/restore using COPY */ --- 1591,1596 *** *** 1658,1663 getTableData(TableInfo *tblinfo, int numTables, bool oids) --- 1646,1656 && no_unlogged_table_data) continue; + /* Check that the data is not explicitly excluded */ + if (simple_oid_list_member(&tabledata_exclude_oids, + tblinfo[i].dobj.catId.oid)) + continue; + if (tblinfo[i].dobj.dump && tblinfo[i].dataObj == NULL) makeTableDataInfo(&(tblinfo[i]), oids); } *** *** 14127,14133 getExtensionMembership(Archive *fout, ExtensionInfo extinfo[], TableInfo *configtbl; configtbl = findTableByOid(atooid(extconfigarray[j])); ! if (configtbl && configtbl->dataObj == NULL) { /* * Note: config tables are dumped without OIDs regardless --- 14120,14132 TableInfo *configtbl; configtbl = findTableByOid(atooid(extconfigarray[j])); ! ! if (configtbl == NULL || ! simple_oid_list_member(&tabledata_exclude_oids, ! configtbl->dobj.catId.oid)) ! continue; ! ! 
if (configtbl->dataObj == NULL) { /* * Note: config tables are dumped without OIDs regardless *** a/src/bin/pg_dump/pg_dump.h --- b/src/bin/pg_dump/pg_dump.h *** *** 129,135 typedef struct _dumpableObject char *name; /* object name (should never be NULL) */ struct _namespaceInfo *namespace; /* containing namespace, or NULL */ bool dump; /* true if we want to dump this object */ - booldumpdata; /* true if we want data for this object */ bool ext_member; /* true if object is member of extension */ DumpId *dependencies; /* dumpIds of objects this one depends on */ int nDeps; /* number of valid dependencies */ --- 129,134 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for implementing SPI_gettypemod()
On ons, 2012-02-01 at 15:53 +0530, Chetan Suttraway wrote:
> This is regarding the TODO item :
> "Add SPI_gettypmod() to return a field's typemod from a TupleDesc"

My first thought was, this should be spelled SPI_gettypmod().  Not sure
what others think.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checkpoint sync pause
On 02/03/2012 11:41 PM, Jeff Janes wrote: -The steady stream of backend writes that happen between checkpoints have filled up most of the OS write cache. A look at /proc/meminfo shows around 2.5GB "Dirty:" "backend writes" includes bgwriter writes, right? Right. Has using a newer kernal with dirty_background_bytes been tried, so it could be set to a lower level? If so, how did it do? Or does it just refuse to obey below the 5% level, as well? Trying to dip below 5% using dirty_background_bytes slows VACUUM down faster than it improves checkpoint latency. Since the sort of servers that have checkpoint issues are quite often ones that have VACUUM ones, too, that whole path doesn't seem very productive. The one test I haven't tried yet is whether increasing the size of the VACUUM ring buffer might improve how well the server responds to a lower write cache. If there is 3GB of dirty data spread over>300 segments each segment is about full-sized (1GB) then on average<1% of each segment is dirty? If that analysis holds, then it seem like there is simply an awful lot of data has to be written randomly, no matter how clever the re-ordering is. In other words, it is not that a harried or panicked kernel or RAID control is failing to do good re-ordering when it has opportunities to, it is just that you dirty your data too randomly for substantial reordering to be possible even under ideal conditions. Averages are deceptive here. This data follows the usual distribution for real-world data, which is that there is a hot chunk of data that receives far more writes than average (particularly index blocks), along with a long tail of segments that are only seeing one or two 8K blocks modified (catalog data, stats, application metadata). Plenty of useful reordering happens here. It's happening in Linux's cache and in the controller's cache. The constant of stream of checkpoint syncs doesn't stop that. It does seems to do two bad things though: a) makes some of these bad cache filled situations more likely, and b) doesn't leave any I/O capacity unused for clients to get some work done. One of the real possibilities I've been considering more lately is that the value we've seen of the pauses during sync aren't so much about optimizing I/O, that instead it's from allowing a brief window of client backend I/O to slip in there between the cache filling checkpoint sync. Does the BBWC, once given an fsync command and reporting success, write out those block forthwith, or does it lolly-gag around like the kernel (under non-fsync) does? If it is waiting around for write-combing opportunities that will never actually materialize in sufficient quantities to make up for the wait, how to get it to stop? Was the sorted checkpoint with an fsync after every file (real file, not VFD) one of the changes you tried? As far as I know the typical BBWC is always working. When a read or a write comes in, it starts moving immediately. When it gets behind, it starts making seek decisions more intelligently based on visibility of the whole queue. But they don't delay doing any work at all the way Linux does. I haven't had very good luck with sorting checkpoints at the PostgreSQL relation level on server-size systems. There is a lot of sorting already happening at both the OS (~3GB) and BBWC (>=512MB) levels on this server. My own tests on my smaller test server--with a scaled down OS (~750MB) and BBWC (256MB) cache--haven't ever validated sorting as a useful technique on top of that. 
It's never bubbled up to being considered a likely win on the production one as a result. DEBUG: Sync #1 time=21.969000 gap=0.00 msec DEBUG: Sync #2 time=40.378000 gap=0.00 msec DEBUG: Sync #3 time=12574.224000 gap=3007.614000 msec DEBUG: Sync #4 time=91.385000 gap=2433.719000 msec DEBUG: Sync #5 time=2119.122000 gap=2836.741000 msec DEBUG: Sync #6 time=67.134000 gap=2840.791000 msec DEBUG: Sync #7 time=62.005000 gap=3004.823000 msec DEBUG: Sync #8 time=0.004000 gap=2818.031000 msec DEBUG: Sync #9 time=0.006000 gap=3012.026000 msec DEBUG: Sync #10 time=302.75 gap=3003.958000 msec Syncs 3 and 5 kind of surprise me. It seems like the times should be more bimodal. If the cache is already full, why doesn't the system promptly collapse, like it does later? And if it is not, why would it take 12 seconds, or even 2 seconds? Or is this just evidence that the gaps you are inserting are partially, but not completely, effective? Given a mix of completely random I/O, a 24 disk array like this system has is lucky to hit 20MB/s clearing it out. It doesn't take too much of that before even 12 seconds makes sense. The 45 second pauses are the ones demonstrating the controller's cached is completely overwhelmed. It's rare to see caching turn truly bimodal, because the model for it has both a variable ingress and egress rate involved. Even as the checkpoint sync is pushing st
Re: [HACKERS] 16-bit page checksums for 9.2
On Thu, Jan 26, 2012 at 8:20 PM, Noah Misch wrote:
> On Wed, Jan 11, 2012 at 10:12:31PM +0000, Simon Riggs wrote:
>> v7

Thanks very much for the review. Just realised I hadn't actually replied...

> This patch uses FPIs to guard against torn hint writes, even when the
> checksums are disabled.  One could not simply condition them on the
> page_checksums setting, though.  Suppose page_checksums=off and we're
> hinting a page previously written with page_checksums=on.  If that write
> tears, leaving the checksum intact, that block will now fail verification.
> A couple of ideas come to mind.  (a) Read the on-disk page and emit an FPI
> only if the old page had a checksum.  (b) Add a checksumEnableLSN field to
> pg_control.  Whenever the cluster starts with checksums disabled, set the
> field to InvalidXLogRecPtr.  Whenever the cluster starts with checksums
> enabled and the field is InvalidXLogRecPtr, set the field to the next LSN.
> When a checksum failure occurs in a page with LSN older than the stored
> one, emit either a softer warning or no message at all.

We can only change page_checksums at restart (now) so the above seems moot.

If we crash with FPWs enabled we repair any torn pages.

> Not all WAL-bypassing operations use SetBufferCommitInfoNeedsSave() to dirty
> the buffer.  Here are other sites I noticed that do MarkBufferDirty() without
> bumping the page LSN:
> 	heap_xlog_visible()
> 	visibilitymap_clear()
> 	visibilitymap_truncate()
> 	lazy_scan_heap()
> 	XLogRecordPageWithFreeSpace()
> 	FreeSpaceMapTruncateRel()
> 	fsm_set_and_search()
> 	fsm_vacuum_page()
> 	fsm_search_avail()
> Though I have not investigated each of these in detail, I suspect most or all
> represent continued torn-page hazards.  Since the FSM is just a hint, we do
> not WAL-log it at all; it would be nicest to always skip checksums for it,
> too.  The visibility map shortcuts are more nuanced.

Still checking all, but not as bad as the list looks.

I'm removing chksum code from smgr and putting back into bufmgr and other
locations. Patch footprint now looks like this.

 doc/src/sgml/config.sgml                      |   40 +++
 src/backend/access/hash/hashpage.c            |    1
 src/backend/access/heap/rewriteheap.c         |    4
 src/backend/access/heap/visibilitymap.c       |    1
 src/backend/access/nbtree/nbtree.c            |    1
 src/backend/access/nbtree/nbtsort.c           |    3
 src/backend/access/spgist/spginsert.c         |    2
 src/backend/access/transam/twophase.c         |   16 -
 src/backend/access/transam/xact.c             |    8
 src/backend/access/transam/xlog.c             |  114
 src/backend/commands/tablecmds.c              |    2
 src/backend/storage/buffer/bufmgr.c           |   75 +
 src/backend/storage/buffer/localbuf.c         |    2
 src/backend/storage/ipc/procarray.c           |   63 +++-
 src/backend/storage/lmgr/proc.c               |    4
 src/backend/storage/page/bufpage.c            |  331 +-
 src/backend/utils/misc/guc.c                  |   14 +
 src/backend/utils/misc/postgresql.conf.sample |   15 -
 src/include/access/xlog.h                     |    1
 src/include/catalog/pg_control.h              |    3
 src/include/catalog/storage.h                 |    1
 src/include/storage/bufpage.h                 |  107 ++--
 src/include/storage/proc.h                    |    3
 src/include/storage/procarray.h               |    4
 24 files changed, 743 insertions(+), 72 deletions(-)

Patch is *not* attached here, yet. Still working on it.

> When page_checksums=off and we read a page last written by page_checksums=on,
> we still verify its checksum.  If a mostly-insert/read site uses checksums for
> some time and eventually wishes to shed the overhead, disabling the feature
> will not stop the costs for reads of old data.  They would need a dump/reload
> to get the performance of a never-checksummed database.  Let's either
> unconditionally skip checksum validation under page_checksums=off or add a new
> state like page_checksums=ignore for that even-weaker condition.

Agreed, changed.

>> --- a/doc/src/sgml/config.sgml
>> +++ b/doc/src/sgml/config.sgml
>
>> +
>> +         Turning this parameter off speeds normal operation, but
>> +         might allow data corruption to go unnoticed. The checksum uses
>> +         16-bit checksums, using the fast Fletcher 16 algorithm. With this
>> +         parameter enabled there is still a non-zero probability that an error
>> +         could go undetected, as well as a non-zero probability of false
>> +         positives.
>> +
>
> What sources of false positives do you intend to retain?  I see none.

Will reword.

>> --- a/src/backend/catalog/storage.c
>> +++ b/src/backend/catalog/storage.c
>> @@ -20,6 +20,7 @@
>>  #include "postgres.h"
>>
>>  #include "access/visibilitymap.h"
>> +#include "access/transam.h"
>>  #include "access/xact.h"
>>  #i
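[The documentation text quoted above names the Fletcher-16 algorithm; as background, a generic Fletcher-16 over a byte buffer looks roughly like this. This is not the patch's implementation, which presumably differs in detail, for example by excluding the checksum field itself from the calculation.]

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Generic Fletcher-16 over a byte buffer, shown for background only.
 * Two running sums are maintained; the second sum makes the result
 * sensitive to byte order, unlike a plain additive checksum.
 */
static uint16_t
fletcher16(const uint8_t *data, size_t len)
{
	uint32_t	sum1 = 0;
	uint32_t	sum2 = 0;
	size_t		i;

	for (i = 0; i < len; i++)
	{
		sum1 = (sum1 + data[i]) % 255;
		sum2 = (sum2 + sum1) % 255;
	}
	return (uint16_t) ((sum2 << 8) | sum1);
}
```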
Re: [HACKERS] Text-any concatenation volatility acting as optimization barrier
On 02/07/2012 03:36 PM, Marti Raudsepp wrote: On Tue, Feb 7, 2012 at 22:31, Andrew Dunstan wrote: It gets worse if you replace the expression with a call to a (non-sql) function returning text, which was in fact the original use case. Then you're pretty much hosed. Oh, if it's a non-SQL function then marking it as STABLE/IMMUTABLE should solve it -- did you try that? Hmm, your're right. I could have sworn I tried that. That still leaves the oddity you reported, though. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Text-any concatenation volatility acting as optimization barrier
On Tue, Feb 7, 2012 at 22:31, Andrew Dunstan wrote: > It gets worse if you replace the expression with a call to a (non-sql) > function returning text, which was in fact the original use case. Then > you're pretty much hosed. Oh, if it's a non-SQL function then marking it as STABLE/IMMUTABLE should solve it -- did you try that? Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Text-any concatenation volatility acting as optimization barrier
On 02/07/2012 03:18 PM, Marti Raudsepp wrote: Hi list, Andrew Dunstan reported an awkward-seeming case on IRC where shifting around a concatenation expression in a view made the planner choose a good or a bad execution plan. Simplified, it boils down to this: db=# create table foo(i int); db=# explain verbose select i from (select i, i::text || 'x' as asd from foo) as subq; Seq Scan on public.foo (cost=0.00..34.00 rows=2400 width=4) Output: foo.i db=# explain verbose select i from (select i, i || 'x'::text as asd from foo) as subq; Subquery Scan on subq (cost=0.00..76.00 rows=2400 width=4) Output: subq.i -> Seq Scan on public.foo (cost=0.00..52.00 rows=2400 width=4) Output: foo.i, ((foo.i)::text || 'x'::text) Case #1 uses the normal textcat(text, text) operator by automatically coercing 'x' as text. However, case #2 uses the anytextcat(anynonarray, text), which is marked as volatile thus acts as an optimization barrier. Later, the anytextcat SQL function is inlined and the EXPLAIN VERBOSE output has no trace of what happened. Is this something we can, or want, to fix? One way would be doing preprocess_expression() before pull_up_subqueries() so function inlining happens earlier, but I can't imagine what unintended consequences that might have. Another option would be creating explicit immutable text || foo operators for common types, but that sounds pretty hacky. It gets worse if you replace the expression with a call to a (non-sql) function returning text, which was in fact the original use case. Then you're pretty much hosed. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random_page_cost vs seq_page_cost
On Wed, Jan 11, 2012 at 08:26:52AM +, Benedikt Grundmann wrote: > (replying just to you) > On 10/01/12 15:22, Greg Smith wrote: > > On 1/5/12 5:04 AM, Benedikt Grundmann wrote: > > That sort of thing is one reason why all attempts so far to set > > random_page_cost based on physical characteristics haven't gone > > anywhere useful. The setting is sort of overloaded right now, it's a > > fuzzy mix of true random seek cost blended with some notion of cache > > percentage. Trying to bring some measurements to bear on it is a less > > effective approach than what people actually do here. Monitor the > > profile of query execution, change the value, see what happens. Use > > that as feedback for what direction to keep going; repeat until > > you're just spinning with no improvements. > > > Thank you very much for the reply it is very interesting. I'm > excited to hear that documentation in that area will improve in > 9.2. It's interesting postgres has remarkable good documentation > but it is a sufficiently complex system that to actually sensible > tune the knobs provided you have to understand quite a lot about > what is going on. A colleague of mine likes to say > "all abstractions leak", which seems very appropriate in this case. Where did you see that there will be an improvement in the 9.2 documentation? I don't see an improvement. I looked over the random_page_cost documentation and remembered I was always concerned about how vague it was about caching effects, so I wrote the attached doc patch to explicity state it. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml new file mode 100644 index 3a84321..19e3e36 *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *** SET ENABLE_SEQSCAN TO OFF; *** 2590,2595 --- 2590,2597 Sets the planner's estimate of the cost of a non-sequentially-fetched disk page. The default is 4.0. + (The default is lower than the typical difference between random + and sequential storage access speed because of caching effects.) This value can be overridden for tables and indexes in a particular tablespace by setting the tablespace parameter of the same name (see ). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Text-any concatenation volatility acting as optimization barrier
Hi list, Andrew Dunstan reported an awkward-seeming case on IRC where shifting around a concatenation expression in a view made the planner choose a good or a bad execution plan. Simplified, it boils down to this: db=# create table foo(i int); db=# explain verbose select i from (select i, i::text || 'x' as asd from foo) as subq; Seq Scan on public.foo (cost=0.00..34.00 rows=2400 width=4) Output: foo.i db=# explain verbose select i from (select i, i || 'x'::text as asd from foo) as subq; Subquery Scan on subq (cost=0.00..76.00 rows=2400 width=4) Output: subq.i -> Seq Scan on public.foo (cost=0.00..52.00 rows=2400 width=4) Output: foo.i, ((foo.i)::text || 'x'::text) Case #1 uses the normal textcat(text, text) operator by automatically coercing 'x' as text. However, case #2 uses the anytextcat(anynonarray, text), which is marked as volatile thus acts as an optimization barrier. Later, the anytextcat SQL function is inlined and the EXPLAIN VERBOSE output has no trace of what happened. Is this something we can, or want, to fix? One way would be doing preprocess_expression() before pull_up_subqueries() so function inlining happens earlier, but I can't imagine what unintended consequences that might have. Another option would be creating explicit immutable text || foo operators for common types, but that sounds pretty hacky. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] When do we lose column names?
Andrew Dunstan writes: > On 02/07/2012 12:47 PM, Tom Lane wrote: >> In general I think we'd have to require that colnames be supplied in all >> RowExprs if we go this way. Anyplace that's trying to slide by without >> will have to be fixed. I don't recall how many places that is. > I just had a thought that maybe we could make this simpler by dummying > up a list of colnames if we don't have one, instead of that assertion. > Or am I on the wrong track. Well, if there are more than one or two RowExpr creators for which a dummy set of colnames is the best we can do anyway, that might be a reasonable answer. But I think it would encourage people to be lazy and let the dummy colnames be used even when they can do better, so I'd rather not take this as a first-choice solution. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
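
For illustration only (this is not from any patch in this thread): "dummying up a list of colnames" would amount to something like the sketch below, generating the usual f1..fN field names so the colnames list always matches the argument list. It assumes the standard backend list and String helpers; where such a fallback would actually live is exactly the open question.

    /* Hedged sketch: invent dummy column names for a RowExpr that has none. */
    if (rowexpr->colnames == NIL)
    {
        List       *colnames = NIL;
        int         natts = list_length(rowexpr->args);
        int         i;

        for (i = 1; i <= natts; i++)
        {
            char        buf[32];

            snprintf(buf, sizeof(buf), "f%d", i);
            colnames = lappend(colnames, makeString(pstrdup(buf)));
        }
        rowexpr->colnames = colnames;
    }
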
Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!
Andrew Dunstan writes: > On 01/31/2012 11:10 PM, Andrew Dunstan wrote: >> Here's a possible patch for the exclude-table-data problem along the >> lines you suggest. > Should I apply this? I'm not happy with this yet. My core complaint is that pg_dump used to consider that creation of a TableDataInfo object for a table happens if and only if we're going to dump the table's data. And the comments (eg in pg_dump.h) still say that. But the previous patch left us in a halfway zone where sometimes we'd create a TableDataInfo object and then choose not to dump the data, and this patch doesn't get us out of that. I think we should either revert to the previous definition, or go over to a design wherein we always create TableDataInfo objects for all tables (but probably still excluding data-less relations such as views) and the whether-to-dump decision is expressed only by setting or not setting the object's dump flag. I worked a little bit on a patch to do the latter but found that it was more invasive than I'd hoped. Given the lack of any immediate payoff I think it'd probably make more sense to do the former. We could still centralize the decision making into makeTableDataInfo a bit more than now, but it should take the form of not creating the object at all, rather than creating it and then clearing its dump flag. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] When do we lose column names?
On 02/07/2012 12:47 PM, Tom Lane wrote: Andrew Dunstan writes: On 11/16/2011 10:38 PM, Tom Lane wrote: Upon further review, this patch would need some more work even for the RowExpr case, because there are several places that build RowExprs without bothering to build a valid colnames list. It's clearly soluble if anyone cares to put in the work, but I'm not personally excited enough to pursue it ... The patch itself causes a core dump with the current regression tests. Yeah, observing that was what made me write the above. I've been looking at the other places that build RowExprs. Most of them look OK and none seem clearly in need of fixing at first glance. Which do you think need attention? In general I think we'd have to require that colnames be supplied in all RowExprs if we go this way. Anyplace that's trying to slide by without will have to be fixed. I don't recall how many places that is. I just had a thought that maybe we could make this simpler by dummying up a list of colnames if we don't have one, instead of that assertion. Or am I on the wrong track. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Progress on fast path sorting, btree index creation time
Jim "Decibel!" Nasby wrote: I agree that it's probably pretty unusual to index floats. FWIW: Cubes and points are floats, right? So would spatial indexes benefit from this optimization, or is it only raw floats? Jay Levitt -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Progress on fast path sorting, btree index creation time
On 2/6/12 3:19 PM, Bruce Momjian wrote: > While we're waiting for anyone else to weigh in with an opinion on the > right place to draw the line here, do you want to post an updated > patch with the changes previously discussed? Well, I think we have to ask not only how many people are using float4/8, but how many people are sorting or creating indexes on them. I think it would be few and perhaps should be eliminated. I agree that it's probably pretty unusual to index floats. My objection was on the assumption that float8 is valid but float4 isn't. If we are going to provide a fast-path for one then we should do it for both if for no other reason than least surprise. -- Jim C. Nasby, Database architect...@nasby.net 512.569.9461 (cell)http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bugs/slowness inserting and indexing cubes
Jay Levitt wrote: [Posted at Andres's request] TL;DR: Inserting and indexing cubes is slow and/or broken in various ways in various builds. And I bet you'll want the test script... sigh. attached.

\c postgres
drop database if exists slowcube;
create database slowcube;
\c slowcube
\timing
create schema slowcube;
set search_path to slowcube;
create extension cube;
set work_mem to '1GB';
set maintenance_work_mem to '1GB';
create table cubetest ( position cube );
insert into cubetest (position) select cube(array[random() * 1000, random() * 1000, random() * 1000]) from generate_series(1,100);
select now();
create index q on cubetest using gist(position);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Bugs/slowness inserting and indexing cubes
[Posted at Andres's request] TL;DR: Inserting and indexing cubes is slow and/or broken in various ways in various builds.

NOTABLE PROBLEMS

1. In 9.1.2, inserting 10x rows takes 19x the time.
   - 9.1-HEAD and 9.2 "fix" this; it now slows down linearly
   - but: 10s > 8s > 5s!
   - but: comparing Ubuntu binary w/vanilla source build on virtual disks, might not be significant
2. In both 9.1 and 9.2, there is a long delay before CREATE INDEX realizes it can't work on an unlogged table
3. In 9.2, creating the 10-million-row index always fails
4. 9.1-HEAD never successfully indexes 10 million rows ("never" = at least 20 minutes on two runs; I will follow up in a few hours)

DETAILS

Times are in seconds, single run.

+-------------------+---------+---------+----------+----------+
| Platform          | 1m rows | 1m rows | 10m rows | 10m rows |
|                   | INSERT  | CR NDX  | INSERT   | CR NDX   |
+-------------------+---------+---------+----------+----------+
| 9.1.2 logged      | 5       | 35      | 98       | 434      |
| 9.1.2 unlogged    | 2       | 34[**]  | 22       | 374[**]  |
| 9.1-HEAD logged   | 10      | 65      | 89       | [***]    |
| 9.1-HEAD unlogged | 2       | 39      | 20       | 690[**]  |
| 9.2 logged        | 8       | 57      | 87       | 509[*]   |
| 9.2 unlogged      | 2       | 33[**]  | 21       | 327[*]   |
+-------------------+---------+---------+----------+----------+

[*] psql:slowcube.sql:20: ERROR: node buffer of page being split (121550) does not exist
[**] psql:slowcube.sql:21: ERROR: unlogged GiST indexes are not supported
[***] never completed after 10-20 minutes; nothing in server.log at default logging levels, postgres process consuming about 1 CPU in IOWAIT, checkpoints every 7-8 seconds

VARIABILITY

A few runs in a row on 9.1-HEAD, 1 million rows, logged:

+--------+--------------+
| INSERT | CREATE INDEX |
+--------+--------------+
| 10     | 65           |
| 8      | 61           |
| 7      | 59           |
| 8      | 61           |
| 7      | 55           |
+--------+--------------+

SYSTEM SPECS

Amazon EC2, EBS-backed, m1.large
7.5GB RAM, 2 cores
Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
shared_buffers = 1867MB
checkpoint_segments = 32
effective_cache_size = 3734MB

9.1.2: installed binaries from Ubuntu's oneiric repo
9.1-HEAD: REL9_1_STABLE, ef19c9dfaa99a2b78ed0f78aa4a44ed31636fdc4, built with simple configure/make/make install
9.2: master, 1631598ea204a3b05104f25d008b510ff5a5c94a, built with simple configure/make/make install

9.1.2 and 9.1-HEAD were run on different (but identically configured) instances. 9.1-HEAD and 9.2 were run on the same instance, but EBS performance is unpredictable. YMMV.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Setting -Werror in CFLAGS
On Wed, Jan 04, 2012 at 01:44:07PM -0500, Robert Haas wrote: > On Tue, Jan 3, 2012 at 9:23 PM, Tom Lane wrote: > > Robert Haas writes: > >> On Tue, Jan 3, 2012 at 7:39 PM, Peter Geoghegan > >> wrote: > >>> Yes, I know that these only appeared in GCC 4.6+ and as such are a > >>> relatively recent phenomenon, but there has been some effort to > >>> eliminate them, and if I could get a non-hacked -Werror build I'd feel > >>> happy enough about excluding them as already outlined. > > > >> I just do this: > >> echo COPT=-Werror > src/Makefile.custom > >> ...which seems to work reasonably well. > > > > I see no point in -Werror whatsoever. If you aren't examining the make > > output for warnings, you're not following proper development practice > > IMO. > > I find -Werror to be a convenient way to examine the output for > warnings. Otherwise they scroll off the screen. Yeah, I could save > the output to a file and grep it afterwards, but that seems less > convenient. Our src/tools/pgtest does this: http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/tools/pgtest;hb=HEAD -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] When do we lose column names?
Andrew Dunstan writes: > On 11/16/2011 10:38 PM, Tom Lane wrote: >> Upon further review, this patch would need some more work even for the >> RowExpr case, because there are several places that build RowExprs >> without bothering to build a valid colnames list. It's clearly soluble >> if anyone cares to put in the work, but I'm not personally excited >> enough to pursue it ... > The patch itself causes a core dump with the current regression tests. Yeah, observing that was what made me write the above. > I've been looking at the other places that build RowExprs. Most of them > look OK and none seem clearly in need of fixing at first glance. Which > do you think need attention? In general I think we'd have to require that colnames be supplied in all RowExprs if we go this way. Anyplace that's trying to slide by without will have to be fixed. I don't recall how many places that is. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add protransform for numeric, varbit, and temporal types
On Tue, Jan 3, 2012 at 10:31 AM, Robert Haas wrote: > On Sat, Dec 31, 2011 at 7:36 PM, Noah Misch wrote: >> Building on commit 8f9fe6edce358f7904e0db119416b4d1080a83aa, this adds >> protransform functions to the length coercions for numeric, varbit, >> timestamp, >> timestamptz, time, timetz and interval. This mostly serves to make more >> ALTER >> TABLE ALTER TYPE operations avoid a rewrite, including numeric(10,2) -> >> numeric(12,2), varbit(4) -> varbit(8) and timestamptz(2) -> timestamptz(4). >> The rules for varbit are exactly the same as for varchar. Numeric is >> slightly >> more complex: >> >> * Flatten calls to our length coercion function that solely represent >> * increases in allowable precision. Scale changes mutate every datum, so >> * they are unoptimizable. Some values, e.g. 1E-1001, can only fit into an >> * unconstrained numeric, so a change from an unconstrained numeric to any >> * constrained numeric is also unoptimizable. >> >> time{,stamp}{,tz} are similar to varchar for these purposes, except that, for >> example, plain "timestamptz" is equivalent to "timestamptz(6)". interval has >> a vastly different typmod format, but the principles applicable to length >> coercion remain the same. >> >> Under --disable-integer-datetimes, I'm not positive that timestamp_scale() is >> always a no-op when one would logically expect as much. Does there exist a >> timestamp such that v::timestamp(2) differs from v:timestamp(2)::timestamp(4) >> due to floating point rounding? Even if so, I'm fairly comfortable calling >> it >> a feature rather than a bug to avoid perturbing values that way. >> >> After these patches, the only core length coercion casts not having >> protransform functions are those for "bpchar" and "bit". For those, we could >> only optimize trivial cases of no length change. I'm not planning to do so. > > This is cool stuff. I will plan to review this once the CF starts. I've committed the numeric and varbit patches and will look at the temporal one next. 
I did notice one odd thing, though: sometimes we seem to get a rebuild on the toast index for no obvious reason: rhaas=# set client_min_messages=debug1; SET rhaas=# create table foo (a serial primary key, b varbit); NOTICE: CREATE TABLE will create implicit sequence "foo_a_seq" for serial column "foo.a" DEBUG: building index "pg_toast_16430_index" on table "pg_toast_16430" NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo" DEBUG: building index "foo_pkey" on table "foo" CREATE TABLE rhaas=# alter table foo alter column b set data type varbit(4); DEBUG: rewriting table "foo" DEBUG: building index "foo_pkey" on table "foo" ALTER TABLE rhaas=# alter table foo alter column b set data type varbit; DEBUG: building index "pg_toast_16430_index" on table "pg_toast_16430" ALTER TABLE Strangely, it doesn't happen if I add another column to the table: rhaas=# set client_min_messages=debug1; SET rhaas=# create table foo (a serial primary key, b varbit, c varbit); NOTICE: CREATE TABLE will create implicit sequence "foo_a_seq" for serial column "foo.a" DEBUG: building index "pg_toast_16481_index" on table "pg_toast_16481" NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo" DEBUG: building index "foo_pkey" on table "foo" CREATE TABLE rhaas=# alter table foo alter column b set data type varbit(4); DEBUG: building index "pg_toast_16490_index" on table "pg_toast_16490" DEBUG: rewriting table "foo" DEBUG: building index "foo_pkey" on table "foo" ALTER TABLE rhaas=# alter table foo alter column b set data type varbit; ALTER TABLE There may not be any particular harm in a useless rebuild of the TOAST table index (wasted effort excepted), but my lack of understanding of why this is happening causes me to fear that there's a bug, not so much in these patches as in the core alter table code that is enabling skipping table and index rebuilds. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] some longer, larger pgbench tests with various performance-related patches
On 01/24/2012 03:53 PM, Robert Haas wrote: There are two graphs for each branch. The first is a scatter plot of latency vs. transaction time. I found that graph hard to understand, though; I couldn't really tell what I was looking at. So I made a second set of graphs which graph number of completed transactions in a given second of the test against time. Note that you're now reinventing parts of pgbench-tools; the main two graphs it gives are the latency scatter plot and TPS per second. The things you're likely to find interesting next are maximum latency, 90th percentile latency, and a delta for what changed in pg_stat_bgwriter during the test; those are the other things I track in that program. I'm working toward publishing my own tests of the performance patches still considered useful by the end of the week. Murphy's Law has active on that project since it started though--server crashed the day I left on a week long trip, and I've been sick ever since getting back. First, some of these transactions had really long latency. Second, there are a remarkable number of seconds all of the test during which no transactions at all manage to complete, sometimes several seconds in a row. These periods have in my tests always been associated with Linux turning aggressive about cleaning out its write cache, either due to fsync request or simply crossing one of its thresholds for doing so. My current record is an 80 second pause with no transactions completing. One of things I expect to add to pgbench_tools within the next week is tracking how much dirty memory is accumulating during each test. Seeing that graph overlaid on top of the rest makes a lot of what's happening at any time more obvious. Noting when the checkpoints happen is a bit less interesting, because once the first one happens, they happen almost continuously. You really need to track when the write and sync phases are happening for that to be really useful. This circles back to why I proposed exposing those timing bits in pg_stat_bgwriter. pgbench-tools already grabs data from it, which avoids all the mess around log file parsing. If I could do that more often and extract checkpoint timing from that data, it would make labelling graphs like these much easier to do, from the client that's running the benchmark even. Third, all of the tests initially start of processing transactions very quickly, and get slammed down very hard, probably because the very high rate of transaction processing early on causes a checkpoint to occur around 200 s. At the beginning of a write-heavy pgbench run, rate is high until one of these two things happen: 1) A checkpoint begins 2) Linux's write cache threshold (typically /proc/sys/vm/dirty_background_ratio) worth of dirty memory accumulates. Note that (1) on its own isn't necessarily the problem, it's something the case that it just makes (2) happen much faster then. Basically, the first 30 to 150 seconds of any write-heavy test always have an inflated speed. You're writing into the OS cache at maximum speed, and none of those writes are making it to physical disk--except perhaps for the WAL, which is all fast and sequential. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
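
The dirty-memory number being referred to is the "Dirty:" line of /proc/meminfo, in kB; a minimal standalone sampler (not part of pgbench-tools, names invented for the example) looks like this:

    #include <stdio.h>
    #include <string.h>

    /* Return the kernel's count of dirty page-cache memory in kB, or -1. */
    static long
    dirty_memory_kb(void)
    {
        FILE   *f = fopen("/proc/meminfo", "r");
        char    line[128];
        long    kb = -1;

        if (f == NULL)
            return -1;
        while (fgets(line, sizeof(line), f) != NULL)
        {
            if (strncmp(line, "Dirty:", 6) == 0)
            {
                sscanf(line + 6, "%ld", &kb);
                break;
            }
        }
        fclose(f);
        return kb;
    }

    int
    main(void)
    {
        printf("Dirty: %ld kB\n", dirty_memory_kb());
        return 0;
    }

Sampling this once a second alongside the TPS data is enough to see the dirty total climbing toward dirty_background_ratio before the stalls.
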
Re: [HACKERS] xlog location arithmetic
On 26-01-2012 06:19, Fujii Masao wrote: Thanks for your review. Comments below. > When I compiled the source with xlogdiff.patch, I got the following warnings. > > xlogfuncs.c:511:2: warning: format '%lX' expects argument of type > 'long unsigned int *', but argument 3 has type 'uint64 *' [-Wformat] > What is your compiler? I'm using gcc 4.6.2. I refactored the patch so I'm now using XLogRecPtr and %X. > postgres=# SELECT pg_xlog_location_diff('0/274', '0/274'); > ERROR: xrecoff "274" is out of valid range, 0..A4A534C > Ugh? I can't reproduce that. It seems to be related to long int used by the prior version. > Since pg_xlog_location_diff() can be executed during recovery, > the above needs to be updated. > Fixed. >> While the output was int8 I could use >> pg_size_pretty but now I couldn't. I attached another patch that implements >> pg_size_pretty(numeric). > I realized that it collides with the pg_size_pretty(int8) if we don't specify a type. Hence, I decided to drop the pg_size_pretty(int8) in favor of pg_size_pretty(numeric). It is slower than the former but it is not a performance critical function. > According to the above source code comment in pg_proc.h, ISTM > pg_size_pretty() for numeric also needs to have its own DESCR(). > Fixed. > According to "man strcat", the dest string must have enough space for > the result. > "buf" has enough space? > Ops. Fixed. -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 236a60a..511a918 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -14942,7 +14942,7 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup()); -pg_size_pretty(bigint) +pg_size_pretty(numeric) text Converts a size in bytes into a human-readable format with size units diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c index 26a8c01..d4a142b 100644 --- a/src/backend/utils/adt/dbsize.c +++ b/src/backend/utils/adt/dbsize.c @@ -24,6 +24,7 @@ #include "storage/fd.h" #include "utils/acl.h" #include "utils/builtins.h" +#include "utils/numeric.h" #include "utils/rel.h" #include "utils/relmapper.h" #include "utils/syscache.h" @@ -506,48 +507,101 @@ pg_total_relation_size(PG_FUNCTION_ARGS) PG_RETURN_INT64(size); } -/* - * formatting with size units - */ Datum pg_size_pretty(PG_FUNCTION_ARGS) { - int64 size = PG_GETARG_INT64(0); - char buf[64]; - int64 limit = 10 * 1024; - int64 limit2 = limit * 2 - 1; + Numeric size = PG_GETARG_NUMERIC(0); + Numeric limit, limit2; + + char *buf, *result; - if (size < limit) - snprintf(buf, sizeof(buf), INT64_FORMAT " bytes", size); + limit = DatumGetNumeric(DirectFunctionCall1(int8_numeric, Int64GetDatum((int64) (10 * 1024; + limit2 = DatumGetNumeric(DirectFunctionCall1(int8_numeric, Int64GetDatum((int64) (10 * 1024 * 2 - 1; + + if (DatumGetBool(DirectFunctionCall2(numeric_lt, NumericGetDatum(size), NumericGetDatum(limit + { + buf = DatumGetCString(DirectFunctionCall1(numeric_out, NumericGetDatum(size))); + result = palloc(strlen(buf) + 7); + strcpy(result, buf); + strcat(result, " bytes"); + } else { - size >>= 9;/* keep one extra bit for rounding */ - if (size < limit2) - snprintf(buf, sizeof(buf), INT64_FORMAT " kB", - (size + 1) / 2); + Numeric arg2; + + /* keep one extra bit for rounding */ + /* size >>= 9 */ + arg2 = DatumGetNumeric(DirectFunctionCall1(int8_numeric, Int64GetDatum((int64) pow(2, 9; + size = 
DatumGetNumeric(DirectFunctionCall2(numeric_div_trunc, NumericGetDatum(size), NumericGetDatum(arg2))); + + if (DatumGetBool(DirectFunctionCall2(numeric_lt, NumericGetDatum(size), NumericGetDatum(limit2 + { + /* size = (size + 1) / 2 */ + size = DatumGetNumeric(DirectFunctionCall2(numeric_add, NumericGetDatum(size), + DirectFunctionCall1(int8_numeric, Int64GetDatum(1; + size = DatumGetNumeric(DirectFunctionCall2(numeric_div_trunc, NumericGetDatum(size), + DirectFunctionCall1(int8_numeric, Int64GetDatum(2; + buf = DatumGetCString(DirectFunctionCall1(numeric_out, NumericGetDatum(size))); + result = palloc(strlen(buf) + 4); + strcpy(result, buf); + strcat(result, " kB"); + } else { - size >>= 10; - if (size < limit2) -snprintf(buf, sizeof(buf), INT64_FORMAT " MB", - (size + 1) / 2); + Numeric arg3; + + /* size >>= 10 */ + arg3 = DatumGetNumeric(DirectFunctionCall1(int8_numeric, Int64GetDatum((int64) pow(2, 10; + size = DatumGetNumeric(DirectFunctionCall2(numeric_div_trunc, NumericGetDatum(size), NumericGetDatum(arg3))); + + if (DatumGetBool(DirectFunctionCall2(numeric_lt, NumericGetDatum(size), NumericGetDatum(limit2 + { +/* size = (size + 1) / 2 */ +size = DatumGetNumeric(DirectFunctionCal
Re: [HACKERS] double writes using "double-write buffer" approach [WIP]
On 02/07/2012 12:09 AM, Dan Scales wrote: So, yes, good point -- double writes cannot replace the functionality of full_page_writes for base backup. If double writes were in use, they might be automatically switched over to full page writes for the duration of the base backup. And the double write file should not be part of the base backup. There is already a check for this sort of problem during the base backup. It forces full_pages_writes on for the backup, even if the running configuration has it off. So long as double writes can be smoothly turned off and back on again, that same section of code can easily be made to handle that, too. As far as not making the double write file part of the base backup, I was assuming that would go into a subdirectory under pg_xlog by default. I would think that people who relocate pg_xlog using one of the methods for doing that would want the double write buffer to move as well. And if it's inside pg_xlog, existing base backup scripts won't need to be changed--the correct ones already exclude pg_xlog files. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for parallel pg_dump
Robert Haas writes: > most places that issue queries can simply use those routines without > needing to peek under the hood into the ArchiveHandle. This is not > quite enough to get rid of g_conn, but it's close: the major stumbling > block at this point is probably exit_nicely(). The gyrations we're > going through to make sure that AH->connection gets closed before > exiting are fairly annoying; maybe we should invent something in > dumputils.c along the line of the backend's on_shmem_exit(). Do we actually care about closing the connection? Worst case is that backend logs an "unexpected EOF" message. But yeah, an atexit hook might be the easiest solution. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
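
A minimal sketch of the atexit-style hook being floated here (names invented, not actual pg_dump code): register the live connection once and let a single handler close it on any exit path, so exit_nicely() callers no longer have to remember to do it themselves.

    #include <stdlib.h>
    #include <libpq-fe.h>

    static PGconn *current_conn = NULL;     /* invented name */

    /* single exit handler: close whatever connection is registered */
    static void
    close_connection_atexit(void)
    {
        if (current_conn != NULL)
        {
            PQfinish(current_conn);
            current_conn = NULL;
        }
    }

    /* remember the live connection; install the handler on first use */
    void
    register_exit_connection(PGconn *conn)
    {
        static int  registered = 0;

        current_conn = conn;
        if (!registered)
        {
            atexit(close_connection_atexit);
            registered = 1;
        }
    }
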
Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog
On Tue, Feb 7, 2012 at 17:29, Heikki Linnakangas wrote: > On 07.02.2012 16:55, Tom Lane wrote: >> >> (The integer vs float TimestampTz issue is a kind of portability >> problem, but we've already bought into the assumption that sender and >> receiver must be built with the same choice, no?) > > > Hmm, true. In hindsight, I think that was a bad choice, but it's a bit late > to change that. pg_basebackup doesn't otherwise care about the integer/float > timestamps, but it does send a timestamp back to the server. You won't be > able to actually start up the database if the config options don't match, > but I think it would be good if pg_basebackup still worked across platforms > and versions. For example, you might have a central backup server that calls > pg_basebackup on several database servers, running on different platforms. > > In 9.0, the only field in the protocol that depends on timestamp format is > WalDataMessageHeader->sendTime. That goes from server to client, and > pg_basebackup/pg_receivexlog don't care about that. In 9.1 we introduced > StandbyReplyMessage->sendTime, which is sent from client to server, but > looking at the code it looks like the server doesn't use it for anything. In > 9.2, we added WalSndrMessage->sendTime, which is used by a standby server to > calculate how far behind the standby is. > > I'm tempted to just change all of those TimestampTz fields to something > that's independent of integer/float timestamp setting, in 9.2. At a quick > glance, it seems that it wouldn't break anything. In general, I think that would work. Since we can't replicate across versions anyway. Will it break using pg_basebackup 9.2 on a 9.1 server, though? that would also be very useful in the scenario of the central server... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog
On 07.02.2012 16:55, Tom Lane wrote: (The integer vs float TimestampTz issue is a kind of portability problem, but we've already bought into the assumption that sender and receiver must be built with the same choice, no?) Hmm, true. In hindsight, I think that was a bad choice, but it's a bit late to change that. pg_basebackup doesn't otherwise care about the integer/float timestamps, but it does send a timestamp back to the server. You won't be able to actually start up the database if the config options don't match, but I think it would be good if pg_basebackup still worked across platforms and versions. For example, you might have a central backup server that calls pg_basebackup on several database servers, running on different platforms. In 9.0, the only field in the protocol that depends on timestamp format is WalDataMessageHeader->sendTime. That goes from server to client, and pg_basebackup/pg_receivexlog don't care about that. In 9.1 we introduced StandbyReplyMessage->sendTime, which is sent from client to server, but looking at the code it looks like the server doesn't use it for anything. In 9.2, we added WalSndrMessage->sendTime, which is used by a standby server to calculate how far behind the standby is. I'm tempted to just change all of those TimestampTz fields to something that's independent of integer/float timestamp setting, in 9.2. At a quick glance, it seems that it wouldn't break anything. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
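
To make that concrete, a sketch (only a sketch, not a committed design) of a build-independent wire value: always send int64 microseconds since 2000-01-01, whichever way --disable-integer-datetimes was set. TimestampTz and HAVE_INT64_TIMESTAMP are the backend's existing definitions; the function name is invented.

    #include <math.h>               /* rint() for the float-timestamp case */

    static int64
    timestamptz_to_wire(TimestampTz ts)
    {
    #ifdef HAVE_INT64_TIMESTAMP
        return ts;                              /* already int64 microseconds */
    #else
        return (int64) rint(ts * 1000000.0);    /* double seconds -> microseconds */
    #endif
    }
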
Re: [HACKERS] Review of: explain / allow collecting row counts without timing info
On Sun, Feb 5, 2012 at 12:44 AM, Kevin Grittner wrote: > Tom Lane wrote: > >> Yeah, I think we need to preserve that property. Unexpectedly >> executing query (which may have side-effects) is a very dangerous >> thing. People are used to the idea that ANALYZE == execute, and >> adding random other flags that also cause execution is going to >> burn somebody. > > +1 > > FWIW, another reason not to use Robert's suggested syntax is that you > get "rows=n" entries with or without the actual run. You just don't > get the "actual" block to compare to the estimate. So ROWS as an > option would be very ambiguous. OK, so based on that resoundingly unanimous input, I've committed Tomas's last version. I made some alterations to the sgml documentation to avoid mentioning "gettimeofday" specifically, because that might not be the call everywhere (e.g. Windows) and even if it is, it doesn't seem too user-friendly. The code is entirely as he had it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for parallel pg_dump
On Fri, Feb 3, 2012 at 10:43 AM, Joachim Wieland wrote: > On Fri, Feb 3, 2012 at 7:52 AM, Robert Haas wrote: >> If you're OK with that much I'll go do it. > > Sure, go ahead! It turns out that (as you anticipated) there are some problems with my previous proposal. For one thing, an Archive isn't really just a connection, because it's also used as a data sink by passing it to functions like ArchiveEntry(). For two things, as you probably realized but I failed to, ConnectDatabase() is already setting AH->connection to the same PGconn it returns, so the idea that we can potentially have multiple connection objects using the existing framework is not really true; or at least it's going to require more work than I originally thought. I think it might still be worth doing at some point, but I think we probably need to clean up some of the rest of this mess first. I've now rejiggered things so that the Archive is passed down to everything that needs it, so the global variable g_fout is gone. I've also added a couple of additional accessors to pg_backup_db.c so that most places that issue queries can simply use those routines without needing to peek under the hood into the ArchiveHandle. This is not quite enough to get rid of g_conn, but it's close: the major stumbling block at this point is probably exit_nicely(). The gyrations we're going through to make sure that AH->connection gets closed before exiting are fairly annoying; maybe we should invent something in dumputils.c along the line of the backend's on_shmem_exit(). I'm starting to think it might make sense to press on with this refactoring just a bit further and eliminate the distinction between Archive and ArchiveHandle. Given a few more accessors (and it really looks like it would only be a few), pg_dump.c could treat an ArchiveHandle as an opaque struct, which would accomplish more or less the same thing as the current design, but in a less confusing fashion - i.e. without giving the reader the idea that the author desperately wishes he were coding in C++. That would allow simplification in a number other places; just to take one example, we wouldn't need both appendStringLiteralAH and appendStringLiteralAHX. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
On Tue, Feb 7, 2012 at 9:44 AM, Marko Kreen wrote: >> - What is the right (or recommended) way to prevent from throwing >> exception in row-processor callback function? When author should use >> PG_TRY block to catch exception in the callback function? > > When it calls backend functions that can throw exceptions? > As all handlers running in backend will want to convert data > to Datums, that means "always wrap handler code in PG_TRY"? I would hate to have such a rule. PG_TRY isn't free, and it's prone to subtle bugs, like failing to mark enough stuff in the same function "volatile". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
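
For concreteness, a rough sketch (not taken from the patch) of what "wrap handler code in PG_TRY" would mean for a backend-side row processor using the proposed PGrowValue callback signature. store_row() is an invented helper standing in for the Datum-conversion work, and the return convention (nonzero on success) is assumed; note the volatile marking, which is exactly the class of subtle bug mentioned above.

    #include "postgres.h"           /* backend-side sketch: PG_TRY/elog machinery */

    extern void store_row(void *param, PGrowValue *columns);   /* invented helper */

    static int
    my_row_processor(PGresult *res, void *param, PGrowValue *columns)
    {
        volatile int ok = 0;        /* assigned inside PG_TRY, read afterwards */

        (void) res;                 /* unused in this sketch */

        PG_TRY();
        {
            /* may elog/ereport while converting column text into Datums */
            store_row(param, columns);
            ok = 1;
        }
        PG_CATCH();
        {
            /* a real handler would also restore memory context state here */
            FlushErrorState();
        }
        PG_END_TRY();

        return ok;                  /* assuming nonzero means "row consumed" */
    }
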
[HACKERS] Can PQstatus() be used by Application to check connection to postgres periodically?
Hello All, My application has to do a real time data upload to the PostgreSQL server. Every time I have to do a real time upload, I do not wish to open a new connection. I want to open a connection once [when my application comes up] and periodically check if the connection is active. Can PQstatus() be used by the application to check the status of the connection already established? If PQstatus() cannot be used, does PostgreSQL provide an alternate interface to check the status of the connection? Note: I do not wish to open a connection on every real time upload as it's an overhead for the application. Appreciate any help! WBR, Sujay -- View this message in context: http://postgresql.1045698.n5.nabble.com/Can-PQstatus-be-used-by-Application-to-check-connection-to-postgres-periodically-tp5462315p5462315.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
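
A minimal client-side sketch of the periodic check being asked about (the connection string and probe query are placeholders). PQstatus() only reports the state libpq last observed -- it does not probe the server -- so a common pattern is to pair it with a trivial query and fall back to PQreset() to re-establish a dropped session:

    #include <stdio.h>
    #include <libpq-fe.h>

    /*
     * Return 1 if the connection is usable, trying to re-establish it if not.
     * PQstatus() alone only reflects the last state libpq saw, so a cheap
     * query is used to force detection of a silently dropped connection.
     */
    static int
    ensure_connection(PGconn *conn)
    {
        PGresult   *res;
        int         ok = 0;

        if (PQstatus(conn) == CONNECTION_OK)
        {
            res = PQexec(conn, "SELECT 1");
            ok = (PQresultStatus(res) == PGRES_TUPLES_OK);
            if (res != NULL)
                PQclear(res);
            if (ok)
                return 1;
        }

        /* connection is (or just turned out to be) bad: try to reset it */
        PQreset(conn);
        return PQstatus(conn) == CONNECTION_OK;
    }

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=postgres");  /* placeholder conninfo */

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* call this before each periodic upload instead of reconnecting */
        if (!ensure_connection(conn))
            fprintf(stderr, "could not restore connection: %s", PQerrorMessage(conn));

        PQfinish(conn);
        return 0;
    }
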
Re: [HACKERS] double writes using "double-write buffer" approach [WIP]
>> I think it is a good idea, and can help double-writes perform better in the >> case of lots of backend evictions. I don't understand this point, because from the data in your mail, it appears that when shared buffers are less means when more evictions can happen, the performance is less. ISTM that the performance is less incase shared buffers size is less because I/O might happen by the backend process which can degrade performance. Is there any problem if the double-write happens only by bgwriter or checkpoint. Something like whenever backend process has to evict the buffer, it will do same as you have described that write in a double-write buffer, but bgwriter will check this double-buffer and flush from it. Also whenever any backend will see that the double buffer is more than 2/3rd or some threshhold value full it will tell bgwriter to flush from double-write buffer. This can ensure very less I/O by any backend. -Original Message- From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Dan Scales Sent: Saturday, January 28, 2012 4:02 AM To: PG Hackers Subject: [HACKERS] double writes using "double-write buffer" approach [WIP] I've been prototyping the double-write buffer idea that Heikki and Simon had proposed (as an alternative to a previous patch that only batched up writes by the checkpointer). I think it is a good idea, and can help double-writes perform better in the case of lots of backend evictions. It also centralizes most of the code change in smgr.c. However, it is trickier to reason about. The idea is that all page writes generally are copied to a double-write buffer, rather than being immediately written. Note that a full copy of the page is required, but can folded in with a checksum calculation. Periodically (e.g. every time a certain-size batch of writes have been added), some writes are pushed out using double writes -- the pages are first written and fsynced to a double-write file, then written to the data files, which are then fsynced. Then double writes allow for fixing torn pages, so full_page_writes can be turned off (thus greatly reducing the size of the WAL log). The key changes are conceptually simple: 1. In smgrwrite(), copy the page to the double-write buffer. If a big enough batch has accumulated, then flush the batch using double writes. [I don't think I need to intercept calls to smgrextend(), but I am not totally sure.] 2. In smgrread(), always look first in the double-write buffer for a particular page, before going to disk. 3. At the end of a checkpoint and on shutdown, always make sure that the current contents of the double-write buffer are flushed. 4. Pass flags around in some cases to indicate whether a page buffer needs a double write or not. (I think eventually this would be an attribute of the buffer, set when the page is WAL-logged, rather than a flag passed around.) 5. Deal with duplicates in the double-write buffer appropriately (very rarely happens). To get good performance, I needed to have two double-write buffers, one for the checkpointer and one for all other processes. The double-write buffers are circular buffers. The checkpointer double-write buffer is just a single batch of 64 pages; the non-checkpointer double-write buffer is 128 pages, 2 batches of 64 pages each. Each batch goes to a different double-write file, so that they can be issued independently as soon as each batch is completed. 
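
[Illustration, not part of the WIP patch: a simplified standalone sketch of the flush ordering just described -- the whole batch is written and fsync'd to the double-write file before any page is written to its home data file, which is what makes a torn data-file page recoverable at crash time. BLCKSZ and the batch bookkeeping are simplified.]

    #include <sys/types.h>
    #include <unistd.h>

    #define BLCKSZ 8192             /* assumed page size */

    typedef struct
    {
        int     data_fd;            /* file the page really belongs to */
        off_t   data_offset;        /* offset of the page in that file */
        char    page[BLCKSZ];       /* page image, checksummed by the caller */
    } DWEntry;

    /* Returns 0 on success, -1 on any write/fsync failure. */
    static int
    flush_double_write_batch(int dw_fd, DWEntry *batch, int nentries)
    {
        int     i;

        /* 1. write the whole batch sequentially to the double-write file ... */
        for (i = 0; i < nentries; i++)
            if (pwrite(dw_fd, batch[i].page, BLCKSZ, (off_t) i * BLCKSZ) != BLCKSZ)
                return -1;
        /* ... and make it durable before touching any data file */
        if (fsync(dw_fd) != 0)
            return -1;

        /* 2. only now write each page to its home data file */
        for (i = 0; i < nentries; i++)
            if (pwrite(batch[i].data_fd, batch[i].page, BLCKSZ,
                       batch[i].data_offset) != BLCKSZ)
                return -1;

        /* 3. fsync the data files so the double-write file can be recycled
         * (a real implementation would deduplicate these fsyncs per file) */
        for (i = 0; i < nentries; i++)
            if (fsync(batch[i].data_fd) != 0)
                return -1;

        return 0;
    }
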
Also, I need to sort the buffers being checkpointed by file/offset (see ioseq.c), so that the checkpointer batches will most likely only have to write and fsync one data file. Interestingly, I find that the plot of tpm for DBT2 is much smoother (though still has wiggles) with double writes enabled, since there are no unpredictable long fsyncs at the end (or during) a checkpoint. Here are performance numbers for double-write buffer (same configs as previous numbers), for 2-processor, 60-minute 50-warehouse DBT2. One the right shows the size of the shared_buffers, and the size of the RAM in the virtual machine. FPW stands for full_page_writes, DW for double_writes. 'two disk' means the WAL log is on a separate ext3 filesystem from the data files. FPW off FPW on DW on, FPW off one disk: 15488 13146 11713[5G buffers, 8G VM] two disk: 18833 16703 18013 one disk: 12908 111599758[3G buffers, 6G VM] two disk: 14258 12694 11229 one disk 1082998655806[1G buffers, 8G VM] two disk 13605 126945682 one disk: 675261294878 two disk: 725366775239[1G buffers, 2G VM] The performance of DW on the small cache cases (1G shared_buffers) is now much better, though still not as good as FPW on. In the medium cache case (3G buffers), where there are significant backend dirty
Re: [HACKERS] patch for preventing the specification of conflicting transaction read/write options
Chetan Suttraway wrote: > This is regarding the TODO item: > "Prevent the specification of conflicting transaction read/write > options" > > listed at: > http://wiki.postgresql.org/wiki/Todo Thanks for chipping away at items on the list. > Please pass on any inputs on the patch. Right now we want people focusing on fixing bugs and reviewing patches submitted by the start of the current CommitFest. Please add this to the open CF so we don't lose track of it. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing special case OID generation
Simon Riggs writes: > So ISTM that we should just strip out the OID handling code and just > have a system sequence defined like this I think this is a pretty poor idea, because the overhead of nextval() is quite a lot larger than the overhead to get an OID. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog
Heikki Linnakangas writes: > On 07.02.2012 09:03, Fujii Masao wrote: >> What about changing receivelog.c so that it uses time_t instead of >> TimestampTz? Which would make the code simpler, I think. > Hmm, that would reduce granularity to seconds. It also creates portability issues that I'd just as soon not deal with, ie, time_t is not the same width on all platforms. (The integer vs float TimestampTz issue is a kind of portability problem, but we've already bought into the assumption that sender and receiver must be built with the same choice, no?) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
On Tue, Feb 07, 2012 at 07:25:14PM +0900, Shigeru Hanada wrote: > (2012/02/02 23:30), Marko Kreen wrote: > > On Thu, Feb 02, 2012 at 04:51:37PM +0900, Kyotaro HORIGUCHI wrote: > >> Hello, This is new version of dblink.c > >> > >> - Memory is properly freed when realloc returns NULL in storeHandler(). > >> > >> - The bug that free() in finishStoreInfo() will be fed with > >>garbage pointer when malloc for sinfo->valbuflen fails is > >>fixed. > > > > Thanks, merged. I also did some minor coding style cleanups. > > > > Tagging it Ready For Committer. I don't see any notable > > issues with the patch anymore. > > > > There is some potential for experimenting with more aggressive > > optimizations on dblink side, but I'd like to get a nod from > > a committer for libpq changes first. > > I tried to use this feature in pgsql_fdw patch, and found some small issues. > > - Typos > - mes -> msg > - funcion -> function > - overritten -> overwritten > - costom -> custom Fixed. > - What is the right (or recommended) way to prevent from throwing > exceptoin in row-processor callback function? When author should use > PG_TRY block to catch exception in the callback function? When it calls backend functions that can throw exceptions? As all handlers running in backend will want to convert data to Datums, that means "always wrap handler code in PG_TRY"? Although it seems we could allow exceptions, at least when we are speaking of Postgres backend, as the connection and result are internally consistent state when the handler is called, and the partial PGresult is stored under PGconn->result. Even the connection is in consistent state, as the row packet is fully processed. So we have 3 variants, all work, but which one do we want to support? 1) Exceptions are not allowed. 2) Exceptions are allowed, but when it happens, user must call PQfinish() next, to avoid processing incoming data from potentially invalid state. 3) Exceptions are allowed, and further row processing can continue with PQisBusy() / PQgetResult() 3.1) The problematic row is retried. (Current behaviour) 3.2) The problematic row is skipped. No clue if that is ok for handler written in C++, I have no idea whether you can throw C++ exception when part of the stack contains raw C calls. > - It would be better to describe how to determine whether a column > result is NULL should be described clearly. Perhaps result value is > NULL when PGrowValue.len is less than 0, right? Eh, seems it's documented everywhere except in sgml doc. Fixed. [ Is it better to document that it is "-1" or "< 0"? ] Also I removed one remaining dynamic stack array in dblink.c Current state of patch attached. -- marko *** a/src/interfaces/libpq/exports.txt --- b/src/interfaces/libpq/exports.txt *** *** 160,162 PQconnectStartParams 157 --- 160,164 PQping158 PQpingParams 159 PQlibVersion 160 + PQsetRowProcessor 161 + PQsetRowProcessorErrMsg 162 *** a/src/interfaces/libpq/fe-connect.c --- b/src/interfaces/libpq/fe-connect.c *** *** 2693,2698 makeEmptyPGconn(void) --- 2693,2701 conn->wait_ssl_try = false; #endif + /* set default row processor */ + PQsetRowProcessor(conn, NULL, NULL); + /* * We try to send at least 8K at a time, which is the usual size of pipe * buffers on Unix systems. 
That way, when we are sending a large amount *** *** 2711,2718 makeEmptyPGconn(void) --- 2714,2726 initPQExpBuffer(&conn->errorMessage); initPQExpBuffer(&conn->workBuffer); + /* set up initial row buffer */ + conn->rowBufLen = 32; + conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue)); + if (conn->inBuffer == NULL || conn->outBuffer == NULL || + conn->rowBuf == NULL || PQExpBufferBroken(&conn->errorMessage) || PQExpBufferBroken(&conn->workBuffer)) { *** *** 2814,2819 freePGconn(PGconn *conn) --- 2822,2829 free(conn->inBuffer); if (conn->outBuffer) free(conn->outBuffer); + if (conn->rowBuf) + free(conn->rowBuf); termPQExpBuffer(&conn->errorMessage); termPQExpBuffer(&conn->workBuffer); *** *** 5078,5080 PQregisterThreadLock(pgthreadlock_t newhandler) --- 5088,5091 return prev; } + *** a/src/interfaces/libpq/fe-exec.c --- b/src/interfaces/libpq/fe-exec.c *** *** 66,71 static PGresult *PQexecFinish(PGconn *conn); --- 66,72 static int PQsendDescribe(PGconn *conn, char desc_type, const char *desc_target); static int check_field_number(const PGresult *res, int field_num); + static int pqAddRow(PGresult *res, void *param, PGrowValue *columns); /* *** *** 160,165 PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status) --- 161,167 result->curBlock = NULL; result->curOffset = 0; result->spaceLeft = 0; +
Re: [HACKERS] semi-PoC: kNN-gist for cubes
On Tue, Feb 07, 2012 at 05:56:52AM -0800, David Fetter wrote: > On Mon, Feb 06, 2012 at 06:25:33PM -0500, Jay Levitt wrote: > > I have a rough proof-of-concept for getting nearest-neighbor > > searches working with cubes. > > Please attach such patches to the email when posting them :) And here's a cleaned-up version of the patch that at least passes "make check." Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate *** a/contrib/cube/cube.c --- b/contrib/cube/cube.c *** *** 73,78 PG_FUNCTION_INFO_V1(g_cube_penalty); --- 73,79 PG_FUNCTION_INFO_V1(g_cube_picksplit); PG_FUNCTION_INFO_V1(g_cube_union); PG_FUNCTION_INFO_V1(g_cube_same); + PG_FUNCTION_INFO_V1(g_cube_distance); Datum g_cube_consistent(PG_FUNCTION_ARGS); Datum g_cube_compress(PG_FUNCTION_ARGS); *** *** 81,86 Datumg_cube_penalty(PG_FUNCTION_ARGS); --- 82,88 Datum g_cube_picksplit(PG_FUNCTION_ARGS); Datum g_cube_union(PG_FUNCTION_ARGS); Datum g_cube_same(PG_FUNCTION_ARGS); + Datum g_cube_distance(PG_FUNCTION_ARGS); /* ** B-tree support functions *** *** 649,654 g_cube_same(PG_FUNCTION_ARGS) --- 651,671 } /* + ** Distance method + */ + Datum + g_cube_distance(PG_FUNCTION_ARGS) + { + GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0); + NDBOX *query = PG_GETARG_NDBOX(1); + double distance; + + distance = DatumGetFloat8(DirectFunctionCall2(cube_distance, entry->key, PointerGetDatum(query))); + + PG_RETURN_FLOAT8(distance); + } + + /* ** SUPPORT ROUTINES */ bool -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Removing special case OID generation
Excerpts from Simon Riggs's message of mar feb 07 10:46:09 -0300 2012: > Recent events have made me notice the OID handling. > > AFAICS, OIDs are just a sequence with a max value that fits in a uint. > > So ISTM that we should just strip out the OID handling code and just > have a system sequence defined like this > > CREATE SEQUENCE _pg_oid >MINVALUE 0 >MAXVALUE 4294967296 >CACHE 8192 >CYCLE; > > Which would then make it easier to have a sequence for each toast > table and a sequence for lobs. Yeah, I agree that having a single sequence to handle oid allocations on all toast tables across all databases is strange. I don't have evidence that this is a real scalability problem though. But theoretically at least it seems to me that there could sporadically be a problem if a table has a long string of allocated OIDs and the system OID generator wraps around to somewhere within that range to allocate a new one for that table. That could cause a long period of spinning to get a new value, thus high latency on that insert. (Now admittedly if the same were to happen with a non-shared sequence, it would have to spin just the same -- but it'd do so without having to grab a system-level lwlock each time.) Having one sequence for each toast table could be wasteful though. I mean, sequences are not the best use of shared buffer cache currently. If we could have more than one sequence data in a shared buffer page, things would be different. Not sure how serious this really is. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LWLockWaitUntilFree (was: Group commit, revised)
On Tue, Feb 7, 2012 at 3:48 AM, Heikki Linnakangas wrote: > On 03.02.2012 22:57, Robert Haas wrote: >> >> I think I recommended a bad name for this function. It's really >> LWLockAcquireOrWaitUntilFree. Can we rename that? > > Agreed, that's better. Although quite long. "LWLockAcquireOrWait" perhaps? Sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] semi-PoC: kNN-gist for cubes
On Mon, Feb 06, 2012 at 06:25:33PM -0500, Jay Levitt wrote: > I have a rough proof-of-concept for getting nearest-neighbor > searches working with cubes. Please attach such patches to the email when posting them :) Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Removing special case OID generation
Recent events have made me notice the OID handling. AFAICS, OIDs are just a sequence with a max value that fits in a uint. So ISTM that we should just strip out the OID handling code and just have a system sequence defined like this CREATE SEQUENCE _pg_oid MINVALUE 0 MAXVALUE 4294967296 CACHE 8192 CYCLE; Which would then make it easier to have a sequence for each toast table and a sequence for lobs. Not sure it's important now, but maybe it will reduce the size of the executable and avoid oid-specific bugs. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog
On Tue, Feb 7, 2012 at 7:06 PM, Magnus Hagander wrote: > On Tue, Feb 7, 2012 at 10:55, Heikki Linnakangas > wrote: >> On 07.02.2012 11:35, Magnus Hagander wrote: >>> >>> On Tue, Feb 7, 2012 at 10:31, Heikki Linnakangas >>> wrote: So, --statusint needs to be in milliseconds. And while we're at it, how Attached patch does that and fixes the problem caused under --disable-integer-datetimes. difficult would be to ask the server for the current value of replication_timeout, and set --statusint automatically based on that? Or perhaps mark replication_timeout as GUC_REPORT. It is rather fiddly that depending on a server setting, you need to pass an option in the client or it will just silently fail with no indication of what the problem is. >>> >>> >>> We can't really ask for it easily, since we're on a replication >>> connection. Unless we add that to the walsender grammar, but that >>> would make it impossible to use the receivexlog stuff against a 9.1 >>> server (which I think still works, though I haven't tested it in a >>> while). >> >> >> You could put a version-check there, and only send the command if connected >> to a 9.2 server. > > True.. > > >>> Do we have a facility to make it a GUC_REPORT but only for walsender >>> connections? >> >> >> There's no such facility at the moment. >> >> >>> It seems like a very unnecessary thing to have it sent out over every >>> single connection, since it would only be useful in a very small >>> subset of them. >> >> >> True, and conversely, many of the current GUC_REPORT settings don't apply to >> replication clients at all. Like standard_conforming_strings and DateStyle. > > Right. But it's less important there, since the replication > connections will in any reasonable configuration be only a tiny part > of the connections. > > >> I think we need another flag for settings that should be sent to replication >> clients. GUC_REPORT_REPLICATION? Some settings would have both GUC_REPORT >> and GUC_REPORT_REPLICATION, while others would have only one of them. > > Yup, seems like the cleanest solution. Agreed. But when replication_timeout is changed by SIGHUP in the server, there would be a delay before pg_receivexlog receives the parameter change packet and changes the standby message interval based on the new value of replication_timeout. Which might cause unexpected replication timeout. So the user-settable interval (i.e., --statusint) is still useful for people who want to avoid such fragility, even if we will support the auto- tuning of the interval. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center *** a/doc/src/sgml/ref/pg_basebackup.sgml --- b/doc/src/sgml/ref/pg_basebackup.sgml *** *** 335,341 PostgreSQL documentation --statusint=interval ! Specifies the number of seconds between status packets sent back to the server. This is required when streaming the transaction log (using --xlog=stream) if replication timeout is configured on the server, and allows for easier monitoring. The default value is --- 335,341 --statusint=interval ! Specifies the number of milliseconds between status packets sent back to the server. This is required when streaming the transaction log (using --xlog=stream) if replication timeout is configured on the server, and allows for easier monitoring. The default value is *** a/doc/src/sgml/ref/pg_receivexlog.sgml --- b/doc/src/sgml/ref/pg_receivexlog.sgml *** *** 108,114 PostgreSQL documentation --statusint=interval ! 
Specifies the number of seconds between status packets sent back to the server. This is required if replication timeout is configured on the server, and allows for easier monitoring. The default value is 10 seconds. --- 108,114 --statusint=interval ! Specifies the number of milliseconds between status packets sent back to the server. This is required if replication timeout is configured on the server, and allows for easier monitoring. The default value is 10 seconds. *** a/src/bin/pg_basebackup/pg_basebackup.c --- b/src/bin/pg_basebackup/pg_basebackup.c *** *** 46,52 int compresslevel = 0; bool includewal = false; bool streamwal = false; bool fastcheckpoint = false; ! int standby_message_timeout = 10; /* 10 sec = default */ /* Progress counters */ static uint64 totalsize; --- 46,52 bool includewal = false; bool streamwal = false; bool fastcheckpoint = false; ! int standby_message_timeout = 10 * 1000; /* 10 sec = default */ /* Progress counters */ static uint64 totalsize; *** *** 118,124 usage(void) printf(_(" --help show this help
[HACKERS] patch for preventing the specification of conflicting transaction read/write options
Hi, This is regarding the TODO item: "Prevent the specification of conflicting transaction read/write options" listed at: http://wiki.postgresql.org/wiki/Todo The issue is : SET TRANSACTION read only read write read only; The fix involved iteration over transaction_mode_list and checking for duplicate entries. The patch is based on suggestions mentioned in message: http://archives.postgresql.org/pgsql-hackers/2009-01/msg00692.php As per this, the patch does not throw any error for the first test case mentioned above. It throws error only in case of conflicting modes. For ex: postgres=# SET TRANSACTION read only read only; SET postgres=# SET TRANSACTION read only read write; ERROR: conflicting options LINE 1: SET TRANSACTION read only read write; ^ Below are basic unit test logs: postgres=# SET TRANSACTION ISOLATION LEVEL serializable ISOLATION LEVEL serializable; SET postgres=# SET TRANSACTION ISOLATION LEVEL serializable ISOLATION LEVEL repeatable read; ERROR: conflicting options LINE 1: SET TRANSACTION ISOLATION LEVEL serializable ISOLATION LEVEL... postgres=# SET TRANSACTION DEFERRABLE DEFERRABLE; SET postgres=# SET TRANSACTION DEFERRABLE NOT DEFERRABLE; ERROR: conflicting options LINE 1: SET TRANSACTION DEFERRABLE NOT DEFERRABLE; ^ postgres=# START TRANSACTION read only, read only; START TRANSACTION postgres=# rollback; ROLLBACK postgres=# START TRANSACTION read only, read write; ERROR: conflicting options LINE 1: START TRANSACTION read only, read write; postgres=# rollback; ROLLBACK ^ postgres=# BEGIN TRANSACTION ISOLATION LEVEL serializable, ISOLATION LEVEL serializable; BEGIN postgres=# rollback; ROLLBACK postgres=# BEGIN TRANSACTION ISOLATION LEVEL serializable, ISOLATION LEVEL repeatable read; ERROR: conflicting options LINE 1: BEGIN TRANSACTION ISOLATION LEVEL serializable, ISOLATION LE... ^ Please pass on any inputs on the patch. Regards, Chetan -- EnterpriseDB Corporation The Enterprise PostgreSQL Company Website: www.enterprisedb.com EnterpriseDB Blog : http://blogs.enterprisedb.com Follow us on Twitter : http://www.twitter.com/enterprisedb diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 62fde67..28c987f 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -146,6 +146,9 @@ static void processCASbits(int cas_bits, int location, const char *constrType, bool *deferrable, bool *initdeferred, bool *not_valid, core_yyscan_t yyscanner); +static void +check_trans_mode(List *list, DefElem *elem, core_yyscan_t yyscanner); + %} %pure-parser @@ -7510,9 +7513,16 @@ transaction_mode_list: transaction_mode_item { $$ = list_make1($1); } | transaction_mode_list ',' transaction_mode_item - { $$ = lappend($1, $3); } + { + check_trans_mode((List *)$1, (DefElem *)$3, yyscanner); + $$ = lappend($1, $3); + } + | transaction_mode_list transaction_mode_item - { $$ = lappend($1, $2); } + { + check_trans_mode((List *)$1, (DefElem *)$2, yyscanner); + $$ = lappend($1, $2); + } ; transaction_mode_list_or_empty: @@ -13215,6 +13225,57 @@ parser_init(base_yy_extra_type *yyext) } /* + * checks for conflicting transaction modes by looking up current + * element in the given list. + */ +static void +check_trans_mode(List *list, DefElem *elem, core_yyscan_t yyscanner) +{ + ListCell *lc; + A_Const *elem_arg; + + elem_arg =(A_Const *) elem->arg; + + /* cannot specify conflicting value for transaction mode. 
*/ + foreach (lc, list) + { + DefElem *next; + A_Const *arg; + + next = (DefElem*)lfirst (lc); + arg = (A_Const *)next->arg; + + /* check for duplicate value in remaining list */ + if (strcmp(elem->defname, next->defname) == 0) + { + /* + * isolation level values are stored + * as string whereas other modes are + * stored as integer values. + */ + if (strcmp(elem->defname,"transaction_isolation") == 0) + { +/* check for conflicting values */ +if (strcmp(elem_arg->val.val.str,arg->val.val.str)!= 0) + ereport(ERROR, +(errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting options"), + parser_errposition(arg->location))); + } + else + { +/* check for conflicting values */ +if (elem_arg->val.val.ival != arg->val.val.ival) + ereport(ERROR, +(errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting options"), + parser_errposition(arg->location))); + } + } + } +} + +/* * Must undefine this stuff before including scan.c, since it has different * definitions for these macros. */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
On Thu, Jan 26, 2012 at 10:00 PM, Dimitri Fontaine wrote: > Dimitri Fontaine writes: >>> Really I think there is not any single point where you can put the >>> command-trigger hook and be done. In almost every case, the right >>> place is going to be buried somewhere within the execution of the >>> command. >> >> I'm finally realizing it. I already introduced a structure called >> CommandContextData to keep track of the current command elements we want >> to pass down to command triggers, I think we're saying that this should >> be a static variable that commands will need to be editing. > > In fact passing it as an argument to the command trigger API is much > simpler and done in the attached. I'm having problems here with my > install and not enough time this week (you don't speak English if you > don't use understatements here and there, right?) so please expect a > one-step-further patch to show the integration concept, not a ready for > commit one just yet. > > Next steps are about adding support for more commands, and now that we > have settled on a simpler integration that will be easy. The parameters > sent to the trigger procedure are now the command tag, the main object > Oid, its schema name and its object name. Only the command tag will > never be NULL, all the other columns could be left out when calling an > ANY COMMAND trigger, or a command on a schemaless object. > > Note: the per-command integration means the Oid is generally available, > so let's just export it. > > An ack about the way it's now implemented would be awesome, and we could > begin to start about which set of command exactly we want supported from > the get go (default to all of them of course, but well, I don't think > that's necessarily the best use of our time given our command list). Looks good so far. A user centric review of this patch... Would like a way to say ALL COMMANDS rather than write out list of supported commands. That list is presumably subject to change, so not having this could result in bugs in later code. The example given in the docs has no explanation. More examples and explanation please. Would specifically be interested in a command trigger which simply prevents all commands, which has many uses and is simple enough to be in docs. Why do we exclude SEQUENCEs? That could well be a problem for intended users Please explain/add to docs/comments why ALTER VIEW is not supported? Why not other commands?? Not a problem, but users will ask, so we should write down the answer Please document why pg_cmdtriggers is a separate table and not enhancement of pg_triggers? Thanks -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for implementing SPI_gettypemod()
On Tue, Feb 7, 2012 at 5:44 PM, Kevin Grittner wrote: > "Kevin Grittner" wrote: > > > moved this to "Replication and Recovery" > > Oh, that was a different patch -- I didn't see yours. > > (It's early, and the caffeine isn't working yet.) > > Anyway, you should have plenty of options now. > > -Kevin > > Thanks Kevin :) -- EnterpriseDB Corporation The Enterprise PostgreSQL Company Website: www.enterprisedb.com EnterpriseDB Blog : http://blogs.enterprisedb.com Follow us on Twitter : http://www.twitter.com/enterprisedb
Re: [HACKERS] patch for implementing SPI_gettypemod()
"Kevin Grittner" wrote: > moved this to "Replication and Recovery" Oh, that was a different patch -- I didn't see yours. (It's early, and the caffeine isn't working yet.) Anyway, you should have plenty of options now. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for implementing SPI_gettypemod()
Chetan Suttraway wrote: > Robert Haas wrote: >> Please add this to the next CommitFest: >> >> https://commitfest.postgresql.org/action/commitfest_view/open > At the given link, I am able to choose only "System administration" > under commitfest topic. > I think there has to be "server features" or "Miscellaneous". I created all the topics from the last CF, in the same order, and moved this to "Replication and Recovery" -- it is about the ability to recover if the OS crashes around the time initdb completes, right? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] controlling the location of server-side SSL files
On tis, 2012-01-24 at 22:05 +0200, Peter Eisentraut wrote: > > > One thing that is perhaps worth thinking about: Currently, we just > > > ignore missing root.crt and root.crl files. With this patch, we still > > > do this, even if the user has given a specific nondefault location. > > > That seems a bit odd, but I can't think of a simple way to do it better. > > > > There's a review in the CF app for this finding only minor issues, so > > I'm marking this patch therein as "Ready for Committer". > > OK, no one had any concerns about the missing file behavior I > described above? If not, then I'll commit it soon. I'm still worried about this. If we ignore a missing root.crt, then the effect is that authentication and certificate verification might fail, which would be annoying, but you'd notice it soon enough. But if we ignore a missing root.crl, we are creating a security hole. My best idea at the moment is that we should set these parameters to empty by default, and make users point them to existing files if they want to use that functionality. Comments? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
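To make the trade-off concrete, here is a minimal sketch of what "error out when an explicitly configured CRL file cannot be loaded" could look like. The parameter name ssl_crl_file and the loader load_crl_file() are assumptions for illustration only, not the actual be-secure.c code; the point is just that an empty setting disables the feature, while a non-empty setting that cannot be honored fails loudly instead of silently skipping revocation checks.

/* Sketch only: parameter and loader names are illustrative. */
if (ssl_crl_file != NULL && ssl_crl_file[0] != '\0')
{
    /* the user explicitly asked for CRL checking: a missing file is an error */
    if (!load_crl_file(ssl_crl_file))       /* hypothetical loader */
        ereport(FATAL,
                (errmsg("could not load SSL certificate revocation list file \"%s\"",
                        ssl_crl_file)));
}
else
{
    /* empty (the proposed default): CRL checking is simply not used */
}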
[HACKERS] pg_basebackup -x stream from the standby gets stuck
Hi, http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/ > =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h > 127.0.0.1 -p 5921 -U replication > xlog start point: 2/AC4E2600 > pg_basebackup: starting background WAL receiver > 692447/692447 kB (100%), 1/1 tablespace > xlog end point: 2/AC4E2600 > pg_basebackup: waiting for background process to finish streaming... > pg_basebackup: base backup completed > > real 3m56.237s > user 0m0.224s > sys 0m0.936s > > (time is long because this is only test database with no traffic, so I had to > make some inserts for it to finish) The above article points out a problem with pg_basebackup from the standby: when "-x stream" is specified, pg_basebackup from the standby gets stuck if there is no traffic in the database. When "-x stream" is specified, pg_basebackup forks a background process to receive WAL records during the backup, takes an online backup, and waits for the background process to end. The forked background process keeps receiving WAL records, and whenever it reaches the end of a WAL file it checks whether it has already received all the WAL files required for the backup, and exits if so. This means that at least one WAL segment switch is required for pg_basebackup with the "-x stream" option to finish. In a backup from the master, a WAL file switch always occurs at both the start and the end of the backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the above logic works fine even if there is no traffic. OTOH, in a backup from the standby, no WAL file switch is performed at all while there is no traffic. In that case the background process never reaches the end of a WAL file, never checks whether all the required WAL has arrived, and never exits. As a result, pg_basebackup gets stuck. To fix the problem, I propose changing the background process so that it checks whether all the required WAL has arrived every time data is received, even if the end of a WAL file has not been reached. Patch attached. Comments? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center *** a/src/bin/pg_basebackup/pg_basebackup.c --- b/src/bin/pg_basebackup/pg_basebackup.c *** *** 78,84 static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum); static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum); static void BaseBackup(void); ! static bool segment_callback(XLogRecPtr segendpos, uint32 timeline); #ifdef HAVE_LIBZ static const char * --- 78,84 static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum); static void BaseBackup(void); ! static bool reached_end_position(XLogRecPtr segendpos, uint32 timeline); #ifdef HAVE_LIBZ static const char * *** *** 129,136 usage(void) /* ! * Called in the background process whenever a complete segment of WAL ! * has been received. * On Unix, we check to see if there is any data on our pipe * (which would mean we have a stop position), and if it is, check if * it is time to stop. --- 129,137 /* ! * Called in the background process every time data is received. ! * Also called when the streaming stops to check whether ! * the current log segment can be treated as a complete one. * On Unix, we check to see if there is any data on our pipe * (which would mean we have a stop position), and if it is, check if * it is time to stop. *** *** 138,144 usage(void) * time to stop. */ static bool ! segment_callback(XLogRecPtr segendpos, uint32 timeline) { if (!has_xlogendptr) { --- 139,145 * time to stop. */ static bool ! 
reached_end_position(XLogRecPtr segendpos, uint32 timeline) { if (!has_xlogendptr) { *** *** 231,237 LogStreamerMain(logstreamer_param * param) { if (!ReceiveXlogStream(param->bgconn, param->startptr, param->timeline, param->sysidentifier, param->xlogdir, ! segment_callback, NULL, standby_message_timeout)) /* * Any errors will already have been reported in the function process, --- 232,238 { if (!ReceiveXlogStream(param->bgconn, param->startptr, param->timeline, param->sysidentifier, param->xlogdir, ! reached_end_position, reached_end_position, standby_message_timeout)) /* * Any errors will already have been reported in the function process, *** a/src/bin/pg_basebackup/pg_receivexlog.c --- b/src/bin/pg_basebackup/pg_receivexlog.c *** *** 71,77 usage(void) static bool segment_callback(XLogRecPtr segendpos, uint32 timeline) { ! if (verbose) fprintf(stderr, _("%s: finished segment at %X/%X (timeline %u)\n"), progname, segendpos.xlogid, segendpos.xrecoff, timeline); --- 71,77 static bool segment_callback(XLogRecPtr segendpos, uint32 timeline) { ! if (verbose && segendpos.xrecoff % XLOG_SEG_SIZE == 0) fprin
Re: [HACKERS] Assertion failure in AtCleanup_Portals
On Tue, Feb 7, 2012 at 3:03 AM, Tom Lane wrote: > Pavan Deolasee writes: >> If I run the following sequence of commands, I get an assertion >> failure in current HEAD. > >> postgres=# BEGIN; >> BEGIN >> postgres=# SELECT 1/0; >> ERROR: division by zero >> postgres=# ROLLBACK TO A; >> ERROR: no such savepoint >> postgres=# \q > >> The process fails when the session is closed and aborted transaction >> gets cleaned at the proc_exit time. The stack trace is as below. > > Hmm. It works fine if you issue an actual ROLLBACK command there, > so my gut reaction is that AbortOutOfAnyTransaction isn't sufficiently > duplicating the full-fledged ROLLBACK code path. No time to dig further > right now though. > It works OK for a ROLLBACK command because we create a new unnamed portal for the ROLLBACK command, silently dropping the old one if it already exists. Since the ROLLBACK command then runs successfully, we don't see the same assertion. Would it be safe to drop FAILED unnamed portals during AbortOutOfAnyTransaction? Maybe it is, if we can do that before creating a new portal for the ROLLBACK command itself. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
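A minimal sketch of the idea being asked about, using the existing portal API (GetPortalByName, PortalIsValid, PORTAL_FAILED, PortalDrop); this is only an illustration of the question, not the fix that was eventually committed:

#include "postgres.h"
#include "utils/portal.h"

/*
 * Sketch: drop a leftover unnamed portal that ended up in the failed
 * state, so that neither AbortOutOfAnyTransaction at proc_exit nor the
 * creation of a new unnamed portal for a later ROLLBACK trips over it.
 */
static void
DropFailedUnnamedPortal(void)
{
    Portal      portal = GetPortalByName("");   /* unnamed portal, if any */

    if (PortalIsValid(portal) && portal->status == PORTAL_FAILED)
        PortalDrop(portal, false);
}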
Re: [HACKERS] psql NUL record and field separator
On tor, 2012-01-26 at 19:00 +0530, Abhijit Menon-Sen wrote: > At issue are (at least) these three lines from print_unaligned_text in > src/bin/psql/print.c: > > 358 /* the last record needs to be concluded with a newline > */ > 359 if (need_recordsep) > 360 fputc('\n', fout); > > Perhaps the right thing to do would be to change this to output \0 if > --record-separator-zero was used (but leave it at \n otherwise)? That > is what my second attached patch does: > > $ bin/psql --record-separator-zero --field-separator-zero -At -c > 'select 1,2 union select 3,4'|xargs -0 echo > 1 2 3 4 > > Thoughts? > > > I think the most common use of this would be to set the record > > separator from the command line, so we could use a short option > > such as -0 or -z for that. > > I agree. The current option names are very unwieldy to type. > I have incorporated your two patches and added short options. Updated patch attached. This made me wonder, however. The existing -F and -R options set the record *separator*. The new options, however, set the record *terminator*. This is the small distinction that you had discovered. Should we rename the options and/or add that to the documentation, or is the new behavior obvious and any new terminology would be too confusing? diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index a9b1ed2..55aa5f2 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -482,6 +482,27 @@ PostgreSQL documentation + + -z + --field-separator-zero + + + Set the field separator for unaligned output to a zero byte. + + + + + + -0 + --record-separator-zero + + + Set the record separator for unaligned output to a zero byte. This is + useful for interfacing, for example, with xargs -0. + + + + -1 --single-transaction @@ -1909,6 +1930,16 @@ lo_import 152801 + fieldsep_zero + + + Sets the field separator to use in unaligned output format to a zero + byte. + + + + + footer @@ -2078,6 +2109,16 @@ lo_import 152801 + recordsep_zero + + + Sets the record separator to use in unaligned output format to a zero + byte. 
+ + + + + tableattr (or T) diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index ab809ec..8421ad0 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -2272,11 +2272,26 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) { if (value) { - free(popt->topt.fieldSep); - popt->topt.fieldSep = pg_strdup(value); + free(popt->topt.fieldSep.separator); + popt->topt.fieldSep.separator = pg_strdup(value); + popt->topt.fieldSep.separator_zero = false; } if (!quiet) - printf(_("Field separator is \"%s\".\n"), popt->topt.fieldSep); + { + if (popt->topt.fieldSep.separator_zero) +printf(_("Field separator is zero byte.\n")); + else +printf(_("Field separator is \"%s\".\n"), popt->topt.fieldSep.separator); + } + } + + else if (strcmp(param, "fieldsep_zero") == 0) + { + free(popt->topt.fieldSep.separator); + popt->topt.fieldSep.separator = NULL; + popt->topt.fieldSep.separator_zero = true; + if (!quiet) + printf(_("Field separator is zero byte.\n")); } /* record separator for unaligned text */ @@ -2284,18 +2299,30 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) { if (value) { - free(popt->topt.recordSep); - popt->topt.recordSep = pg_strdup(value); + free(popt->topt.recordSep.separator); + popt->topt.recordSep.separator = pg_strdup(value); + popt->topt.recordSep.separator_zero = false; } if (!quiet) { - if (strcmp(popt->topt.recordSep, "\n") == 0) + if (popt->topt.recordSep.separator_zero) +printf(_("Record separator is zero byte.\n")); + else if (strcmp(popt->topt.recordSep.separator, "\n") == 0) printf(_("Record separator is .")); else -printf(_("Record separator is \"%s\".\n"), popt->topt.recordSep); +printf(_("Record separator is \"%s\".\n"), popt->topt.recordSep.separator); } } + else if (strcmp(param, "recordsep_zero") == 0) + { + free(popt->topt.recordSep.separator); + popt->topt.recordSep.separator = NULL; + popt->topt.recordSep.separator_zero = true; + if (!quiet) + printf(_("Record separator is zero byte.\n")); + } + /* toggle between full and tuples-only format */ else if (strcmp(param, "t") == 0 || strcmp(param, "tuples_only") == 0) { diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c index 172fd0c..eff0ea5 100644 --- a/src/bin/psql/help.c +++ b/src/bin/psql/help.c @@ -123,6
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
(2012/02/02 23:30), Marko Kreen wrote: > On Thu, Feb 02, 2012 at 04:51:37PM +0900, Kyotaro HORIGUCHI wrote: >> Hello, This is new version of dblink.c >> >> - Memory is properly freed when realloc returns NULL in storeHandler(). >> >> - The bug that free() in finishStoreInfo() will be fed with >> garbage pointer when malloc for sinfo->valbuflen fails is >> fixed. > > Thanks, merged. I also did some minor coding style cleanups. > > Tagging it Ready For Committer. I don't see any notable > issues with the patch anymore. > > There is some potential for experimenting with more aggressive > optimizations on dblink side, but I'd like to get a nod from > a committer for libpq changes first. I tried to use this feature in the pgsql_fdw patch, and found some small issues. - Typos - mes -> msg - funcion -> function - overritten -> overwritten - costom -> custom - What is the right (or recommended) way to prevent throwing an exception in the row-processor callback function? When should the author use a PG_TRY block to catch exceptions in the callback function? - It would be better to describe clearly how to determine whether a column result is NULL. Perhaps the result value is NULL when PGrowValue.len is less than 0, right? Regards, -- Shigeru Hanada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
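On the NULL question, a tiny sketch of the convention as described above; the struct layout and callback shape here are assumptions for illustration (the proposed API may differ), the only point being that a negative len marks a NULL column:

/* Assumed layout, for illustration only -- see the actual patch. */
typedef struct
{
    int         len;            /* assumed: < 0 means the column is NULL */
    char       *value;          /* assumed: len bytes, not NUL-terminated */
} ExampleRowValue;

static void
example_handle_row(ExampleRowValue *columns, int ncolumns)
{
    int         i;

    for (i = 0; i < ncolumns; i++)
    {
        if (columns[i].len < 0)
            ;                   /* treat column i as SQL NULL */
        else
            ;                   /* copy columns[i].len bytes from columns[i].value */
    }
}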
Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog
On Tue, Feb 7, 2012 at 10:55, Heikki Linnakangas wrote: > On 07.02.2012 11:35, Magnus Hagander wrote: >> >> On Tue, Feb 7, 2012 at 10:31, Heikki Linnakangas >> wrote: >>> >>> So, --statusint needs to be in milliseconds. And while we're at it, how >>> >>> difficult would be to ask the server for the current value of >>> replication_timeout, and set --statusint automatically based on that? Or >>> perhaps mark replication_timeout as GUC_REPORT. It is rather fiddly that >>> depending on a server setting, you need to pass an option in the client >>> or >>> it will just silently fail with no indication of what the problem is. >> >> >> We can't really ask for it easily, since we're on a replication >> connection. Unless we add that to the walsender grammar, but that >> would make it impossible to use the receivexlog stuff against a 9.1 >> server (which I think still works, though I haven't tested it in a >> while). > > > You could put a version-check there, and only send the command if connected > to a 9.2 server. True.. >> Do we have a facility to make it a GUC_REPORT but only for walsender >> connections? > > > There's no such facility at the moment. > > >> It seems like a very unnecessary thing to have it sent out over every >> single connection, since it would only be useful in a very small >> subset of them. > > > True, and conversely, many of the current GUC_REPORT settings don't apply to > replication clients at all. Like standard_conforming_strings and DateStyle. Right. But it's less important there, since the replication connections will in any reasonable configuration be only a tiny part of the connections. > I think we need another flag for settings that should be sent to replication > clients. GUC_REPORT_REPLICATION? Some settings would have both GUC_REPORT > and GUC_REPORT_REPLICATION, while others would have only one of them. Yup, seems like the cleanest solution. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog
On 07.02.2012 11:35, Magnus Hagander wrote: On Tue, Feb 7, 2012 at 10:31, Heikki Linnakangas wrote: So, --statusint needs to be in milliseconds. And while we're at it, how difficult would be to ask the server for the current value of replication_timeout, and set --statusint automatically based on that? Or perhaps mark replication_timeout as GUC_REPORT. It is rather fiddly that depending on a server setting, you need to pass an option in the client or it will just silently fail with no indication of what the problem is. We can't really ask for it easily, since we're on a replication connection. Unless we add that to the walsender grammar, but that would make it impossible to use the receivexlog stuff against a 9.1 server (which I think still works, though I haven't tested it in a while). You could put a version-check there, and only send the command if connected to a 9.2 server. Do we have a facility to make it a GUC_REPORT but only for walsender connections? There's no such facility at the moment. It seems like a very unnecessary thing to have it sent out over every single connection, since it would only be useful in a very small subset of them. True, and conversely, many of the current GUC_REPORT settings don't apply to replication clients at all. Like standard_conforming_strings and DateStyle. I think we need another flag for settings that should be sent to replication clients. GUC_REPORT_REPLICATION? Some settings would have both GUC_REPORT and GUC_REPORT_REPLICATION, while others would have only one of them. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog
On Tue, Feb 7, 2012 at 10:31, Heikki Linnakangas wrote: > On 07.02.2012 09:03, Fujii Masao wrote: >> >> On Tue, Feb 7, 2012 at 2:58 PM, Fujii Masao wrote: >>> >>> When I compiled HEAD with --disable-integer-datetimes and tested >>> >>> pg_receivexlog, I encountered unexpected replication timeout. As >>> far as I read the pg_receivexlog code, the cause of this problem is >>> that pg_receivexlog handles the standby message timeout incorrectly >>> in --disable-integer-datetimes. The attached patch fixes this problem. >>> Comments? >> >> >> receivelog.c >> --- >> timeout.tv_sec = last_status + standby_message_timeout - now - 1; >> if (timeout.tv_sec<= 0) >> --- >> >> Umm.. the above code also handles the timestamp incorrectly. ISTM that the >> root cause of these problems is that receivelog.c uses TimestampTz. > > > Yep. While localGetCurrentTimestamp() returns a TimestampTz and handles > float timestamps correctly, the caller just assigns the result to a int64 > variable, assuming --enable-integer-datetimes. Ugh. Indeed. >> What about changing receivelog.c so that it uses time_t instead of >> TimestampTz? Which would make the code simpler, I think. > > > Hmm, that would reduce granularity to seconds. The --statusint option is > given in seconds, but it would be good to have more precision in the > calculations to avoid rounding errors. > > But actually, if the purpose of the --statusint option is to avoid > disconnection because of exceeding the server's replication_timeout, one > second granularity just isn't enough to be begin with. replication_timeout > is given in milliseconds, so if you set replication_timeout=900ms in the > server, there is no way to make pg_basebackup/pg_receivexlog to reply in > time. > > So, --statusint needs to be in milliseconds. And while we're at it, how > difficult would be to ask the server for the current value of > replication_timeout, and set --statusint automatically based on that? Or > perhaps mark replication_timeout as GUC_REPORT. It is rather fiddly that > depending on a server setting, you need to pass an option in the client or > it will just silently fail with no indication of what the problem is. We can't really ask for it easily, since we're on a replication connection. Unless we add that to the walsender grammar, but that would make it impossible to use the receivexlog stuff against a 9.1 server (which I think still works, though I haven't tested it in a while). Do we have a facility to make it a GUC_REPORT but only for walsender connections? It seems like a very unnecessary thing to have it sent out over every single connection, since it would only be useful in a very small subset of them. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] incorrect handling of the timeout in pg_receivexlog
On 07.02.2012 09:03, Fujii Masao wrote: On Tue, Feb 7, 2012 at 2:58 PM, Fujii Masao wrote: When I compiled HEAD with --disable-integer-datetimes and tested pg_receivexlog, I encountered unexpected replication timeout. As far as I read the pg_receivexlog code, the cause of this problem is that pg_receivexlog handles the standby message timeout incorrectly in --disable-integer-datetimes. The attached patch fixes this problem. Comments? receivelog.c --- timeout.tv_sec = last_status + standby_message_timeout - now - 1; if (timeout.tv_sec<= 0) --- Umm.. the above code also handles the timestamp incorrectly. ISTM that the root cause of these problems is that receivelog.c uses TimestampTz. Yep. While localGetCurrentTimestamp() returns a TimestampTz and handles float timestamps correctly, the caller just assigns the result to a int64 variable, assuming --enable-integer-datetimes. What about changing receivelog.c so that it uses time_t instead of TimestampTz? Which would make the code simpler, I think. Hmm, that would reduce granularity to seconds. The --statusint option is given in seconds, but it would be good to have more precision in the calculations to avoid rounding errors. But actually, if the purpose of the --statusint option is to avoid disconnection because of exceeding the server's replication_timeout, one second granularity just isn't enough to be begin with. replication_timeout is given in milliseconds, so if you set replication_timeout=900ms in the server, there is no way to make pg_basebackup/pg_receivexlog to reply in time. So, --statusint needs to be in milliseconds. And while we're at it, how difficult would be to ask the server for the current value of replication_timeout, and set --statusint automatically based on that? Or perhaps mark replication_timeout as GUC_REPORT. It is rather fiddly that depending on a server setting, you need to pass an option in the client or it will just silently fail with no indication of what the problem is. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
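For what it's worth, once the interval is expressed in milliseconds the client-side arithmetic does not need TimestampTz at all. A minimal sketch (names are illustrative, not the actual receivelog.c code) of computing a select() timeout from plain gettimeofday() values, which behaves identically under --enable- and --disable-integer-datetimes:

#include <sys/time.h>

/*
 * Fill *timeout with the time remaining until the next status packet
 * is due, given when the previous one was sent and the interval in
 * milliseconds.
 */
static void
set_status_timeout(struct timeval *timeout,
                   const struct timeval *last_status,
                   int standby_message_timeout_ms)
{
    struct timeval now;
    long        elapsed_ms;
    long        remaining_ms;

    gettimeofday(&now, NULL);
    elapsed_ms = (now.tv_sec - last_status->tv_sec) * 1000L +
                 (now.tv_usec - last_status->tv_usec) / 1000L;

    remaining_ms = standby_message_timeout_ms - elapsed_ms;
    if (remaining_ms < 0)
        remaining_ms = 0;

    timeout->tv_sec = remaining_ms / 1000L;
    timeout->tv_usec = (remaining_ms % 1000L) * 1000L;
}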
Re: [HACKERS] LWLockWaitUntilFree (was: Group commit, revised)
On 03.02.2012 22:57, Robert Haas wrote: I think I recommended a bad name for this function. It's really LWLockAcquireOrWaitUntilFree. Can we rename that? Agreed, that's better. Although quite long. "LWLockAcquireOrWait" perhaps? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
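Whatever the final name, the calling pattern the primitive enables may be worth spelling out. A minimal sketch of how a group-commit style caller would use it (the recheck helper is hypothetical, and this is not the actual XLogFlush code):

#include "postgres.h"
#include "storage/lwlock.h"

/* hypothetical: reports whether another backend already flushed our record */
extern bool our_record_already_flushed(void);

static void
flush_with_acquire_or_wait(void)
{
    if (!LWLockAcquireOrWait(WALWriteLock, LW_EXCLUSIVE))
    {
        /*
         * The lock was busy; we slept until it was released but did not
         * acquire it.  The holder may have flushed our record for us,
         * so recheck before doing any work ourselves.
         */
        if (our_record_already_flushed())
            return;
        LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
    }

    /* ... write and fsync WAL here ... */

    LWLockRelease(WALWriteLock);
}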
Re: [HACKERS] Group commit, revised
On 04.02.2012 07:24, Jeff Janes wrote: Is it safe to assume that, under "#ifdef LWLOCK_STATS", a call to LWLockAcquire will always precede any calls to LWLockWaitUntilFree when a new process is started, to calloc the stats assays? > I guess it is right now, because the only user is WALWrite, which would never be acquired before WALInsert is at least once. But this doesn't seem very future proof. Agreed, we can't count on that. There's no clear single point after a process startup where the first lwlock is acquired. Out of curiosity, I added an elog(LOG, ...) to that initialization code, to log which lwlock is acquired first in a process. It depends on the process and circumstances - here's the list I got: BufFreeListLock ShmemIndexLock XidGenLock ProcArrayLock BgWriterCommLock AutoVacuumLock And that's probably not all, I bet you would acquire different locks first with recovery, streaming replication etc.. I didn't test those. Anyway, I added the initialization to LWLockWaitUntilFree now. Thanks! I guess the same complain could be logged against LWLockConditionalAcquire. LWLockConditionalAcquire doesn't update the stats, so it's ok. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
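One way to keep this future proof is lazy per-process initialization guarded in every entry point that can be the first lwlock call in a process. A rough sketch with made-up names (the real LWLOCK_STATS code keeps several counter arrays, not just one), called at the top of LWLockAcquire and LWLockAcquireOrWait alike; LWLockConditionalAcquire can keep skipping it since it does not touch the stats:

#ifdef LWLOCK_STATS
#include <stdlib.h>

static int *lwstats_block_counts = NULL;    /* illustrative single array */

static void
lwstats_init_if_needed(void)
{
    if (lwstats_block_counts == NULL)
        lwstats_block_counts = calloc(NumLWLocks(), sizeof(int));
}
#endif   /* LWLOCK_STATS */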