Re: [HACKERS] docs: WITH queries and VALUES
On tor, 2012-04-12 at 11:59 +0200, Magnus Hagander wrote:
> The SELECT manpage has:
>
> and with_query is:
>
> with_query_name [ ( column_name [, ...] ) ] AS ( select | insert | update | delete )
>
> Should that list that you can use values as well? Or is it something
> we generally consider "wherever select works you can use values"?

Added.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus
On Fri, Apr 13, 2012 at 10:43 PM, Pavel Stehule wrote:
>> Yeah. I think it would be a good idea for UPDATE and DELETE to expose
>> a LIMIT option, but I can't really see the virtue in making that
>> functionality available only through SPI.
>
> I don't agree - LIMIT after UPDATE or DELETE makes no sense. A clean
> solution should be based on updatable CTEs.

It makes a lot of sense. Without it, it's very difficult to do logical
replication on a table with no primary key. (Whether or not people should
create such tables in the first place is, of course, beside the point.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Fri, Apr 13, 2012 at 10:01 PM, Tom Lane wrote:
> Greg Stark writes:
>> On Fri, Apr 13, 2012 at 8:15 PM, Robert Haas wrote:
>>> That's probably true, but I'm not sure it's worth worrying about -
>>> one-in-four-billion is a pretty small probability.
>
>> Is this not subject to the birthday paradox? If you have a given hash
>> you're worried about a collision with then you have a
>> one-in-four-billion chance. But if you have a collection of hashes and
>> you're worried about any collisions then it only takes about 64k
>> before there's likely a collision.
>
> ... so, if pg_stat_statements.max were set as high as 64k, you would
> need to worry.

Well... at 64k, you'd be very likely to have a collision. But the whole
birthday paradox thing means that there's a non-trivial collision
probability even at lower numbers of entries. Seems like maybe we ought to
be using 64 bits here...

> Realistically, I'm more worried about collisions due to inadequacies in
> the jumble calculation logic (Peter already pointed out some risk
> factors in that regard).

...especially if collisions are even more frequent than random chance would
suggest.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 14 April 2012 03:01, Tom Lane wrote:
> Realistically, I'm more worried about collisions due to inadequacies in
> the jumble calculation logic (Peter already pointed out some risk
> factors in that regard).

It's important to have a sense of proportion about the problem. The worst
thing that a collision can do is lead the DBA on a bit of a wild goose
chase. Importantly, collisions across databases and users are impossible.
I've always taken the view that aggregating query statistics is a lossy
process, and these kinds of problems seem like a more than acceptable price
to pay for low-overhead dynamic query statistics.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus
> Yeah. I think it would be a good idea for UPDATE and DELETE to expose
> a LIMIT option, but I can't really see the virtue in making that
> functionality available only through SPI.

I don't agree - LIMIT after UPDATE or DELETE makes no sense. A clean
solution should be based on updatable CTEs.

Regards,
Pavel
Re: [HACKERS] Patch: add timing of buffer I/O requests
Greg Stark writes:
> On Fri, Apr 13, 2012 at 8:15 PM, Robert Haas wrote:
>> That's probably true, but I'm not sure it's worth worrying about -
>> one-in-four-billion is a pretty small probability.
>
> Is this not subject to the birthday paradox? If you have a given hash
> you're worried about a collision with then you have a
> one-in-four-billion chance. But if you have a collection of hashes and
> you're worried about any collisions then it only takes about 64k
> before there's likely a collision.

... so, if pg_stat_statements.max were set as high as 64k, you would need
to worry.

Realistically, I'm more worried about collisions due to inadequacies in the
jumble calculation logic (Peter already pointed out some risk factors in
that regard).

regards, tom lane
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Fri, Apr 13, 2012 at 8:15 PM, Robert Haas wrote:
> That's probably true, but I'm not sure it's worth worrying about -
> one-in-four-billion is a pretty small probability.

Is this not subject to the birthday paradox? If you have a given hash
you're worried about a collision with then you have a one-in-four-billion
chance. But if you have a collection of hashes and you're worried about any
collisions then it only takes about 64k before there's likely a collision.

--
greg
Re: [HACKERS] Command counter increment vs updating an active snapshot
Ozgun Erdogan writes:
> (1) What's the difference between advancing the command counter and
> updating an active snapshot? For example, I see that DefineRelation()
> increments the command counter, but explain.c / copy.c explicitly
> calls UpdateActiveSnapshotCommandId(). Is that because the latter call
> intends to make its changes visible to other concurrent processes?

No, that's all local. CommandCounterIncrement automatically propagates the
new command counter into a couple of "standard" snapshots that are meant to
be always up-to-date, but if you want a stacked ActiveSnapshot to follow
suit you need to call UpdateActiveSnapshotCommandId.

> (2) The following change in pquery.c asserts that, if more than one
> utility statement exists in portal statements, they all have to be
> Notify statements.

That's because the only way that can happen is expansion of a rule, and the
only type of utility command allowed in rules is Notify.

> When I modify the code so that one statement in portal->stmts gets
> translated into four separate statements that all depend on each other
> (one planned, two utility, and another planned statement) and remove
> the assertion, all four statements still run fine.
>
> Looking into the code, I understand this isn't the expected behavior
> in terms of snapshot handling. In what scenarios can I expect our code
> to break?

[ shrug... ] You expect an answer to that when you haven't shown us the
code? But in general I'd be really wary of putting most utility statements
into a portal along with DML. The kindest thing to be said about that is
that it's utterly untested. Notify is pretty safe because it doesn't affect
the state of the database in any way that DML statements would care about;
this isn't the case for most others.

regards, tom lane
Re: [HACKERS] [BUGS] BUG #6572: The example of SPI_execute is bogus
On Thu, Apr 5, 2012 at 2:39 AM, Hitoshi Harada wrote:
> On Wed, Apr 4, 2012 at 8:00 AM, Tom Lane wrote:
>> umi.tan...@gmail.com writes:
>>> http://www.postgresql.org/docs/9.1/static/spi-spi-execute.html
>>>
>>> ===
>>> SPI_execute("INSERT INTO foo SELECT * FROM bar", false, 5);
>>> will allow at most 5 rows to be inserted into the table.
>>> ===
>>>
>>> This seems not true unless I'm missing something.
>>
>> Hmm ... that did work as described, until we broke it :-(. This is an
>> oversight in the 9.0 changes that added separate ModifyTuple nodes to
>> plan trees. ModifyTuple doesn't return after each updated row, unless
>> there's a RETURNING clause; which means that the current_tuple_count
>> check logic in ExecutePlan() no longer stops execution as intended.
>>
>> Given the lack of complaints since 9.0, maybe we should not fix this
>> but just redefine the new behavior as being correct? But it seems
>> mighty inconsistent that the tuple limit would apply if you have
>> RETURNING but not when you don't. In any case, the ramifications
>> are wider than one example in the SPI docs.
>>
>> Thoughts?
>
> To be honest, I was surprised when I found that the tcount parameter is
> said to apply even to INSERT. I believe people think that parameter is
> meant to limit memory consumption when returning tuples, so it'd apply
> only to SELECT or to DML with RETURNING. So I'm +1 for not fixing it but
> redefining the behavior. Who wants to limit the number of rows processed
> inside the backend, from SPI?

Yeah. I think it would be a good idea for UPDATE and DELETE to expose a
LIMIT option, but I can't really see the virtue in making that
functionality available only through SPI.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Patch: add timing of buffer I/O requests
Jim Nasby writes:
> I agree that ms is on its way out... and frankly it wouldn't surprise me
> if at some point we actually had need of ns resolution.
>
> Given that, I don't think ms or us definitions make sense... float8
> seconds seems much more logical to me.

Well, the important point is that it be float8, so that fractions of
whatever units you choose can be represented. I do not feel a strong need
to change the units in all the existing pg_stat_ columns from msec to sec;
that strikes me as just breaking things to little gain. If the majority of
them were in sec then I'd want to converge on that, but since the majority
are in msec it seems like the path of least breakage is to converge on
that.

> Though, if we're going to end up seriously breaking things anyway,
> perhaps it would make sense to switch everything over to interval... I
> realize that there's more overhead there, but I don't think selecting
> from the stats views is exactly performance critical.

But (a) I *don't* want to seriously break things, and don't see a need to;
(b) interval is expensive and has got its own problems, notably an internal
limitation to usec resolution that we would not be able to get rid of
easily. It's not even apparent to me that interval is semantically The
Right Thing for values that represent an accumulation of a lot of different
measurements.

regards, tom lane
Re: [HACKERS] typo fix
On fre, 2012-04-13 at 17:27 +0900, Etsuro Fujita wrote:
> This is a little patch to fix a typo in file-fdw.sgml

Fixed.
Re: [HACKERS] Patch: add timing of buffer I/O requests
On tis, 2012-04-10 at 09:33 -0400, Robert Haas wrote:
> So, should we make the new columns exposed by pg_stat_statements use
> milliseconds, so that the block read/write timings are everywhere in
> milliseconds, or should we keep them as a float8, so that all the
> times exposed by pg_stat_statements use float8?

Wouldn't interval be the proper type to represent elapsed time?
Re: [HACKERS] Last gasp
On tor, 2012-04-12 at 18:19 -0400, Christopher Browne wrote:
> On Thu, Apr 12, 2012 at 6:11 PM, Jay Levitt wrote:
>> Rather than extend the CF app into a trivial-patch workflow app, it
>> might be worth looking at integrating it with github.
>
> There's a reluctance to require a proprietary component that could
> disappear on us without notice.

Yeah, I think initially this was more of a matter-of-principle decision,
but seeing the current spam problem on github.com, I'm glad we didn't do
it. For me it has disqualified itself as a platform for intense
collaborative development.
Re: [HACKERS] Last gasp
On tor, 2012-04-12 at 07:59 +0200, Magnus Hagander wrote:
> It might be helpful (if the CF app had a trivial API) with a small
> tool that could run from a git hook (or manual script or alias) that
> would prompt for "which cf entry, if any, did this commit close"?

An API for the CF app would actually be nice in general, because then I
could write a small command-line tool and just type 'cf close 123' or
something instead of having to click around a bunch of times.
Re: [HACKERS] Last gasp
On ons, 2012-04-11 at 23:30 -0400, Robert Haas wrote:
> Now what would be sort of neat is if we had a way to keep all the
> versions of patch X plus author and reviewer information, links to
> reviews and discussion, etc. in some sort of centralized place.

Well, a properly linked email thread contains all this. I have seen a
couple of anti-patterns evolving, though, including patch authors starting
a new email thread for each patch version, and reviewers starting a new
email thread for each review. When that happens, the existence of the
commitfest app makes things worse, in a way.

(Of course, any discussion here about bug trackers emphasizes the need for
email to be the primary communications method for this very reason.)
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 4/10/12 5:07 PM, Greg Smith wrote:
> I'd prefer to see at least usec resolution and 8 bytes of "dynamic range"
> for query related statistics. Any of these would be fine from a UI
> perspective to me:
>
> - float8 seconds
> - float8 msec
> - float8 usec
> - int8 usec
>
> I don't think int8 msec will be enough resolution to time queries for
> very long, if it's not already obsolete. The committed example for
> pg_test_timing on good hardware already clocks trivial events at a single
> usec. Even I/O is getting there. I've measured my Fusion-io loaner card
> peaking at 8GB/s, which works out to 1 usec per 8K page. None of that is
> even price-no-object hardware today; it's the stuff sitting in my office.
> If anything, I'd expect more timing code in the database that only has ms
> resolution right now will start looking fat in a year or two, and more
> things might need to be shifted to usec instead. Checkpoint timing can
> survive having less resolution because its primary drumbeat is very
> unlikely to drop below the minutes range.

I agree that ms is on its way out... and frankly it wouldn't surprise me if
at some point we actually had need of ns resolution.

Given that, I don't think ms or us definitions make sense... float8 seconds
seems much more logical to me.

Though, if we're going to end up seriously breaking things anyway, perhaps
it would make sense to switch everything over to interval... I realize that
there's more overhead there, but I don't think selecting from the stats
views is exactly performance critical.

--
Jim C. Nasby, Database Architect
j...@nasby.net
512.569.9461 (cell)
http://jim.nasby.net
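The "float8 seconds" option trades no range for resolution in practice, which is easy to quantify: an IEEE double carries a 53-bit significand, so it represents integers exactly up to 2**53. A quick back-of-the-envelope check (my own arithmetic, not from the patch under discussion):

```python
# A double has a 53-bit significand, so integer counts are exact up to 2**53.
exact_limit = 2.0 ** 53

# Counting in microseconds, how long until float8 seconds can no longer
# represent each additional accumulated microsecond exactly?
usec_per_year = 1e6 * 86400 * 365.25
years = exact_limit / usec_per_year
print(years)   # roughly 285 years of exactly-representable microseconds

# At nanosecond resolution, the exactly-representable span shrinks to months.
nsec_per_day = 1e9 * 86400
days = exact_limit / nsec_per_day
print(days)    # roughly 104 days
```

So float8 with usec (or even ns) resolution comfortably covers any plausible accumulated-statistics lifetime, which is consistent with the preference for float8 over fixed-unit integers.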
Re: [HACKERS] Patch: add timing of buffer I/O requests
On 13 April 2012 20:15, Robert Haas wrote:
> On Wed, Apr 11, 2012 at 9:02 AM, k...@rice.edu wrote:
>> By using all 64-bits of the hash that we currently calculate, instead
>> of the current use of 32-bits only, the collision probabilities are
>> very low.
>
> That's probably true, but I'm not sure it's worth worrying about -
> one-in-four-billion is a pretty small probability.

That assumes that our hashing of query trees will exhibit a uniform
distribution. While it's easy enough to prove that hash_any does that, it
would seem to me that that does not imply that we will exhibit perfectly
uniform distribution within pg_stat_statements.

The reason that I invented the jumble nomenclature, to distinguish it from
a straight serialisation, is that, apart from the fact that many fields are
simply ignored, we still couldn't deserialise what we do store from the
jumble, because the correct way to serialise a tree entails serialising
NULL pointers too - two non-equivalent jumbles could actually be bitwise
identical. In practice, I think that adding NodeTags ought to be sufficient
here, but I wouldn't like to bet my life on it. Actually, that is a point
that perhaps should have been touched on in the comments at the top of the
file.

You may wish to take a look at my original analysis in the
pg_stat_statements normalisation thread, which attempts to quantify the
odds by drawing upon the mathematics of the birthday problem.

> On the broader issue, Peter's argument here seems to boil down to
> "there is probably a reason why this is useful" and Tom's argument
> seems to boil down to "I don't want to expose it without knowing what
> that reason is". Peter may well be right, but that doesn't make Tom
> wrong. If we are going to expose this, we ought to be able to
> document why we have it, and right now we can't, because we don't
> *really* know what it's good for, other than raising awareness of the
> theoretical possibility of collisions, which doesn't seem like quite
> enough.

Well, it's important to realise that the query string isn't really stable,
to the extent that it could differ as the query is evicted and re-enters
pg_stat_statements over time. The hash value is necessarily a stable
identifier, as it is the very identifier used by pg_stat_statements. People
are going to want to aggregate this information over long periods and
across database clusters, and I would really like to facilitate that. To be
honest, with the plans that we have for replication, with the addition of
things like cascading replication, I think it will increasingly be the case
that people prefer built-in replication. The fact that the actual hash
value is affected by factors like the endianness of the architecture in
question seems mostly irrelevant.

If we were to expose this, I'd suggest that the documentation in the table
describing each column say of the value:

    query_hash | stable identifier used by pg_stat_statements for the query

There'd then be additional clarification after the existing reference to
the hash value, giving the required caveats: the hash value is subject to
various implementation artefacts that effectively make the values
persistently identify only queries referencing particular objects in the
same database (that is, the object OID values are used), or across
databases that are binary clones, such as between a streaming replica
master and its standby. The OID restriction alone effectively shadows all
others, so there's no need to mention endianness or anything like that.
Anyone who jumped to the conclusion that their aggregation tool would work
fine with Slony or something based on the query_hash description alone
would probably have bigger problems.

> On another note, I think this is a sufficiently major change that we
> ought to add Peter's name to the "Author" section of the
> pg_stat_statements documentation.

Thanks. I actually thought this myself, but didn't want to mention it
because I didn't think that it was up to me to decide, or to attempt to
influence that decision.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
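The NULL-pointer point above is easy to demonstrate in miniature. This is a toy Python sketch of the idea, with hypothetical node types (it is not the actual pg_stat_statements jumbling code): if a tree serialiser simply skips NULL children, two differently-shaped trees can produce bitwise-identical jumbles, while appending a sentinel for NULL preserves the distinction.

```python
class Node:
    def __init__(self, tag, left=None, right=None):
        self.tag, self.left, self.right = tag, left, right

def naive_jumble(node, out):
    # Skips None children entirely -- the shape of the tree is lost.
    if node is None:
        return
    out.append(node.tag)
    naive_jumble(node.left, out)
    naive_jumble(node.right, out)

def safe_jumble(node, out):
    # Records a sentinel for None, so tree shape survives serialisation.
    if node is None:
        out.append("NULL")
        return
    out.append(node.tag)
    safe_jumble(node.left, out)
    safe_jumble(node.right, out)

# Two non-equivalent trees: "A" with a left child vs. "A" with a right child.
t1 = Node("A", left=Node("B"))
t2 = Node("A", right=Node("B"))

j1, j2 = [], []
naive_jumble(t1, j1)
naive_jumble(t2, j2)
print(j1 == j2)   # True -- the naive jumble cannot tell them apart

s1, s2 = [], []
safe_jumble(t1, s1)
safe_jumble(t2, s2)
print(s1 == s2)   # False -- the sentinel preserves tree shape
```

This is the sense in which "two non-equivalent jumbles could actually be bitwise identical", and why recording NodeTags (or sentinels) narrows, but does not by itself eliminate, the risk.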
Re: [HACKERS] xlog location arithmetic
On Mon, Mar 12, 2012 at 10:34 PM, Fujii Masao wrote:
>>> I can see a usecase for having a pg_size_pretty(numeric) as an option.
>>> Not necessarily a very big one, but a >0 one.
>>
>> +1.
>
> +1, too.

I did some beautification of this patch. I think the attached version is
cleaner and easier to read. Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

sizepretty_v4.patch
Description: Binary data
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Fri, Apr 13, 2012 at 4:01 PM, Tom Lane wrote:
> Robert Haas writes:
>> On another note, I think this is a sufficiently major change that we
>> ought to add Peter's name to the "Author" section of the
>> pg_stat_statements documentation.
>
> +1 ... as long as we have those at all, they probably ought to credit
> anybody who did substantial work on the module.
>
> I think that eventually we'll have to give them up, just the way that
> we don't credit anybody in particular as author of the core code; but
> for now this is a good change.

Yeah. I'd actually be in favor of removing them, and similar references in
the comments, because they do lead and have led to disputes about who
deserves to be mentioned. However, a change of this magnitude certainly
deserves mention; it's not really the same module any more.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Patch: add timing of buffer I/O requests
Robert Haas writes:
> On another note, I think this is a sufficiently major change that we
> ought to add Peter's name to the "Author" section of the
> pg_stat_statements documentation.

+1 ... as long as we have those at all, they probably ought to credit
anybody who did substantial work on the module.

I think that eventually we'll have to give them up, just the way that we
don't credit anybody in particular as author of the core code; but for now
this is a good change.

regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Add new replication mode synchronous_commit = 'write'.
On 04/13/2012 08:15 PM, Kevin Grittner wrote:
> Robert Haas wrote:
>> In my view, remote_write seems a lot more clear than write
>
> +1
>
> I sure didn't understand it to mean remote_write when I read the
> subject line.

Neither did I. So definitely +1.

--
Guillaume
http://www.postgresql.fr
http://dalibo.com
Re: [HACKERS] PREPARE TRANSACTION compatibility?
On 13.04.2012 22:02, Tom Lane wrote:
> Heikki Linnakangas writes:
>> On 13.04.2012 21:43, Peter Eisentraut wrote:
>>> The commands PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED
>>> are the only ones that do not have a Compatibility section on their
>>> reference page. Does anyone remember whether they were our invention or
>>> copied from or inspired by some other implementation?
>
>> They are our invention.
>
> Entirely? I'm quite sure I remember some discussion about compatibility
> of the prepared-transaction GIDs. There must be at least part of that
> API that's standard.

There's the X/Open standard, and the Java Transaction API (JTA) standard
that's based on it. There are probably other specs for other languages, but
those two are what I've looked at. The specs define the client-side
interface, but they don't say anything about the client-server protocol.

X/Open and JTA have the concept of GIDs, called Xids in the JTA
nomenclature. IIRC they have a certain width, and consist of two parts: a
transaction id and a branch id. But from the server's point of view, the
GIDs are just unique identifiers, assigned by the client (= transaction
manager), with no further special meaning.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] [COMMITTERS] pgsql: Add new replication mode synchronous_commit = 'write'.
On 13 April 2012 19:15, Kevin Grittner wrote:
> Robert Haas wrote:
>> In my view, remote_write seems a lot more clear than write
>
> +1
>
> I sure didn't understand it to mean remote_write when I read the
> subject line.

Whatever this option value is named, it needs to be referenced in the
postgresql.conf comment for this option, which it currently isn't.

I have a question, though. What happens when this is set to "write" (or
"remote_write" as proposed) but it's being used on a standalone primary? At
the moment it's not documented what level of guarantee this would provide.

--
Thom
Re: [HACKERS] Patch: add timing of buffer I/O requests
On Wed, Apr 11, 2012 at 9:02 AM, k...@rice.edu wrote:
> By using all 64-bits of the hash that we currently calculate, instead
> of the current use of 32-bits only, the collision probabilities are
> very low.

That's probably true, but I'm not sure it's worth worrying about -
one-in-four-billion is a pretty small probability.

On the broader issue, Peter's argument here seems to boil down to "there is
probably a reason why this is useful" and Tom's argument seems to boil down
to "I don't want to expose it without knowing what that reason is". Peter
may well be right, but that doesn't make Tom wrong. If we are going to
expose this, we ought to be able to document why we have it, and right now
we can't, because we don't *really* know what it's good for, other than
raising awareness of the theoretical possibility of collisions, which
doesn't seem like quite enough.

On another note, I think this is a sufficiently major change that we ought
to add Peter's name to the "Author" section of the pg_stat_statements
documentation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] PREPARE TRANSACTION compatibility?
Heikki Linnakangas writes:
> On 13.04.2012 21:43, Peter Eisentraut wrote:
>> The commands PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED
>> are the only ones that do not have a Compatibility section on their
>> reference page. Does anyone remember whether they were our invention or
>> copied from or inspired by some other implementation?
>
> They are our invention.

Entirely? I'm quite sure I remember some discussion about compatibility of
the prepared-transaction GIDs. There must be at least part of that API
that's standard.

regards, tom lane
Re: [HACKERS] PREPARE TRANSACTION compatibility?
On 13.04.2012 21:43, Peter Eisentraut wrote:
> The commands PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED
> are the only ones that do not have a Compatibility section on their
> reference page. Does anyone remember whether they were our invention or
> copied from or inspired by some other implementation?

They are our invention.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
[HACKERS] PREPARE TRANSACTION compatibility?
The commands PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED
are the only ones that do not have a Compatibility section on their
reference page. Does anyone remember whether they were our invention or
copied from or inspired by some other implementation?
Re: [HACKERS] [COMMITTERS] pgsql: Add new replication mode synchronous_commit = 'write'.
Robert Haas wrote:
> In my view, remote_write seems a lot more clear than write

+1

I sure didn't understand it to mean remote_write when I read the subject
line.

-Kevin
Re: [HACKERS] Memory usage during sorting
On 13 April 2012 18:33, Robert Haas wrote:
> We do use insertion sort for partitions of size < 7. I assume you are
> referring to something else.

I stand corrected. My confusion came from the removal of a later *switch*
to insertion sort in a3f0b3d68f9a5357a3f72b40a45bcc714a9e0649, which also
added the pre-sorted check where you'd expect to see the insertion switch.
Of course, the n < 7 insertion sort thing is right beside the added check.
So another, redundant copy of insertion sort was removed, and not the one
that almost every real-world quicksort implementation has.

> Heap-sorting could also benefit from some optimization in this area.
> Right now, if you heap-sort data that's already in order, it gets
> progressively slower as you crank up work_mem, because the heap gets
> deeper and so extraction and siftup get more expensive, for no real
> benefit. We could do something like check, every time we use up our
> available memory, whether the heap is already entirely in order. If
> so, we could dump all but one tuple to the current run and then start
> reading tuples again. Or maybe dump some percentage of the array,
> though that might require a bit more bookkeeping. Or maybe there's a
> smarter algorithm that would also cater to mostly-sorted data.

Well, timsort is specifically designed to take advantage of pre-sorted
data. It does appear to have a lot of traction, as Wikipedia points out:

"Timsort has been Python's standard sorting algorithm since version 2.3. It
is now also used to sort arrays in Java SE 7,[2] and on the Android
platform.[3]"

Somebody has produced a BSD-licensed C implementation that draws heavily on
the Python/Java one here:

http://code.google.com/p/timsort/source/browse/trunk/timSort.c?spec=svn17&r=17

It looks like it has exactly the same interface as a C stdlib qsort, so
it'd be fairly easy to produce a timsort_arg() based on this, if anyone
cares to.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
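Timsort's advantage on pre-sorted input is easy to observe from Python itself, whose built-in sort is the reference timsort implementation. A small sketch of my own that counts comparisons (not related to any PostgreSQL code):

```python
import random

class Counted:
    """Wraps a value and counts how many comparisons the sort performs."""
    count = 0

    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        Counted.count += 1
        return self.v < other.v

n = 10000

# Already-sorted input: timsort detects the single ascending run and
# merely verifies order, needing only about n comparisons.
Counted.count = 0
sorted([Counted(i) for i in range(n)])
print(Counted.count)   # about n - 1

# Random input: comparisons grow like n * log2(n) instead.
Counted.count = 0
sorted([Counted(random.random()) for _ in range(n)])
print(Counted.count)
```

That linear-time behavior on sorted (and mostly-sorted) runs is precisely the property being suggested as a fit for the heap-sort scenario quoted above.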
Re: [HACKERS] how to create a non-inherited CHECK constraint in CREATE TABLE
Alvaro Herrera writes: > Excerpts from Andrew Dunstan's message of mié abr 11 15:51:51 -0300 2012: >> On 04/11/2012 02:45 PM, Tom Lane wrote: >>> I don't really care for the idea that the ONLY goes in a different place >>> for this operation than for every other kind of ALTER TABLE, but it does >>> make sense if you subscribe to the quoted theory that ONLY is a property >>> of the constraint and not the ALTER command as such. >> I think I rather dislike it. ONLY should be followed by the name of the >> parent table whose children it causes us to exclude, IMNSHO. Moving it >> elsewhere doesn't seem to me to be a blow for clarity at all. > If that's the only objection, maybe we could use a different keyword > then, perhaps NOINHERIT: > ALTER TABLE constraint_rename_test ADD CONSTRAINT con2 CHECK NOINHERIT (b > 0); I could live with that. "CHECK ONLY" isn't particularly transparent as to what it means, anyway. "CHECK NOINHERIT" seems a lot clearer. I'd propose "CHECK NO INHERIT", though, as (a) it seems better English and (b) it avoids creating any new keyword. regards, tom lane
[HACKERS] Command counter increment vs updating an active snapshot
Hi all (re-posting from pgsql-general), I'm looking into Postgres' internals, and had two quick questions that relate to each other.

(1) What's the difference between advancing the command counter and updating an active snapshot? For example, I see that DefineRelation() increments the command counter, but explain.c / copy.c explicitly call UpdateActiveSnapshotCommandId(). Is that because the latter call intends to make its changes visible to other concurrent processes?

(2) The following change in pquery.c asserts that, if more than one utility statement exists in portal statements, they all have to be Notify statements. https://github.com/postgres/postgres/commit/c0b00760365c74308e9e0719c993eadfbcd090c2#diff-6 When I modify the code so that one statement in portal->stmts gets translated into four separate statements that all depend on each other (one planned, two utility, and another planned statement) and remove the assertion, all four statements still run fine. Looking into the code, I understand this isn't the expected behavior in terms of snapshot handling. In what scenarios can I expect our code to break?

Thanks, Ozgun.
Re: [HACKERS] Memory usage during sorting
On Fri, Apr 13, 2012 at 1:15 PM, Peter Geoghegan wrote: > On 13 April 2012 17:42, Peter Geoghegan wrote: >> One insight that I had at the time was that text comparisons where the >> C locale isn't used are really rather expensive, and I doubt that >> there is too much that can be done to address that directly. However, >> if we were to support timsort, that could substantially cut down on >> the number of comparisons for text columns, which could be a real win. >> Maybe that would happen through some explicit mechanism, or maybe the >> planner could actually infer that it was likely to be optimal to use >> timsort based on a high statistical correlation between physical row >> ordering and logical ordering of a key column's values. > > Further thoughts: > > At the time we committed our own quicksort implementation, based on > the NetBSD one, we eschewed the optimisation of using insertion sort > when n is fairly low. This happens to be a very common optimisation, > so I'm not really super-confident that that was a good decision. > However, we also added our own optimisation, which is to attempt, > regardless of the size of n, to ascertain that the array is > pre-sorted, in which case we don't quicksort at all.

We do use insertion sort for partitions of size < 7. I assume you are referring to something else.

One thing that has struck me as odd is that we check for presorted arrays at every level of recursion. That is, if we start out with 100 elements, we'll check whether the input is presorted. Then, if we partition it into a block of 60 elements and a block of 40 elements, we'll check each of those partitions over again to see whether they are presorted. There is definitely some potential upside here, because if the input is mostly sorted it may contain runs where everything is in order, and we'll detect that and avoid some work.
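The per-level presorted check being described amounts to a single left-to-right scan; a sketch in Python (the real check lives in our C quicksort), which also records where the scan gave up so the offset could be reused:

```python
def first_out_of_order(a, lo, hi):
    """Return the index of the first element of a[lo:hi+1] that is
    smaller than its predecessor, or hi + 1 if the whole range is
    already sorted (so no quicksorting is needed at all)."""
    for i in range(lo + 1, hi + 1):
        if a[i] < a[i - 1]:
            return i
    return hi + 1

# If the check on [lo, hi] failed at index fail_at, then after
# partitioning at boundary j there is no need to re-scan the
# left half [lo, j]:
#   fail_at >  j  ->  the left half is presorted
#   fail_at <= j  ->  the left half is not presorted
a = [1, 2, 3, 7, 5, 6]
fail_at = first_out_of_order(a, 0, len(a) - 1)
```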
But it's also kind of expensive; I've wondered whether we should, for example, move the test for presorted input down a bit, so that it only happens in the n > 40 case. Another idea is to arrange things so that we remember the offset at which the last presort check failed. When we tail-recurse into the left half of the array, there's no need to recheck - if the presort check fell out after our partition boundary, it's presorted; if it fell out before the partition boundary, it isn't.

> So if we attempt to quicksort an array which is almost pre-sorted, but > say only has its very last element out-of-order, we'll do fairly > horribly, because we waste the effort of almost an entire "bubble sort > iteration". So almost-sorted data would seem to be a compelling thing > to optimise for that reason, as well as for the simple observation > that it isn't exactly uncommon in a relational database.

Heap-sorting could also benefit from some optimization in this area. Right now, if you heap-sort data that's already in order, it gets progressively slower as you crank up work_mem, because the heap gets deeper and so extraction and siftup get more expensive, for no real benefit. We could do something like check, every time we use up our available memory, whether the heap is already entirely in order. If so, we could dump all but one tuple to the current run and then start reading tuples again. Or maybe dump some percentage of the array, though that might require a bit more bookkeeping. Or maybe there's a smarter algorithm that would also cater to mostly-sorted data.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
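The "heap already entirely in order" test Robert floats above is cheap to state; a sketch using Python's heapq in place of tuplesort's heap (names here are illustrative, not PostgreSQL symbols):

```python
import heapq

def heap_array_fully_ordered(heap):
    """True if the array backing a binary heap happens to be completely
    sorted; if so, its tuples could be dumped to the current run in
    order with no further sift-down work."""
    return all(heap[i] <= heap[i + 1] for i in range(len(heap) - 1))

# A heap fed already-sorted input stays fully ordered...
in_order = []
for x in range(16):
    heapq.heappush(in_order, x)

# ...while a heap fed unsorted input usually does not.
shuffled = []
for x in [3, 1, 4, 1, 5, 9, 2, 6]:
    heapq.heappush(shuffled, x)
```

The scan is O(n), so doing it only when memory fills up, as suggested, keeps the overhead bounded.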
Re: [HACKERS] 9.2 release notes, beta time?
On Fri, Apr 13, 2012 at 12:40:09PM -0400, Robert Haas wrote: > On Fri, Apr 13, 2012 at 11:23 AM, Bruce Momjian wrote: > > I just talked to Tom about the 9.2 release notes. Do people want me to > > write the 9.2 release notes? > > +1. > > > When do you think we will be ready for 9.2 > > beta? > > Well, we've got a bunch of open issues, but most of them don't look > *too* serious. If everyone dropped what they're doing and worked on > them, I think we could be done in 2 weeks, but realistically I think > it's likely to take 1-2 months. I was hoping for at least two weeks because I am away Monday to Thursday of next week for EnterpriseDB. I can do it by April 27, or at the latest April 30, that is 2-2.5 weeks from now. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Memory usage during sorting
On 13 April 2012 17:42, Peter Geoghegan wrote: > One insight that I had at the time was that text comparisons where the > C locale isn't used are really rather expensive, and I doubt that > there is too much that can be done to address that directly. However, > if we were to support timsort, that could substantially cut down on > the number of comparisons for text columns, which could be a real win. > Maybe that would happen through some explicit mechanism, or maybe the > planner could actually infer that it was likely to be optimal to use > timsort based on a high statistical correlation between physical row > ordering and logical ordering of a key column's values.

Further thoughts:

At the time we committed our own quicksort implementation, based on the NetBSD one, we eschewed the optimisation of using insertion sort when n is fairly low. This happens to be a very common optimisation, so I'm not really super-confident that that was a good decision. However, we also added our own optimisation, which is to attempt, regardless of the size of n, to ascertain that the array is pre-sorted, in which case we don't quicksort at all. So if we attempt to quicksort an array which is almost pre-sorted, but say only has its very last element out-of-order, we'll do fairly horribly, because we waste the effort of almost an entire "bubble sort iteration". So almost-sorted data would seem to be a compelling thing to optimise for that reason, as well as for the simple observation that it isn't exactly uncommon in a relational database.

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] [COMMITTERS] pgsql: Add new replication mode synchronous_commit = 'write'.
On Thu, Jan 26, 2012 at 1:21 AM, Fujii Masao wrote: > On Wed, Jan 25, 2012 at 11:12 PM, Simon Riggs wrote: >> On Wed, Jan 25, 2012 at 1:57 PM, Robert Haas wrote: >> >>> I think that this would be a lot more clear if we described this as >>> synchronous_commit = remote_write rather than just synchronous_commit >>> = write. Actually, the internal constant is named that way already, >>> but it's not exposed as such to the user. >> >> That's something to discuss at the end of the CF when people are less >> busy and we get more input. >> >> It's an easy change whatever we decide to do. > > Added this to 9.2 Open Items. OK, so I think it's time to decide what we want to do here. In my view, remote_write seems a lot more clear than write (since someone might otherwise think we were talking about a local write) and it has the additional advantage of matching the internal naming convention - surely, if it's right to call it SYNCHRONOUS_COMMIT_REMOTE_WRITE inside the system, then there's not really much reason to drop the word "remote" on the user-visible side of things. However, I just work here. Does anyone want to make a counter-argument? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 9.2 release notes, beta time?
On Fri, Apr 13, 2012 at 12:51 PM, Tom Lane wrote: > Robert Haas writes: >> On Fri, Apr 13, 2012 at 11:23 AM, Bruce Momjian wrote: >>> When do you think we will be ready for 9.2 >>> beta? > >> Well, we've got a bunch of open issues, but most of them don't look >> *too* serious. If everyone dropped what they're doing and worked on >> them, I think we could be done in 2 weeks, but realistically I think >> it's likely to take 1-2 months. > > Well, I would sure like to see a beta out sooner than that. I don't > think we necessarily have to resolve every known open issue before > we put out beta1. I believe we do need to resolve anything that would > force initdb (such as Peter's bytea_agg change), and of course we need > to write release notes so that beta testers know what to test. Fair enough. > At the same time, I'd like to light a fire under people to deal with > the open issues. I agree that it will take some time for all of > them to get dealt with, but it doesn't help if the responsible hackers > are procrastinating. The joys of an all-volunteer project. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 9.2 release notes, beta time?
Robert Haas writes: > On Fri, Apr 13, 2012 at 11:23 AM, Bruce Momjian wrote: >> When do you think we will be ready for 9.2 >> beta? > Well, we've got a bunch of open issues, but most of them don't look > *too* serious. If everyone dropped what they're doing and worked on > them, I think we could be done in 2 weeks, but realistically I think > it's likely to take 1-2 months. Well, I would sure like to see a beta out sooner than that. I don't think we necessarily have to resolve every known open issue before we put out beta1. I believe we do need to resolve anything that would force initdb (such as Peter's bytea_agg change), and of course we need to write release notes so that beta testers know what to test. At the same time, I'd like to light a fire under people to deal with the open issues. I agree that it will take some time for all of them to get dealt with, but it doesn't help if the responsible hackers are procrastinating. regards, tom lane
Re: [HACKERS] [trivial patch] grammar fixes in doc/src/sgml/high-availability.sgml
On Fri, Apr 13, 2012 at 8:12 AM, Christoph Berg wrote: > diff --git a/doc/src/sgml/high-availability.sgml > b/doc/src/sgml/high-availability.sgml Thanks for the patch; however, please attach patches rather than including them inlined; they don't extract cleanly when included inline, at least not for me. I recreated this one (I think) and committed it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Memory usage during sorting
On 13 April 2012 16:03, Greg Stark wrote: > Fwiw it also only holds for comparison sorts. If you can sort your > items without actually comparing each item with the others then you > aren't bound by it at all. Notably algorithms like radix sort and > direct sort aren't bound by it and are O(n). I'm hoping to look at > trying some of these for integer sorts when they apply using the new > sort specialization code Peter added.

Actually, though a later revision of the patch did nominally allow for user-defined sorting functions through the SortSupport infrastructure (I didn't intend that it would be complete/useful enough to be really worth documenting), the version that was finally committed did not. However, there is a fairly obvious entry point for a radix sort, which is here:

    /*
     * We were able to accumulate all the tuples within the allowed
     * amount of memory.  Just qsort 'em and we're done.
     */
    if (state->memtupcount > 1)
    {
        /* Can we use the single-key sort function? */
        if (state->onlyKey != NULL)
            qsort_ssup(state->memtuples, state->memtupcount,
                       state->onlyKey);
        else
            qsort_tuple(state->memtuples, state->memtupcount,
                        state->comparetup, state);
    }

The only specialisation that occurs here is between tuple sorts of a single key and multiple (type-specific specialisations were ultimately not judged to be worth the increase in binary size relative to their diminishing benefits). You'd probably set up a type-specific positional notation machinery within the state's SortSupport struct (the type-specific abstraction through which we elide the SQL function call machinery for types that support it).

One insight that I had at the time was that text comparisons where the C locale isn't used are really rather expensive, and I doubt that there is too much that can be done to address that directly. However, if we were to support timsort, that could substantially cut down on the number of comparisons for text columns, which could be a real win.
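Since CPython's sorted() is itself timsort, the comparison savings on presorted input are easy to measure (illustrative Python; the exact counts are CPython-specific):

```python
import functools

def comparisons_used(data):
    """Count comparator invocations made by Python's timsort over data."""
    count = 0

    def cmp(a, b):
        nonlocal count
        count += 1
        return (a > b) - (a < b)

    sorted(data, key=functools.cmp_to_key(cmp))
    return count

# Fully sorted input is one ascending run: timsort detects it with
# n - 1 comparisons and stops, versus ~n log n for shuffled input.
n_sorted = comparisons_used(list(range(1024)))
```

With an expensive comparator, as with non-C-locale text, every comparison avoided this way is a direct win.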
Maybe that would happen through some explicit mechanism, or maybe the planner could actually infer that it was likely to be optimal to use timsort based on a high statistical correlation between physical row ordering and logical ordering of a key column's values. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Improving our clauseless-join heuristics
On Fri, Apr 13, 2012 at 10:51 AM, Tom Lane wrote: > Robert Haas writes: >> On Fri, Apr 13, 2012 at 10:32 AM, Tom Lane wrote: >>> After some reflection I think that the blame should be pinned on >>> have_relevant_joinclause(), which is essentially defined as "is there >>> any join clause that can be evaluated at the join of these two >>> relations?". I think it would work better to define it as "is there any >>> join clause that both these relations participate in?". > >> I think it's getting a little late in the day to be whacking the >> planner around too much, but I have to admit that seems like a pretty >> good and safe change to me, so maybe we should go ahead and do it. >> I'm a bit worried, though, that with all the planner changes this >> release we are going to spend a lot of time tracking down regressions >> either in planning time or in plan quality. > > Could be. I think though that this fits in pretty naturally with the > parameterized-path changes, since both of them are really directed > towards being able to apply inner indexscans in cases where we could > not before. Yeah, I had the same thought. I am more worried that either the changes we made for index-only scans or the parameterized paths stuff is going to turn out to introduce nasty, as-yet-unforeseen regressions. But at this point we're in for that pain whatever we do about this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 9.2 release notes, beta time?
On Fri, Apr 13, 2012 at 11:23 AM, Bruce Momjian wrote: > I just talked to Tom about the 9.2 release notes. Do people want me to > write the 9.2 release notes? +1. > When do you think we will be ready for 9.2 > beta? Well, we've got a bunch of open issues, but most of them don't look *too* serious. If everyone dropped what they're doing and worked on them, I think we could be done in 2 weeks, but realistically I think it's likely to take 1-2 months. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] index-only scans vs. Hot Standby, round two
Currently, we have a problem with index-only scans in Hot Standby mode: the xmin horizon on the standby might lag the master, and thus an index-only scan might mistakenly conclude that no heap fetch is needed when in fact it is. I suggested that we handle this by suppressing generation of index-only scan plans in Hot Standby mode, but Simon, Noah, and Dimitri were arguing that we should instead do the following, which is now on the open items list:

* Make XLOG_HEAP2_VISIBLE records generate recovery snapshot conflicts so that IndexOnlyScans work on Hot Standby

Ideally, this would also allow us to remove the following hack from heapgetpage(), which would improve sequential-scan performance for Hot Standby.

    /*
     * If the all-visible flag indicates that all tuples on the page are
     * visible to everyone, we can skip the per-tuple visibility tests. But
     * not in hot standby mode. A tuple that's already visible to all
     * transactions in the master might still be invisible to a read-only
     * transaction in the standby.
     */
    all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;

As far as I can see, to make this work, we'd need to have lazy_scan_heap() compute the latest xmin on any potentially all-visible page and treat that as a cutoff XID, cancelling queries with snapshots whose xmins equal or precede that value, just as we currently do when freezing tuples (cf xlog_heap_freeze). There are two things to worry about: correctness, and more query cancellations.

In the department of query cancellations, I believe Noah argued previously that this wasn't really going to cause a problem. And, indeed, if the master has a mix of inserts, updates, and deletes, then it seems likely that any recovery conflicts generated this way would be hitting about the same place in the XID space as the ones caused by pruning away dead tuples.
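The cutoff rule described above - latest xmin on the page becomes the conflict cutoff, and standby snapshots at or before it get cancelled - can be sketched as follows. This is illustrative Python with placeholder names, not PostgreSQL symbols, and it ignores the fact that real XIDs are 32-bit and wrap around:

```python
def visibility_conflict_cutoff(tuple_xmins):
    """Latest xmin among the tuples on a page about to be marked
    all-visible; this is the cutoff XID carried by the WAL record."""
    return max(tuple_xmins)

def conflicting_snapshots(snapshot_xmins, cutoff_xid):
    """Standby snapshots whose xmin equals or precedes the cutoff would
    have to be cancelled before replaying the XLOG_HEAP2_VISIBLE record,
    since they might not yet see every tuple on the page."""
    return [x for x in snapshot_xmins if x <= cutoff_xid]

cutoff = visibility_conflict_cutoff([1005, 1009, 1002])
```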
What will be different, if we go this route, is that an insert-only master, which right now only generates conflicts at freezing time, will instead generate conflicts much sooner, as soon as the relation is vacuumed. I do not use Hot Standby enough to know whether this is a problem, and I'm not trying to block this approach, but I do want to make sure that everyone agrees that it's acceptable before we go do it, because inevitably somebody is going to get bit.

In making our decision, I think we should keep in mind that we are currently pretty laid-back about marking pages all-visible, and that's probably something we're going to want to change over time: for example, we recently discussed the fact that a page with one dead tuple in it currently needs *two* vacuums to become all-visible, because only the first vacuum pass is smart enough to mark things all-visible, and the first heap pass only truncates the dead tuple to a dead line pointer, so the page isn't all-visible at that point. When we fix that, which I think we're almost certainly going to want to do at some point, then that means these conflicts will occur that much sooner. I think we will likely also want to consider marking things all-visible on the fly at some point in the future; for example, if we pull a buffer in for an index-only scan and dirty it by setting a hint bit, we might want to take the plunge and mark it all-visible before evicting it, to save effort the next time. Again, not trying to dissuade anyone, just making sure we've thought through it enough to avoid being unhappy later.

It's worth noting that none of this will normally matter if hot_standby_feedback=on, so part of the analysis here may depend on how many people we think have flipped that switch.

As to correctness, after further review of lazy_scan_heap(), I think there are some residual bugs here that need to be fixed whatever we decide to do about the Hot Standby case: 1.
I noticed that when we do PageSetAllVisible() today we follow it up with SetBufferCommitInfoNeedsSave(). PD_ALL_VISIBLE is not merely a hint, so I think these should be changed to MarkBufferDirty(), which shouldn't make much difference given current code, but proposals to handle hint changes differently than non-hint changes abound, so it's probably not good to rely on those meaning the same thing forever. 2. More seriously, lazy_scan_heap() releases the buffer lock before setting the all-visible bit on the heap. This looks just plain wrong (and it's my fault, to be clear). Somebody else can come along after we've released the buffer lock and mark the page not-all-visible. That concurrent process will check the visibility map bit, find it already clear, and go on its merry way. Then, we set the visibility map bit and go on our merry way. Now we have a not-all-visible page with the all-visible bit set in the visibility map. Oops. I think this probably needs to be rewritten so that lazy_scan_heap() acquires a pin on the visibility map page before locking the buffer for cleanup and holds onto that pin until either we exit the ran
Re: [HACKERS] Why can't I use pgxs to build a plpgsql plugin?
On Thu, 2012-04-12 at 12:28 +0300, Heikki Linnakangas wrote: > On 08.04.2012 11:59, Guillaume Lelarge wrote: > > Hi, > > > > I recently wrote a plpgsql plugin. I wanted to enable the use of pgxs, > > to make it easier to compile the plugin, but I eventually found that I > > can't do that because the plpgsql.h file is not available in the include > > directory. > > > > I'm wondering if we shouldn't put the header files of plpgsql source > > code in the include directory. It would help compiling the PL/pgsql > > debugger, and profiler (and of course my own plugin). > > Yep, I just bumped into this myself, while trying to make pldebugger > module compilable with pgxs. > > > There could be a good reason which would explain why we can't (or don't > > want to) do this, but I don't see it right now. > > Me neither, except a general desire to keep internals hidden. I propose > the attached. > Sounds good to me. I would love to see this happening in 9.2. Thanks, Heikki. -- Guillaume http://blog.guillaume.lelarge.info http://www.dalibo.com
[HACKERS] 9.2 release notes, beta time?
I just talked to Tom about the 9.2 release notes. Do people want me to write the 9.2 release notes? When do you think we will be ready for 9.2 beta? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Memory usage during sorting
On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas wrote: > On Sun, Mar 18, 2012 at 11:25 AM, Tom Lane wrote: >> So has somebody found a hole in the n log n lower bound on the cost of >> comparison-based sorting? I thought that had been proven pretty >> rigorously. > > There's not much danger of anyone finding a way around that bound > since the proof is quite trivial, but keep in mind that it's a > worst-case bound. Fwiw it also only holds for comparison sorts. If you can sort your items without actually comparing each item with the others then you aren't bound by it at all. Notably algorithms like radix sort and direct sort aren't bound by it and are O(n). I'm hoping to look at trying some of these for integer sorts when they apply using the new sort specialization code Peter added. -- greg
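For instance, an LSD radix sort over fixed-width non-negative integers makes a constant number of O(n) bucketing passes and performs no element comparisons at all (illustrative Python):

```python
def radix_sort(values, bits=32, radix_bits=8):
    """LSD radix sort for non-negative integers below 2**bits.
    Each pass distributes values into buckets by one base-2**radix_bits
    digit, least significant first; because every pass is stable, the
    final concatenation is fully sorted."""
    mask = (1 << radix_bits) - 1
    for shift in range(0, bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for v in values:
            buckets[(v >> shift) & mask].append(v)
        values = [v for bucket in buckets for v in bucket]
    return values
```

With 32-bit keys and 8-bit digits this is exactly four passes over the data, independent of n.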
Re: [HACKERS] Improving our clauseless-join heuristics
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > I'm a bit worried, though, that with all the planner changes this > release we are going to spend a lot of time tracking down regressions > either in planning time or in plan quality. I dunno, I agree that there will likely be more regressions, but I also think people might be happier with more changes in release 'X' than some changes in X and then some more in Y, both of which end up destabilizing their environment, particularly if we advertise that 'X' has lots of changes and that people should test it thoroughly. My feeling is that this would be good to include in 9.2, but I could be swayed either way - I certainly agree with the sentiment that it's getting to a pretty late date. Thanks, Stephen
Re: [HACKERS] Improving our clauseless-join heuristics
Robert Haas writes: > On Fri, Apr 13, 2012 at 10:32 AM, Tom Lane wrote: >> After some reflection I think that the blame should be pinned on >> have_relevant_joinclause(), which is essentially defined as "is there >> any join clause that can be evaluated at the join of these two >> relations?". I think it would work better to define it as "is there any >> join clause that both these relations participate in?". > I think it's getting a little late in the day to be whacking the > planner around too much, but I have to admit that seems like a pretty > good and safe change to me, so maybe we should go ahead and do it. > I'm a bit worried, though, that with all the planner changes this > release we are going to spend a lot of time tracking down regressions > either in planning time or in plan quality. Could be. I think though that this fits in pretty naturally with the parameterized-path changes, since both of them are really directed towards being able to apply inner indexscans in cases where we could not before. regards, tom lane
Re: [HACKERS] Improving our clauseless-join heuristics
On Fri, Apr 13, 2012 at 10:32 AM, Tom Lane wrote: > I looked into the behavior complained of here: > http://archives.postgresql.org/pgsql-performance/2012-04/msg00093.php > > The problem query can be abstracted to > > select * from a, b, c, d > where a.x = b.y and (a.z = c.c or a.z = d.d) > > Table a is much larger than the others (in fact, in the given example > c and d are known to be single rows), and there are indexes on the > mentioned columns of a. In this situation, the best plan is to > cross-join c and d, then use a BitmapOr indexscan to pick out the rows > of a that satisfy the OR condition, and finally join that small number > of rows to b. The planner will use a cross-join-first plan if we omit > b and the first WHERE clause from the query; but in the query as given, > it fails to discover that plan and falls back on a vastly inferior plan > that involves forming the a/b join first. > > The reason for this behavior is the anti-clauseless-join heuristics in > join_search_one_level(). Without b, there are no join clauses available > at join level 2, so the planner is forced to form all three 2-way cross > joins; and then at level 3 it finds out that joining a to c/d works > well. With b, we find the a/b join has a usable join clause so we form > that join, and then we decide not to make any 2-way clauseless joins. > So the c/d join is never constructed and there is no way to exploit the > desirable indexscan at higher levels. > > After some reflection I think that the blame should be pinned on > have_relevant_joinclause(), which is essentially defined as "is there > any join clause that can be evaluated at the join of these two > relations?". I think it would work better to define it as "is there any > join clause that both these relations participate in?". In the majority > of real-world queries, join clauses relate exactly two relations, so > that these two definitions are equivalent. 
However, when we do have > join clauses involving 3 or more relations, such as the OR clause in > this example, it's evidently useful to consider cross-product joins of > the smaller relations so that the join clause can be applied during the > scan of the largest table. > > It would probably not be a good idea to back-patch such a change, since > it might have consequences I can't foresee at the moment. But I'm > strongly tempted to squeeze it into 9.2. Thoughts? I think it's getting a little late in the day to be whacking the planner around too much, but I have to admit that seems like a pretty good and safe change to me, so maybe we should go ahead and do it. I'm a bit worried, though, that with all the planner changes this release we are going to spend a lot of time tracking down regressions either in planning time or in plan quality. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Memory usage during sorting
On Wed, Mar 21, 2012 at 8:57 PM, Jeff Janes wrote: > On Mon, Mar 19, 2012 at 12:23 PM, Robert Haas wrote: > ... >> >> One thing that seems inefficient to me about our current algorithm is >> the use of the run number as a leading column in the sort key. >> There's no real reason why the tuples destined for the next run need >> to be maintained in heap order; we could just store them unordered and >> heapify the whole lot of them when it's time to start the next run. >> That ought to save comparison cycles for the current run, since the >> heap will be less deep, and there are other possible savings as well - >> in particular, an unordered array of tuples can be heapify'd in linear >> time, whereas our current algorithm is O(n lg n). However, when I >> actually implemented this, I found that doing it this way was a loser. >> In fact, excluding the next-run tuples from the heap for the current >> run was a BIG loser even before you add in the time required to >> re-heapify. This confused the daylights out of me for a while, >> because it's hard to understand how insert and siftup can be slower on >> a small heap than a large one. >> >> My working theory is that, even though we must necessarily do more >> comparisons when manipulating a larger heap, many of those comparisons >> are resolved by comparing run numbers, and that's a lot cheaper than >> invoking the real comparator. For example, if we insert into a heap >> whose rightmost child is in the next run, and the new tuple goes into >> the current run, and the current size of the heap is such that the >> initial position of the new element is a descendent of the right node, >> it will very quickly crash through all of the next-run tuples and only >> need one REAL comparison - against the root. 
Either the new element >> ends up as the root, or it ends up as the direct child of the root; >> now we remove the root and, perhaps, replace it with its rightmost >> child, meaning that the next element we read in can do the exact same >> thing all over again. If we exclude the next-run elements from the >> heap, then the average depth of the heap when inserting a new element >> is smaller, but all the elements are in the same run, and we have to >> invoke the real comparator every time. In other words, those next-run >> tuples act as dummies which effectively create a heap of uneven depth, >> and the average depth encountered when inserting tuples turns out to >> be less than what we get by pulling out the dummies and making the >> depth uniform. >> >> This makes me kind of nervous, because I don't understand why things >> should work out so well as all that. If throwing some dummy elements >> into the heap makes things work better, then maybe we should throw in >> some more. Or maybe it would work better to take some but not all of >> them out. There certainly doesn't seem to be any indication in the >> code that this is an anticipated effect, and it seems an awfully >> providential accident. > > Is there a patch or a git repo available for this change? I'd like to > see if I can analyze that if I get some spare time. Sorry for the slow response to this. Patch is attached (crummy-external-sort.patch). > Clever. Rather than doing a binary search of the path, it might make > sense to do a reverse search of it. The tuple is likely to end up > somewhere at the bottom of the heap, both because that is where most > of the data is, and because the tuple being reinserted just came from > the bottom so it is likely to be biased that way. Patches for these approaches also attached (siftup-using-bsearch.patch and siftup-reverse-linear.patch). Neither approach seemed terribly promising in my testing, IIRC. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company crummy-external-sort.patch Description: Binary data siftup-reverse-linear.patch Description: Binary data siftup-using-bsearch.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
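The run-number effect described above can be sketched outside the tuplesort code. Below is a hypothetical Python miniature (not PostgreSQL source) of a heap in which the run number is the leading sort key: comparisons between tuples of different runs are resolved by the run number alone and never reach the "real" comparator, which is the cheapness the message is describing.

```python
# Hypothetical miniature of the heap ordering discussed above: each entry
# is (run, key), so the run number is the leading sort key and next-run
# tuples sink below all current-run tuples without invoking the real
# comparator.
real_comparisons = 0

def less(a, b):
    global real_comparisons
    if a[0] != b[0]:              # cheap: resolved by run number alone
        return a[0] < b[0]
    real_comparisons += 1         # expensive: the "real" datum comparison
    return a[1] < b[1]

def sift_down(heap, i):
    n = len(heap)
    while True:
        l, r, s = 2 * i + 1, 2 * i + 2, i
        if l < n and less(heap[l], heap[s]):
            s = l
        if r < n and less(heap[r], heap[s]):
            s = r
        if s == i:
            return
        heap[i], heap[s] = heap[s], heap[i]
        i = s

# A mix of current-run (run 0) and next-run (run 1) tuples.
heap = [(0, 5), (1, 2), (0, 3), (1, 9), (0, 7), (1, 1)]
for i in range(len(heap) // 2 - 1, -1, -1):   # O(n) bottom-up heapify
    sift_down(heap, i)

assert heap[0] == (0, 3)          # smallest current-run tuple at the root
```

Only 3 of the comparisons made during heapify are "real" ones here; the rest short-circuit on the run number, which is one way to see why pulling the next-run tuples out of the heap can cost more than it saves.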
[HACKERS] Improving our clauseless-join heuristics
I looked into the behavior complained of here: http://archives.postgresql.org/pgsql-performance/2012-04/msg00093.php The problem query can be abstracted to select * from a, b, c, d where a.x = b.y and (a.z = c.c or a.z = d.d) Table a is much larger than the others (in fact, in the given example c and d are known to be single rows), and there are indexes on the mentioned columns of a. In this situation, the best plan is to cross-join c and d, then use a BitmapOr indexscan to pick out the rows of a that satisfy the OR condition, and finally join that small number of rows to b. The planner will use a cross-join-first plan if we omit b and the first WHERE clause from the query; but in the query as given, it fails to discover that plan and falls back on a vastly inferior plan that involves forming the a/b join first. The reason for this behavior is the anti-clauseless-join heuristics in join_search_one_level(). Without b, there are no join clauses available at join level 2, so the planner is forced to form all three 2-way cross joins; and then at level 3 it finds out that joining a to c/d works well. With b, we find the a/b join has a usable join clause so we form that join, and then we decide not to make any 2-way clauseless joins. So the c/d join is never constructed and there is no way to exploit the desirable indexscan at higher levels. After some reflection I think that the blame should be pinned on have_relevant_joinclause(), which is essentially defined as "is there any join clause that can be evaluated at the join of these two relations?". I think it would work better to define it as "is there any join clause that both these relations participate in?". In the majority of real-world queries, join clauses relate exactly two relations, so that these two definitions are equivalent. 
However, when we do have join clauses involving 3 or more relations, such as the OR clause in this example, it's evidently useful to consider cross-product joins of the smaller relations so that the join clause can be applied during the scan of the largest table. It would probably not be a good idea to back-patch such a change, since it might have consequences I can't foresee at the moment. But I'm strongly tempted to squeeze it into 9.2. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
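The proposed redefinition of have_relevant_joinclause() can be illustrated with a small sketch (hypothetical Python, not the planner code), modeling each join clause as the set of relations it references. For the example query, the old rule ("some clause is fully evaluable at this join") rejects the c/d cross join, while the proposed rule ("some clause mentions both relations") accepts it:

```python
# Sketch of the two definitions discussed above, for the example query:
#   select * from a, b, c, d where a.x = b.y and (a.z = c.c or a.z = d.d)
clauses = [
    frozenset({"a", "b"}),       # a.x = b.y
    frozenset({"a", "c", "d"}),  # a.z = c.c OR a.z = d.d
]

def evaluable_at_join(rel1, rel2):
    """Old rule: some clause can be evaluated at the join of rel1/rel2."""
    return any(cl <= {rel1, rel2} for cl in clauses)

def both_participate(rel1, rel2):
    """Proposed rule: some clause mentions both rel1 and rel2."""
    return any(rel1 in cl and rel2 in cl for cl in clauses)

# For two-relation clauses the definitions agree...
assert evaluable_at_join("a", "b") and both_participate("a", "b")
# ...but for the 3-way OR clause, only the proposed rule lets the
# planner consider the c/d cross join at level 2.
assert not evaluable_at_join("c", "d")
assert both_participate("c", "d")
```

Since most real-world join clauses reference exactly two relations, the two rules diverge only in cases like this one, where the broader rule keeps the useful cross join alive.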
Re: [HACKERS] Parameterized-path cost comparisons need some work
On Thu, Apr 12, 2012 at 3:27 PM, Tom Lane wrote: > 1. Lobotomize add_path_precheck so it always returns true for a > parameterized path. This sounds horrid, but in the test cases I'm using > it seems that this only results in doing the full path construction for > a very small number of additional paths. Query planner engineering is hard, because it's hard to predict what kind of queries people will write, but this seems basically sane to me. Given that (if I recall our previous discussion on this point correctly) we avoid generating parameterized paths in situations where we could have simply revised the join order instead, we should only really be getting any parameterized paths at all in situations where they are likely to help. Queries involving only inner joins, for example, should never need a parameterized path covering more than a single baserel; and even if you've got outer joins in the mix, most of my queries tend to look like A IJ B IJ C IJ D LJ E LJ F LJ G, rather than A IJ (B LJ C) which is where we actually need this machinery. If we spend a little more effort in that case it's quite likely to be worth it; the trick is just to keep the extra work from bleeding into the cases where it won't help. > 3. Rearrange plan generation so that a parameterized path always uses > all join clauses available from the specified outer rels. (Any that > don't work as indexquals would have to be applied as filter conditions.) > If we did that, then we would be back to a situation where all paths > with the same parameterization should yield the same rowcount, thus > justifying letting add_path_precheck work as it does now. > > #3 would amount to pushing quals that would otherwise be checked at the > nestloop join node down to the lowest inner-relation level where they > could be checked. This is something I'd suspected would be a good idea > to start with, but hadn't gotten around to implementing for non-index > quals. 
It had not occurred to me that it might simplify cost estimation > to always do that. This seems like it could be quite a significant win. It doesn't really matter in <= 9.1 because in an old-style parameterized nestloop the join filter is going to get applied immediately after the index filter anyway, though I guess it's possible that you might save a little bit by optimizing the order in which multiple filter conditions are applied. But if there can be intermediate joins in there then it's a big deal; and the fact that it makes it easier to compare paths and prune away bad ones earlier seems like a major benefit as well. So I would be in favor of doing this if possible, but... > I'm going to take a closer look at #3, but it may not be practical to > try to squeeze it into 9.2; if not, I think #1 will do as a stopgap. I agree with this, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory usage during sorting
Jeff Davis writes: > On Sun, 2012-03-18 at 11:25 -0400, Tom Lane wrote: >> Yeah, that was me, and it came out of actual user complaints ten or more >> years back. (It's actually not 2X growth but more like 4X growth >> according to the comments in logtape.c, though I no longer remember the >> exact reasons why.) We knew when we put in the logtape logic that we >> were trading off speed for space, and we accepted that. > I skimmed through TAOCP, and I didn't find the 4X number you are > referring to, and I can't think what would cause that, either. The exact > wording in the comment in logtape.c is "4X the actual data volume", so > maybe that's just referring to per-tuple overhead? My recollection is that that was an empirical measurement using the previous generation of code. It's got nothing to do with per-tuple overhead IIRC, but with the fact that the same tuple can be sitting on multiple "tapes" during a polyphase merge, because some of the tapes can be lying fallow waiting for future use --- but data on them is still taking up space, if you do nothing to recycle it. The argument in the comment shows why 2X is the minimum space growth for a plain merge algorithm, but that's only a minimum. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Last gasp
On tor, 2012-04-12 at 10:12 -0500, Joshua Berkus wrote: > Well actually, the other advantage of using branches is that it would > encourage committers to bounce a patch back to the submitter for > modification *instead of* doing it themselves. This would both have > the advantage of saving time for the committer, and doing a better job > of teaching submitters how to craft patches which don't need to be > modified. I don't see how using branches changes that. You could (or should) be doing that already anyway. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] not null validation option in contrib/file_fdw
On 04/13/2012 07:21 AM, Shigeru HANADA wrote: > (2012/04/13 16:59), Etsuro Fujita wrote: >> I updated the patch added to CF 2012-Next [1]. Attached is the updated >> version of the patch. > I applied the patch and ran regression tests of file_fdw, and I got > SIGSEGV X-( > > The failure occurs in fileGetOptions, and it is caused by > list_delete_cell being used inside a foreach loop; the ListCell pointing > to the delete target has been freed by list_delete_cell, but foreach > accesses it to get the next element. > > Some backend functions that use list_delete_cell in a loop use a "for" > loop instead of foreach, and others exit the loop right after calling > list_delete_cell. Since we can't stop searching for non-COPY options > until we reach the end of the options list, we would need to choose the > former (a "for" loop), or create another list which contains only the > valid COPY options and return it via the other_options parameter. > Yes, the code in fileGetOptions() appears to be bogus. Also, "validate" is a terrible name for the option (and in the code) IMNSHO. It's far too generic. "validate_not_null" or some such would surely be better. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [trivial patch] grammar fixes in doc/src/sgml/high-availability.sgml
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml new file mode 100644 index ed34dac..c5f3ff9 *** a/doc/src/sgml/high-availability.sgml --- b/doc/src/sgml/high-availability.sgml *** primary_conninfo = 'host=192.168.1.50 po *** 1022,1028 Users will stop waiting if a fast shutdown is requested. However, as ! when using asynchronous replication, the server will does not fully shutdown until all outstanding WAL records are transferred to the currently connected standby servers. --- 1022,1028 Users will stop waiting if a fast shutdown is requested. However, as ! when using asynchronous replication, the server will not fully shutdown until all outstanding WAL records are transferred to the currently connected standby servers. *** primary_conninfo = 'host=192.168.1.50 po *** 1126,1132 If you need to re-create a standby server while transactions are ! waiting, make sure that the commands to run pg_start_backup() and pg_stop_backup() are run in a session with synchronous_commit = off, otherwise those requests will wait forever for the standby to appear. --- 1126,1132 If you need to re-create a standby server while transactions are ! waiting, make sure that the commands pg_start_backup() and pg_stop_backup() are run in a session with synchronous_commit = off, otherwise those requests will wait forever for the standby to appear. Christoph -- c...@df7cb.de | http://www.df7cb.de/ signature.asc Description: Digital signature
Re: [HACKERS] not null validation option in contrib/file_fdw
(2012/04/13 16:59), Etsuro Fujita wrote: > I updated the patch added to CF 2012-Next [1]. Attached is the updated > version of the patch. I applied the patch and ran regression tests of file_fdw, and I got SIGSEGV X-( The failure occurs in fileGetOptions, and it is caused by list_delete_cell being used inside a foreach loop; the ListCell pointing to the delete target has been freed by list_delete_cell, but foreach accesses it to get the next element. Some backend functions that use list_delete_cell in a loop use a "for" loop instead of foreach, and others exit the loop right after calling list_delete_cell. Since we can't stop searching for non-COPY options until we reach the end of the options list, we would need to choose the former (a "for" loop), or create another list which contains only the valid COPY options and return it via the other_options parameter. Regards, -- Shigeru HANADA -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
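The failure mode above — foreach reading a ListCell that list_delete_cell has already freed — has a close analogue in most languages: mutating a sequence while iterating over it. Here is a minimal Python sketch (illustrative, not the file_fdw code) of both the broken pattern and the safe alternative the message proposes, collecting the surviving options into a new list:

```python
# Broken pattern: removing the current element mid-iteration makes the
# loop skip the element that slides into its place (the Python analogue
# of foreach advancing through a freed ListCell).
options = ["validate", "validate", "format"]
for opt in options:
    if opt == "validate":
        options.remove(opt)
assert options == ["validate", "format"]   # one "validate" was skipped

# Safe pattern: build a separate list containing only the options to
# keep, as suggested above for collecting the valid COPY options.
options = ["validate", "validate", "format"]
copy_options = [opt for opt in options if opt != "validate"]
assert copy_options == ["format"]
```

In C the broken pattern crashes rather than silently skipping, but the root cause is the same: the iteration state refers to storage the deletion has invalidated.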
Re: [HACKERS] Last gasp
Robert Haas writes: > The real problem with the command triggers patch is that we got a > blizzard of code. It's unrealistic to expect anyone to devote serious > review time to a patch that's under constant development. It also > strikes me that a tremendous amount of pain could have been avoided by > posting a clear and detailed design sketch for that patch before > beginning to code. Dimitri contended that without code, no one will > read design sketches, but that doesn't for the most part jibe with my > experience, and I think that the strategy he actually chose backfired, > because it was clear that any review would be hitting a moving target. In my mind at least it's been more subtle. I had an agreed-on design months before I started to code anything, at the Cluster Hackers Meeting in Ottawa, where several committers and long-term contributors participated in the discussion. The big mistake seems to have been starting to code from that, rather than spending another couple of months (or easily far more) rehashing it on-list. About the design sketches, IME, what you can agree on on-list is a very high-level view of how to implement a feature; anything detailed enough to have practical impact on the code you're writing happens while reviewing code. Again, as I thought the high-level view was OK as of the Cluster Hackers Meeting, I skipped that part and went directly to the code-level review. Greg Smith will certainly stamp that email with a big red “Lesson Learned” stamp here: nothing happened unless there's a link to a pgsql-hackers thread in the archives. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Last gasp
Tom Lane writes: > Andres Freund writes: >> They might have been half-baked. But several of those didn't get design-level >> review for several weeks which makes it rather hard to fully bake them in >> time... > > But if they didn't already have design-level review, that means they > were not seen in any previous CF, which means they were not following > the expectation that nontrivial patches should be submitted earlier than > the last CF. That's not true, unfortunately. I had several rounds of design review in the previous CF, then some more in the last CF, then another one that basically meant baking another patch: same feature but spelled out differently (renaming catalogs, files, and APIs is no fun in a big patch) to be able to have a new way to organize the code and docs. It's for the best, and it's better to do that before commit, no question. I appreciate having had that review, really. The problem with the command triggers patch was a different one. First, if we want the last CF to be only about commits and not about feedback, we should simply bite the bullet and say so out loud. Second, we should have been talking about the hard decision rather than pretending we could still do something in this release. That's mainly on me and I know it, but not being a committer, I have no power other than influence, and I've been optimistic (and playing by the rules). > I think the key point here is that people have to expect that it's going > to take more than one round of review to land most nontrivial patches. > And we have to discourage the expectation that that can happen within > the last CF of a release cycle. If anything, the last CF has to be > tighter not looser than others on what we will accept, because there is > no time to recover if something proves wrong with a patch after a > month or three. The more I think about that, the more I think we should declare the last CF to be about preparing a release, not providing another round of feedback. 
That could well be all we need here. > I keep coming back to the thought that we need more and shorter CFs, > and/or ReviewFests that are meant to help push WIP patches further > down the track. We need to make it easier to get those early reviews > done while there's still development time left. Separating commit fests meant for commits from those meant for feedback is a good idea too. I'm not sure how much benefit we would get there for non-last CFs, though. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Last gasp
Robert Haas writes: > Even before this CommitFest, it's felt to me like this hasn't been a > great cycle for reviewing. I think we have generally had fewer people > doing reviews than we did during the 9.0 and 9.1 cycles. I think we > had a lot of momentum with the CommitFest process when it was new, but > three years on I think there's been some ebbing of the relative > enthusiastic volunteerism that got off the ground. I don't have a I have another analysis here. Tom once said that committers won't grow on trees, and that is true for submitters too, obviously. I think the CF process has been good at growing more submitters and at growing existing ones' abilities too. So it may not be that you have fewer reviewers available; it may be that they're just heavily involved in being submitters too. Tom Lane writes: > Yeah, this is something I was thinking about yesterday. In the first > couple of release cycles with the CommitFest process, we were willing to > let the last fest of a release cycle go on for "as long as it takes", > or at least that was what I felt the policy to be. This time we > eventually gave up and declared closure, but in hindsight we should > likely have done that a month earlier. The fact of the matter is that > quite a few of the patches we were dealing with were *not* ready to > commit, or even close to that, at the start of the fest. If it weren't > the last fest they would have gotten marked Returned With Feedback a > lot sooner. This and other posts in this thread are all hinting the same thing to me: the last commit fest is *not* about feedback at all. If you still need reviewers rather than committers, you're out of luck for this release, see you next time. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] typo fix
This is a little patch to fix a typo in file-fdw.sgml Best regards, Etsuro Fujita file_fdw_typo_fix_20120413.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] not null validation option in contrib/file_fdw
I updated the patch added to CF 2012-Next [1]. Attached is the updated version of the patch. For the discussion in [2], I've introduced a new generic option, validate, for file_fdw foreign tables, which specifies whether file_fdw verifies that tuples meet NOT NULL constraints. The default value for the option is 'false', and if the value is set to 'true', then file_fdw verifies NOT NULL constraints. For example, a user can issue the following to let file_fdw verify the constraint of the msg column. CREATE FOREIGN TABLE ft (id INTEGER, msg TEXT NOT NULL) SERVER fs OPTIONS (filename '/path/to/file', format 'csv', delimiter ',', validate 'true'); At SELECT time, the scan is aborted when a NOT NULL violation is found, as with COPY FROM. Once the validation is done successfully using e.g. SELECT COUNT(*), the user can set the option to 'false' using ALTER FOREIGN TABLE. I think this option is needed for flat files due to their lack of ability to check such a constraint. (I added NOT NULL checking to fileIterateForeignScan(), but not to file_acquire_sample_rows(). Should we do that at file_acquire_sample_rows()? I think it is good to just recommend that users run ANALYZE on a foreign table after the validation.) For the discussion in [3], I've added a new external function ExecNotNullCheck() and call it from fileIterateForeignScan. Any comments are welcome. Best regards, Etsuro Fujita [1] https://commitfest.postgresql.org/action/patch_view?id=822 [2] http://archives.postgresql.org/message-id/1038.1331738...@sss.pgh.pa.us [3] http://archives.postgresql.org/pgsql-hackers/2012-03/msg00809.php file_fdw_notnull_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
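The per-row behavior the validate option turns on can be sketched in a few lines. This is a hypothetical Python illustration, not the patch's C code (which does the check inside fileIterateForeignScan via ExecNotNullCheck); the column names and the treatment of a missing field as NULL are assumptions made for the example.

```python
# Hypothetical sketch of the per-row NOT NULL check during a scan.
not_null_cols = {"msg"}                        # from: msg TEXT NOT NULL

def scan(rows, validate=True):
    for row in rows:
        if validate:
            for col in not_null_cols:
                if row.get(col) is None:       # missing field treated as NULL
                    raise ValueError(
                        'null value in column "%s" violates '
                        'not-null constraint' % col)
        yield row

rows = [{"id": 1, "msg": "hello"}, {"id": 2, "msg": None}]

ok = next(scan(rows))                          # first row passes the check
assert ok["msg"] == "hello"

try:
    list(scan(rows))                           # second row trips the check
    raise AssertionError("expected a not-null violation")
except ValueError:
    pass

# With validate disabled the scan returns both rows unchecked, which is
# why the option can be switched off after a one-time validation pass.
assert len(list(scan(rows, validate=False))) == 2
```

This also shows the workflow the message describes: run one validating pass (e.g. SELECT COUNT(*)), then disable the option to skip the per-row cost on later scans.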