Re: [HACKERS] Streaming replication for psycopg2
On 30 June 2015 at 22:42, Shulgin, Oleksandr wrote:
> On Thu, Jun 4, 2015 at 5:49 PM, Shulgin, Oleksandr wrote:
>> On Tue, Jun 2, 2015 at 2:23 PM, Shulgin, Oleksandr wrote:
>> > I've submitted a patch to psycopg2 to support streaming replication
>> > protocol (COPY_BOTH): https://github.com/psycopg/psycopg2/pull/322
>
> Hello again,
>
> I have updated the pull request above to address the feedback I've gathered
> from using this construct internally and from other sources. Now included is
> the lower-level asynchronous interface that gives the user more control, for
> the price of doing select() and keeping track of keepalive timeouts manually.
>
> It would be nice if someone could review this. The final look can be found
> at this link (docs included):
> https://github.com/psycopg/psycopg2/pull/322/files

Per the pull request's history, I've given this a fairly detailed review. With a few minor changes I think it's really good and will be a valuable addition to psycopg2.

Before merging I think the connection should preferably be split into a proper subclass, though it might not be worth the verbosity involved when working with the CPython API. I suspect that the row-by-row COPY support should be pushed down to be shared with copy_expert, too. There's so much logic that would be shared: the push (callback) or pull oriented read modes, the support for reading raw bytes or decoded unicode text for rows, the result object, etc.

Otherwise I think this is great, and it was a real pleasure to read a patch where the first half is good documentation and examples.

Since psycopg2 is libpq-based, it can't be treated as an independent re-implementation of the protocol for testing purposes; for that it would probably be necessary to add replication protocol support to PgJDBC. But it's still really handy for prototyping logical decoding clients, testing output plugins, adapting logical decoding output to feed into other systems, etc., and I can see it being really handy for optional extra tests for PostgreSQL itself - validating the replication protocol, etc.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
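For reference, here is a minimal sketch of what driving the higher-level, blocking interface looks like from the client side, based on the replication support that eventually shipped in psycopg2.extras (2.7+); the DSN and slot name are placeholders, and the API in the pull request under review may differ in detail.

# Minimal logical-replication consumer sketch (psycopg2.extras, 2.7+ API).
# The DSN and slot name are hypothetical.
import psycopg2
import psycopg2.extras

conn = psycopg2.connect(
    "dbname=test",
    connection_factory=psycopg2.extras.LogicalReplicationConnection)
cur = conn.cursor()
cur.create_replication_slot('demo_slot', output_plugin='test_decoding')
cur.start_replication(slot_name='demo_slot', decode=True)

def consume(msg):
    # msg.payload is the decoded output-plugin text; send feedback so the
    # server can advance the slot and recycle WAL.
    print(msg.payload)
    msg.cursor.send_feedback(flush_lsn=msg.data_start)

# Blocks and handles keepalives; the lower-level asynchronous interface
# instead uses read_message() together with select() on the connection.
cur.consume_stream(consume)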
Re: [HACKERS] [PATCH] postgres_fdw extension support
On Sun, Oct 4, 2015 at 11:40 AM, Paul Ramsey wrote:
> I put all changes relative to your review here if you want a nice colorized
> place to check:
>
> https://github.com/pramsey/postgres/commit/ed33e7489601e659f436d6afda3cce28304eba50

-/* updatable is available on both server and table */
+/* updatable option is available on both server and table */

This is just noise (perhaps I am the one who introduced it, oops).
--
Michael
Re: [HACKERS] ALTER TABLE behind-the-scenes effects' CONTEXT
2015-10-05 0:08 GMT+02:00 Marko Tiikkaja:
> Hi,
>
> In the past I've found the error message in cases such as this somewhat
> less helpful than it could be:
>
> =# CREATE TABLE qqq (a int);
> =# CREATE UNIQUE INDEX IF NOT EXISTS qqq_a_idx ON qqq(a);
> =# ALTER TABLE qqq ALTER COLUMN a TYPE json USING NULL;
> ERROR:  data type json has no default operator class for access method "btree"
> HINT:  You must specify an operator class for the index or define a default
> operator class for the data type.
>
> The attached patch adds a CONTEXT line to index and constraint rebuilds, e.g.:
>
> CONTEXT:  while rebuilding index qqq_a_idx
>
> Any feedback welcome.

I would prefer using the DETAIL field for this case.

Regards

Pavel

> .m
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Mon, Oct 5, 2015 at 6:34 AM, Jeff Janes wrote:
> On Fri, Sep 11, 2015 at 8:01 PM, Amit Kapila wrote:
>>
>> If I am not wrong we need 1048576 number of transactions difference
>> for each record to make each CLOG access a disk access, so if we
>> increment XID counter by 100, then probably every 1th (or multiplier
>> of 1) transaction would go for disk access.
>>
>> The number 1048576 is derived by below calc:
>> #define CLOG_XACTS_PER_BYTE 4
>> #define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
>>
>> Transaction difference required for each transaction to go for disk access:
>> CLOG_XACTS_PER_PAGE * num_clog_buffers.
>
> That guarantees that every xid occupies its own 32-contiguous-pages chunk
> of clog.
>
> But clog pages are not pulled in and out in 32-page chunks, but in one-page
> chunks. So you would only need 32,768 differences to get every real
> transaction to live on its own clog page, which means every lookup of a
> different real transaction would have to do a page replacement.

Agreed, but that doesn't affect the result of the test done above.

> (I think your references to disk access here are misleading. Isn't the
> issue here the contention on the lock that controls the page replacement,
> not the actual IO?)

The point is that if there is no I/O needed, then all the read accesses for
transaction status will just use shared locks; however, if there is an I/O,
then an exclusive lock would be needed.

> I've attached a patch that allows you to set the guc "JJ_xid", which makes it
> burn the given number of xids every time one new one is asked for. (The
> patch introduces lots of other stuff as well, but I didn't feel like
> ripping the irrelevant parts out--if you don't set any of the other gucs it
> introduces from their defaults, they shouldn't cause you trouble.) I think
> there are other tools around that do the same thing, but this is the one I
> know about. It is easy to drive the system into wrap-around shutdown with
> this, so lowering autovacuum_vacuum_cost_delay is a good idea.
>
> Actually I haven't attached it, because then the commitfest app would list
> it as the patch needing review; instead I've put it here:
> https://drive.google.com/file/d/0Bzqrh1SO9FcERV9EUThtT3pacmM/view?usp=sharing

Thanks, I think this could also be used for testing.

>> I think reducing to every 100th access for transaction status as disk access
>> is sufficient to prove that there is no regression with the patch for the
>> scenario asked by Andres, or do you think it is not?
>>
>> Now another possibility here could be that we try by commenting out fsync
>> in the CLOG path to see how much it impacts the performance of this test and
>> then the pgbench test. I am not sure there will be any impact because even
>> if every 100th transaction goes to disk access, that is still less as compared
>> to the WAL fsync which we have to perform for each transaction.
>
> You mentioned that your clog is not on ssd, but surely at this scale of
> hardware, the hdd the clog is on has a bbu in front of it, no?

Yes.

> But I thought Andres' concern was not about fsync, but about the fact that
> the SLRU does linear scans (repeatedly) of the buffers while holding the
> control lock? At some point, scanning more and more buffers under the lock
> is going to cause more contention than scanning fewer buffers and just
> evicting a page will.

Yes, at some point that could matter, but I could not see the impact at 64 or
128 CLOG buffers.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
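For readers who want to check the figures being traded in this thread, the arithmetic is simple; the sketch below only restates the calculation quoted above, with BLCKSZ and the 32-buffer count taken from the discussion's assumptions.

# Toy restatement of the CLOG arithmetic in this thread (not PostgreSQL code).
BLCKSZ = 8192                   # default block size assumed here
CLOG_XACTS_PER_BYTE = 4         # two status bits per transaction
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE   # 32768 xids per clog page
num_clog_buffers = 32           # the buffer count under discussion

# XID gap needed so consecutive lookups fall outside the whole 32-buffer
# working set (Amit's 1048576 figure):
print(CLOG_XACTS_PER_PAGE * num_clog_buffers)        # 1048576

# XID gap needed so each lookup merely hits a different clog page and thus
# forces a page replacement (Jeff's 32,768 figure):
print(CLOG_XACTS_PER_PAGE)                           # 32768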
Re: [HACKERS] Confusing remark about UPSERT in fdwhandler.sgml
On 2015/10/03 5:57, Robert Haas wrote:
> On Fri, Oct 2, 2015 at 4:04 AM, Peter Geoghegan wrote:
>> On Fri, Oct 2, 2015 at 1:00 AM, Etsuro Fujita wrote:
>>> ISTM that the sentence "as remote constraints are not locally known" is
>>> somewhat confusing, because check constraints on remote tables can be
>>> defined locally in 9.5. How about "unique constraints or exclusion
>>> constraints on remote tables are not locally known"? Attached is a patch
>>> for that.
>>
>> Makes sense to me.
>
> Me, too. Committed.

Thanks!

Best regards,
Etsuro Fujita
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Sep 11, 2015 at 8:01 PM, Amit Kapila wrote:
> On Fri, Sep 11, 2015 at 9:21 PM, Robert Haas wrote:
>> On Fri, Sep 11, 2015 at 10:31 AM, Amit Kapila wrote:
>>>> Could you perhaps try to create a testcase where xids are accessed that
>>>> are so far apart on average that they're unlikely to be in memory? And
>>>> then test that across a number of client counts?
>>>
>>> Now about the test, create a table with a large number of rows (say 11617457;
>>> I have tried to create larger, but it was taking too much time (more than a day))
>>> and have each row with a different transaction id. Now each transaction should
>>> update rows that are at least 1048576 (the number of transactions whose status
>>> can be held in 32 CLOG buffers) apart; that way, ideally each update will try
>>> to access a CLOG page that is not in memory. However, as the value to update
>>> is selected randomly, that leads to every 100th access being a disk access.
>>
>> What about just running a regular pgbench test, but hacking the
>> XID-assignment code so that we increment the XID counter by 100 each
>> time instead of 1?
>
> If I am not wrong we need 1048576 number of transactions difference
> for each record to make each CLOG access a disk access, so if we
> increment XID counter by 100, then probably every 1th (or multiplier
> of 1) transaction would go for disk access.
>
> The number 1048576 is derived by below calc:
> #define CLOG_XACTS_PER_BYTE 4
> #define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
>
> Transaction difference required for each transaction to go for disk access:
> CLOG_XACTS_PER_PAGE * num_clog_buffers.

That guarantees that every xid occupies its own 32-contiguous-pages chunk of
clog.

But clog pages are not pulled in and out in 32-page chunks, but in one-page
chunks. So you would only need 32,768 differences to get every real
transaction to live on its own clog page, which means every lookup of a
different real transaction would have to do a page replacement.

(I think your references to disk access here are misleading. Isn't the issue
here the contention on the lock that controls the page replacement, not the
actual IO?)

I've attached a patch that allows you to set the guc "JJ_xid", which makes it
burn the given number of xids every time one new one is asked for. (The patch
introduces lots of other stuff as well, but I didn't feel like ripping the
irrelevant parts out--if you don't set any of the other gucs it introduces
from their defaults, they shouldn't cause you trouble.) I think there are
other tools around that do the same thing, but this is the one I know about.
It is easy to drive the system into wrap-around shutdown with this, so
lowering autovacuum_vacuum_cost_delay is a good idea.

Actually I haven't attached it, because then the commitfest app would list it
as the patch needing review; instead I've put it here:
https://drive.google.com/file/d/0Bzqrh1SO9FcERV9EUThtT3pacmM/view?usp=sharing

> I think reducing to every 100th access for transaction status as disk access
> is sufficient to prove that there is no regression with the patch for the
> scenario asked by Andres, or do you think it is not?
>
> Now another possibility here could be that we try by commenting out fsync
> in the CLOG path to see how much it impacts the performance of this test and
> then the pgbench test. I am not sure there will be any impact because even
> if every 100th transaction goes to disk access, that is still less as compared
> to the WAL fsync which we have to perform for each transaction.

You mentioned that your clog is not on ssd, but surely at this scale of
hardware, the hdd the clog is on has a bbu in front of it, no?

But I thought Andres' concern was not about fsync, but about the fact that
the SLRU does linear scans (repeatedly) of the buffers while holding the
control lock? At some point, scanning more and more buffers under the lock is
going to cause more contention than scanning fewer buffers and just evicting
a page will.

Cheers,
Jeff
Re: [HACKERS] Less than ideal error reporting in pg_stat_statements
On Wed, Sep 23, 2015 at 1:41 PM, Jim Nasby wrote:
> max was set to 1. I don't know about average query text size, but the
> command that was causing the error was a very large number of individual
> INSERT ... VALUES statements all in one command.
>
> The machine had plenty of free memory and no ulimit, so I don't see how this
> could have been anything but MaxAllocSize, unless there's some other failure
> mode in malloc I don't know about.

The patch that Tom committed does __nothing__ to actually address *how* you were able to get into this position in the first place. Sure, now you might be able to recover if you're not affected by the MaxAllocSize limitation, but: what did it take in the first place to need memory > MaxAllocSize to do a garbage collection? Whatever it was that allowed things to get that bad before a garbage collection was attempted is not even considered in the patch committed.

The patch just committed is, in short, nothing more than a band-aid, and may do more harm than good due to allocating huge amounts of memory while a very contended exclusive LWLock is held.

You say you have plenty of memory and no ulimit, and I believe that you're right that there was never a conventional OOM where malloc() returns NULL. If we conservatively assume that you were only barely over the MaxAllocSize limitation before a garbage collection was first attempted, then the average query text length would have to be about 50 kilobytes, since only then would 20,000 texts (pg_stat_statements.max * 2) put us over. Does anyone actually think that the *average* SQL query on Jim's client's application was in excess of 50% the size of the ~3,000 line file pg_stat_statements.c? It beggars belief.

I'm annoyed and disappointed that the patch committed does not even begin to address the underlying problem -- it just adds an escape hatch, and fixes another theoretical issue that no one was affected by. Honestly, next time I won't bother.
--
Peter Geoghegan
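The 50 kB estimate follows directly from the numbers Peter cites; the sketch below just spells out that arithmetic, assuming pg_stat_statements.max = 10,000 (the value implied by the "20,000 texts" figure above).

# Back-of-the-envelope check of the "average query text ~50 kB" estimate.
MAX_ALLOC_SIZE = 0x3fffffff          # MaxAllocSize: just under 1 GB
pgss_max = 10_000                    # assumed pg_stat_statements.max
texts = pgss_max * 2                 # the 20,000 texts cited above
print(MAX_ALLOC_SIZE / texts)        # ~53,687 bytes of query text per entry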
Re: [HACKERS] CustomScan support on readfuncs.c
> -----Original Message-----
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Saturday, October 03, 2015 5:44 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Amit Kapila; Andres Freund; pgsql-hackers
> Subject: ##freemail## Re: [HACKERS] CustomScan support on readfuncs.c
>
> On Tue, Sep 29, 2015 at 6:19 PM, Kouhei Kaigai wrote:
> >> On Mon, Sep 28, 2015 at 8:31 PM, Kouhei Kaigai wrote:
> >> >> Instead of doing this:
> >> >>
> >> >> +/* Dump library and symbol name instead of raw pointer */
> >> >>  appendStringInfoString(str, " :methods ");
> >> >> -_outToken(str, node->methods->CustomName);
> >> >> +_outToken(str, node->methods->methods_library_name);
> >> >> +appendStringInfoChar(str, ' ');
> >> >> +_outToken(str, node->methods->methods_symbol_name);
> >> >>
> >> >> Suppose we just make library_name and symbol_name fields in the node
> >> >> itself, that are dumped and loaded like any others.
> >> >>
> >> >> Would that be better?
> >> >
> >> > I have no preference here.
> >> >
> >> > Even if we dump library_name/symbol_name as if they were fields of CustomScan,
> >> > not CustomScanMethods, in a similar way, we cannot use WRITE_STRING_FIELD
> >> > here, because its 'fldname' assumes these members are a direct field of
> >> > CustomScan.
> >> >
> >> > /* Write a character-string (possibly NULL) field */
> >> > #define WRITE_STRING_FIELD(fldname) \
> >> > (appendStringInfo(str, " :" CppAsString(fldname) " "), \
> >> >  _outToken(str, node->fldname))
> >>
> >> Well that's exactly what I was suggesting: making them a direct field
> >> of CustomScan.
> >>
> > Let me confirm. Are you suggesting to have library_name/symbol_name
> > in CustomScan, not CustomScanMethods?
> > I prefer these fields to be in CustomScanMethods because we don't need
> > to set them up again once PG_init has set up these symbols. CustomScan
> > has to be created every time it is chosen by the planner.
>
> True. But that doesn't cost much. I'm not sure it's better, so if
> you don't like it, don't worry about it.

Yep, I'd like library_name and symbol_name to stay in CustomScanMethods, because the table of callbacks and its identifiers are not separable.

> >> > One other question I have. Do we have a portable way to look up
> >> > a pair of library and symbol by address?
> >> > Glibc has the dladdr() function that returns this information,
> >> > however, the manpage warns it is not defined by POSIX.
> >> > If we had a portable way, it might make the interface simpler.
> >>
> >> Yes: load_external_function.
> >>
> > It looks up an address by a pair of library and symbol names.
> > What I'm looking for is a portable function that looks up a pair
> > of library and symbol names by an address, like dladdr() in glibc.
> > I don't know whether other *nix or Windows have this infrastructure.
>
> No, that doesn't exist, and the chances of us trying to add that
> infrastructure for this feature are nil.

OK. Probably the only way, then, is to expect the extension to put correct values into the pair of library and symbol names. I will try to check whether the current patch is workable in this direction using my extension. Please wait for a few days.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei
Re: [HACKERS] debbugs
Andrew,

* Andrew Dunstan (and...@dunslane.net) wrote:
> Could someone please point me at the documentation that says how to
> stand up and administer an instance of debbugs? If it doesn't exist
> I would say that is a very heavy mark against it. If it does, it's
> well enough hidden that a quick use of Google doesn't make it stand
> out, which doesn't fill me with confidence either.

https://bugs.debian.org/debbugs-source/ README.md

Thanks!

Stephen
Re: [HACKERS] No Issue Tracker - Say it Ain't So!]
On Sun, Oct 04, 2015 at 04:30:49PM -0700, Josh Berkus wrote:
> That would be the key part, wouldn't it? Nice that you have [code to
> store and parse email messages].

Yeah. It actually made most of the work pretty easy. It's available with a bunch of other code at https://pd.if.org/git/ if anyone wants it. I did find a bug in my header processing though, so I'll need to commit that fix.

> We'd also want a way to link a bug fix to a commit, and probably a way
> to give the bug a list of searchable keywords (and add to that list).

I've been thinking of hooking it up to the fti machinery and providing a search box. I've never really used fti before, so this might be a good opportunity to learn it for real.

> > it probably makes sense to just close up front bugs that are marked
> > against unsupported pg versions, or haven't had any activity for too
> > long, perhaps two years.
>
> I'm reluctant to close all of those unexamined, since part of the
> purpose of this is to find bugs which were never fixed. Probably we
> should organize a posse to comb through all of the old bugs and
> hand-close them.

I'm doing some of that as I poke at the bugs pages. Perhaps it would make sense to have a closed status of 'stale' or the like (perhaps that's what you meant by 'timed out') which could be used to get bugs out of the main list but still be marked as 'not human reviewed'. These could then be reviewed by the posse.

> Yeah, fixing this [emails without a bug id] would probably be tied to
> the possible change to mailman. Unless someone already has a way to
> get majordomo to append a bug ID.

How are bug ids assigned now? From the evidence, I assume there is a sequence in a database that the web submission form queries to format a resulting email to the bugs mailing list.

Do the mailing lists live on the same server? If so, perhaps it would be easy to assign a new bug id to a new thread on -bugs. But perhaps that's too aggressive in creating bugs.

> > 5: How can we use email to update the status of a bug?
> >
> > I suggest using email headers to do this. 'X-PGBug-Fixed: ' and the like.
> > I assume here that everyone who might want to do such a thing uses an
> > MUA that would allow this, and they know how.
>
> I guess that depends on who we expect to use this, at least for
> closing stuff.

I could certainly support more than one mechanism. A web interface for those who would prefer such, plus email, would seem to be the basic requirements.

> > 6: Does there need to be any security on updating the status?
> >
> > Probably not. I don't think it's the sort of thing that would
> > attract malicious adjustments. If I'm wrong, I'd need to rethink
> > this. I realize I'm making security an afterthought, which makes my
> > teeth itch, but I think layers of security would make it much less
> > likely to be actually adopted.
>
> I think there needs to be some kind of administrative access which
> allows, for example, an issue to be closed so that it can't be
> reopened.

I guess technically we have that now, since I'm the only one who can close or open a bug in the db I've set up :)

Seriously though, I think it probably makes the most sense to tie the system in with the existing pg community accounts if it goes that far.

> Anyway, I'm not convinced we want to reinvent this particular wheel, but
> if we do, you've done a yeoman's job.

Thank you. (Assuming I've interpreted the phrase correctly; I'm not familiar with it, and a web search was only semi-enlightening.)

-- 
nw
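Nathan mentions hooking the archive up to the full-text search (fti) machinery; a rough sketch of what that could look like against a hypothetical "messages" table follows. The table, column names, and DSN are assumptions for illustration, not his actual schema.

# Sketch: full-text search over the archived messages.
import psycopg2

conn = psycopg2.connect("dbname=pgbugs")   # hypothetical DSN
cur = conn.cursor()

# Precompute a tsvector column and index it with GIN.
cur.execute("ALTER TABLE messages ADD COLUMN fts tsvector")
cur.execute("""
    UPDATE messages
       SET fts = to_tsvector('english',
                             coalesce(subject, '') || ' ' || coalesce(body, ''))
""")
cur.execute("CREATE INDEX messages_fts_idx ON messages USING gin (fts)")
conn.commit()

# A simple search-box query, ranked by relevance.
query = "multixact wraparound"
cur.execute("""
    SELECT bug_id, subject
      FROM messages
     WHERE fts @@ plainto_tsquery('english', %s)
     ORDER BY ts_rank(fts, plainto_tsquery('english', %s)) DESC
     LIMIT 20
""", (query, query))
for bug_id, subject in cur.fetchall():
    print(bug_id, subject)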
[HACKERS] debbugs
Could someone please point me at the documentation that says how to stand up and administer an instance of debbugs? If it doesn't exist I would say that is a very heavy mark against it. If it does, it's well enough hidden that a quick use of Google doesn't make it stand out, which doesn't fill me with confidence either.

cheers

andrew
Re: [HACKERS] No Issue Tracker - Say it Ain't So!]
On 10/04/2015 03:42 PM, Nathan Wagner wrote:
> I downloaded the archives for pgsql-bugs, and fed them into a database. This
> part was easy, since I have already written a pg-backed usenet server and had
> the code handy for storing and parsing out bits of rfc 2822 messages.

That would be the key part, wouldn't it? Nice that you have that.

So this would be the other option if adopting debbugs doesn't work out. I think it's likely that we'll end up recreating most of debbugs in the process (or bugzilla or something else) but whatever. As long as we have some kind of bug tracker, I'm happy.

> It's dirt simple. If the system sees a message with 'Bug #(\d+)' in the
> subject line, it creates an entry in a bugs table with that bug number (if
> needed), and then marks the message as belonging to that bug. If there seems
> to be metadata about the bug in the format of the (unquoted)
>
> Bug reference:
> Logged by:
> Email address:
> PostgreSQL version:
> Operating system:
> Description:
> Details:
>
> it pulls that out and puts it in the bugs table. There's also an "open"
> boolean in the table, defaulting to true.
>
> The results can be found at https://granicus.if.org/pgbugs/
>
> Ok. So now we have a bug tracker, but...
>
> Some open questions that I don't think have really been addressed, with my
> commentary interspersed:
>
> 1: Can a bug be more than "open" or "closed"?
>
> I think yes. At least we probably want to know why a bug is closed. Is it not
> a bug at all, not our bug, a duplicate submission, a duplicate of another bug,
> something we won't fix for some reason (e.g. a bug against version 7)

We'd want the usual statuses:

* fixed
* duplicate
* unreproducible
* timed out
* not a bug
* won't fix
* reopened

We'd also want a way to link a bug fix to a commit, and probably a way to give the bug a list of searchable keywords (and add to that list).

> 2: Who can declare a bug closed.
>
> Ugh. I'm going to close some of them if it seems obvious to me that they
> should be closed. But what if it's not obvious? I could probably maintain it
> to some extent, but I don't know how much time that would actually take.
>
> Related to the next point, it probably makes sense to just close up front
> bugs that are marked against unsupported pg versions, or haven't had
> any activity for too long, perhaps two years. Just closing bugs with no
> mailing list activity for two years closes 5280 of 6376 bugs.

I'm reluctant to close all of those unexamined, since part of the purpose of this is to find bugs which were never fixed. Probably we should organize a posse to comb through all of the old bugs and hand-close them.

> 3: How far back should I actually import data from the bugs list?
>
> I have imported each archived month from December of 1998. It looks like the
> bug sequence was started at 1000 in December of 2003. Emails with no bug id in
> the subject line don't get associated with any bug; they're in the DB but not
> really findable.
>
> 4: What should I do with emails that don't reference a bug id but seem to be
> talking about a bug?
>
> I suggest we do nothing with them as far as the bug tracker is concerned. If
> people want to mark their message as pertaining to a bug, they can put that in
> the subject line. However, I don't think a bug id can be assigned via email;
> that is, I think you have to use a web form to create a bug report with a bug
> id. Presumably that could change if whoever runs the bug counter wants it to.

Yeah, fixing this would probably be tied to the possible change to mailman. Unless someone already has a way to get majordomo to append a bug ID.

> 5: How can we use email to update the status of a bug?
>
> I suggest using email headers to do this. 'X-PGBug-Fixed: ' and the
> like. I assume here that everyone who might want to do such a thing uses an
> MUA that would allow this, and they know how.

I guess that depends on who we expect to use this, at least for closing stuff.

> 6: Does there need to be any security on updating the status?
>
> Probably not. I don't think it's the sort of thing that would attract
> malicious adjustments. If I'm wrong, I'd need to rethink this. I realize I'm
> making security an afterthought, which makes my teeth itch, but I think layers
> of security would make it much less likely to be actually adopted.

I think there needs to be some kind of administrative access which allows, for example, an issue to be closed so that it can't be reopened.

Anyway, I'm not convinced we want to reinvent this particular wheel, but if we do, you've done a yeoman's job.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Less than ideal error reporting in pg_stat_statements
On Sun, Oct 4, 2015 at 3:16 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> To be clear: I wasn't sure why you thought I falsely count entries with
>> dropped texts within entry_dealloc().
>
> In the existing^H^H^Hprevious code, dropped-text entries would essentially
> act as length-zero summands in the average calculation, whereas I think
> we agree that they ought to be ignored; otherwise they decrease the
> computed mean and thereby increase the probability of (useless) GC cycles.
> In the worst case where the hashtable is mostly dropped-text entries,
> which would for instance be the prevailing situation shortly after a GC
> failure, we'd be calculating ridiculously small mean values and that'd
> prompt extra GC cycles, no?

Yes, but my patch changed that, too. I suggested that first.
--
Peter Geoghegan
Re: [HACKERS] Less than ideal error reporting in pg_stat_statements
On Sun, Oct 4, 2015 at 3:10 PM, Tom Lane wrote:
>> That seems perfectly reasonable, yes. Should I leave that to you?
>
> After a closer look I decided that wasn't reasonable at all. Discounting
> sticky texts would then mean that after a GC cycle, we might still think
> the query texts file is bloated and issue another GC request, which of
> course would not shrink the file, so that we'd be doing GC cycles every
> time we added a new query.

That seems unlikely to me. Firstly, there is a generic check of 512 bytes per entry within need_gc_qtexts() -- we never GC if that generic limit is not exceeded, and that's far from tiny. Secondly, how long do you think those sticky entries will stay around in the hashtable to continue to cause trouble, and dominate over regular entries? There'd have to be constant evictions/cache pressure for this situation to occur, because the threshold within need_gc_qtexts() is based on pg_stat_statements.max, and yet evictions aggressively remove sticky entries very quickly. Seems like a very unusual workload to me.

I think that you're overestimating the cost of discounting sticky entries, which are usually very much in the minority (at least when it matters, with cache pressure), and underestimating the cost of continuing to weigh sticky entries' contribution to mean query text. As I said, my proposal to not have sticky entries contribute to overall mean query text length is based on a problem report involving a continually failing data integration process. That in particular is what I hope to stop having a negative impact on mean query length -- a unique queryid makes for an entry that is bound to remain useless forever (as opposed to just failing the first few times). With the larger limit of MaxAllocHugeSize, I worry about swapping with the exclusive lock held.

The fact that there is a big disparity between mean query length for sticky and non-sticky entries is weird. It was seen to happen in the wild only because the sticky entries that clogged things up were not really distinct to a human -- only to the fingerprinting/jumbling code, more or less by accident, which in a practical sense caused a distortion. That's what I'm targeting. I have a hard time imagining any harm from discounting sticky entries in a realistic case.

> The mean query len recorded by gc_qtexts()
> *has to* match the mean length of what it actually put in the file, not
> the mean length of what we might wish it would put in the file.

Why? Why not simply care about whether or not the file was unreasonably large relative to available, useful query statistics? I think that mean_query_len isn't something that swings all that much with realistic workloads and 5,000 representative entries -- it certainly should not swing wildly. The problem, to an extent, was that that accidentally stopped being true.

> By the same token, I'm back to believing that it's fundamentally bogus for
> entry_dealloc() to compute mean_query_len that way. The most likely
> result of that is useless GC cycles. The only thing that will actually
> free memory when you've got a lot of dead sticky entries is getting rid of
> the sticky hashtable entries, and the only way to do that is to wait for
> entry_dealloc() to get rid of 'em. You were unenthused about making that
> removal more aggressive, which is fine, but you can't have it both ways.

Meanwhile, mean_query_len grows as more "distinct" entries are created, pushing out their garbage collection further and further, especially when there is a big flood of odd queries that stay sticky. Once again, I'm having a really hard time imagining a minority of current, non-sticky hashtable entries dominating a majority of current, sticky hashtable entries come garbage collection time. Garbage collection is linked to creation of entries, which is also linked to eviction, which aggressively evicts sticky entries in this thrashing scenario that you describe.

> It does strike me that when we do get rid of the sticky entries, cleanup
> of the texts file might lag a bit longer than it needs to because
> mean_query_len is computed before not after deleting whatever entries
> we're going to delete. On average, that shouldn't matter ... but if we
> are tossing a bunch of dead sticky entries, maybe they would have a higher
> mean length than the rest?

Yes, that is something that was observed to happen in the problem case I looked into. You know my solution already.
--
Peter Geoghegan
Re: [HACKERS] Less than ideal error reporting in pg_stat_statements
Andrew Dunstan writes:
> Sorry, I'm a bit late to this party. Does what you have committed mean
> people are less likely to see "Out of Memory" coming from
> pg_stat_statements? If not, what can be done about them short of a
> restart? And what bad effects follow from an event generating them?

The main thing we've done that will alleviate that is increase the size of query text file that the garbage-collection routine can cope with from MaxAllocSize (1GB) to MaxAllocHugeSize (at least 2GB, lots more on 64-bit machines, though on 32-bit you probably can't get to 2GB anyway ...).

Also, what will now happen if you do get an out-of-memory is that the code will discard stored query texts and truncate the file, so that the problem doesn't recur (at least not till you build up a new set of stored query texts). At this point you still have statistics, but they can only be identified by query ID since the text has been forgotten. I'm not sure how useful that situation really is ...

> The docs seem to be quite silent on these points.

The docs probably ought to describe this situation and recommend reducing pg_stat_statements.max if you want to preserve query texts.

			regards, tom lane
Re: [HACKERS] Less than ideal error reporting in pg_stat_statements
On 10/04/2015 06:16 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> To be clear: I wasn't sure why you thought I falsely count entries with
>> dropped texts within entry_dealloc().
>
> In the existing^H^H^Hprevious code, dropped-text entries would essentially
> act as length-zero summands in the average calculation, whereas I think
> we agree that they ought to be ignored; otherwise they decrease the
> computed mean and thereby increase the probability of (useless) GC cycles.
> In the worst case where the hashtable is mostly dropped-text entries,
> which would for instance be the prevailing situation shortly after a GC
> failure, we'd be calculating ridiculously small mean values and that'd
> prompt extra GC cycles, no?

Sorry, I'm a bit late to this party. Does what you have committed mean people are less likely to see "Out of Memory" coming from pg_stat_statements? If not, what can be done about them short of a restart? And what bad effects follow from an event generating them?

The docs seem to be quite silent on these points.

cheers

andrew
Re: [HACKERS] No Issue Tracker - Say it Ain't So!]
> Comments are welcome, and no, I don't really expect that this will be what gets
> adopted, mainly I wanted to show that we can probably just build something
> rather effective off our existing infrastructure

+1, good job.

> The bugs have 3.5 messages each on average, with 2 being the most common
> number, and 113 at the most, for bug 12990. 1284 bugs have only one message
> associated with them.

For anyone who is dying to know, as I was, what the winning bug report was:

"Missing pg_multixact/members files (appears to have wrapped, then truncated)"

http://www.postgresql.org/message-id/flat/20150406192130.2573.22...@wrigleys.postgresql.org#20150406192130.2573.22...@wrigleys.postgresql.org

or: http://goo.gl/4lKYOC

--
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201510041854
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Re: [HACKERS] No Issue Tracker - Say it Ain't So!]
I don't have the original message for this thread, so I arbitrarily picked a message to reply to.

Since what has been asked for is a bug *tracker*, and we already have a bugs mailing list, I put together something.

I downloaded the archives for pgsql-bugs, and fed them into a database. This part was easy, since I have already written a pg-backed usenet server and had the code handy for storing and parsing out bits of rfc 2822 messages.

It's dirt simple. If the system sees a message with 'Bug #(\d+)' in the subject line, it creates an entry in a bugs table with that bug number (if needed), and then marks the message as belonging to that bug. If there seems to be metadata about the bug in the format of the (unquoted)

Bug reference:
Logged by:
Email address:
PostgreSQL version:
Operating system:
Description:
Details:

it pulls that out and puts it in the bugs table. There's also an "open" boolean in the table, defaulting to true.

The results can be found at https://granicus.if.org/pgbugs/

Ok. So now we have a bug tracker, but...

Some open questions that I don't think have really been addressed, with my commentary interspersed:

1: Can a bug be more than "open" or "closed"?

I think yes. At least we probably want to know why a bug is closed. Is it not a bug at all, not our bug, a duplicate submission, a duplicate of another bug, something we won't fix for some reason (e.g. a bug against version 7)?

2: Who can declare a bug closed?

Ugh. I'm going to close some of them if it seems obvious to me that they should be closed. But what if it's not obvious? I could probably maintain it to some extent, but I don't know how much time that would actually take.

Related to the next point, it probably makes sense to just close up front bugs that are marked against unsupported pg versions, or haven't had any activity for too long, perhaps two years. Just closing bugs with no mailing list activity for two years closes 5280 of 6376 bugs.

3: How far back should I actually import data from the bugs list?

I have imported each archived month from December of 1998. It looks like the bug sequence was started at 1000 in December of 2003. Emails with no bug id in the subject line don't get associated with any bug; they're in the DB but not really findable.

4: What should I do with emails that don't reference a bug id but seem to be talking about a bug?

I suggest we do nothing with them as far as the bug tracker is concerned. If people want to mark their message as pertaining to a bug, they can put that in the subject line. However, I don't think a bug id can be assigned via email; that is, I think you have to use a web form to create a bug report with a bug id. Presumably that could change if whoever runs the bug counter wants it to.

5: How can we use email to update the status of a bug?

I suggest using email headers to do this. 'X-PGBug-Fixed: ' and the like. I assume here that everyone who might want to do such a thing uses an MUA that would allow this, and they know how.

6: Does there need to be any security on updating the status?

Probably not. I don't think it's the sort of thing that would attract malicious adjustments. If I'm wrong, I'd need to rethink this. I realize I'm making security an afterthought, which makes my teeth itch, but I think layers of security would make it much less likely to be actually adopted.

Just to be clear, this is both a work in progress and a proof of concept. It's slow, it's ugly. I haven't created any indexes, written any css or javascript, or implemented any caching. I may work on that before you read this though.

Comments are welcome, and no, I don't really expect that this will be what gets adopted; mainly I wanted to show that we can probably just build something rather effective off our existing infrastructure, and I had Saturday free (as of this writing, I've got maybe 5 hours into it total, albeit with lots of code re-use from other projects).

There are some obvious todo items, filtering and searching being the most salient.

Some data import issues:

March 10, 2003: bad Date header; looked like junk anyway, so I deleted it.

Time zone offsets of --0400 and -0500 (at least); I took these as being -0400 (or whathaveyou).

Date header of Sat, 31 May 2008 12:12:18 +1930; judging by the name on the email, this was probably posted from Venezuela, which switched to time zone -0430 in 2007, which could also be thought of as +1930 if you ignore the implied date change.

And, by way of some statistics, since I've got the archives in a database:

Emails: 43870
Bugs: 6376
Distinct 'From' headers: 10643

The bugs have 3.5 messages each on average, with 2 being the most common number, and 113 at the most, for bug 12990. 1284 bugs have only one message associated with them.

-- 
nw
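The subject-line and metadata parsing Nathan describes is easy to sketch; the following illustration is not his actual code, and the field handling and message layout it assumes are simplified.

# Sketch of the parsing described above: pull 'Bug #NNN' out of the Subject
# line and the web form's metadata block out of the body.
import re
from email import message_from_string

BUG_RE = re.compile(r'Bug #(\d+)', re.IGNORECASE)
META_FIELDS = ('Bug reference', 'Logged by', 'Email address',
               'PostgreSQL version', 'Operating system', 'Description')

def parse_report(raw):
    msg = message_from_string(raw)
    m = BUG_RE.search(msg.get('Subject', ''))
    bug_id = int(m.group(1)) if m else None

    body = msg.get_payload()         # assumes a simple single-part message
    meta = {}
    for line in body.splitlines():
        for field in META_FIELDS:
            if line.startswith(field + ':'):
                meta[field] = line[len(field) + 1:].strip()
    return bug_id, meta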
Re: [HACKERS] Less than ideal error reporting in pg_stat_statements
Peter Geoghegan writes:
> To be clear: I wasn't sure why you thought I falsely count entries with
> dropped texts within entry_dealloc().

In the existing^H^H^Hprevious code, dropped-text entries would essentially act as length-zero summands in the average calculation, whereas I think we agree that they ought to be ignored; otherwise they decrease the computed mean and thereby increase the probability of (useless) GC cycles. In the worst case where the hashtable is mostly dropped-text entries, which would for instance be the prevailing situation shortly after a GC failure, we'd be calculating ridiculously small mean values and that'd prompt extra GC cycles, no?

			regards, tom lane
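To make the effect concrete, here is a toy calculation (not pg_stat_statements code) showing how counting dropped-text entries as zero-length summands drags the computed mean down; all of the numbers are made up.

# Toy model: mean query length with and without zero-length summands.
live_lengths = [900, 1100, 1000, 950, 1050]   # hypothetical query text lengths
dropped = 95                                  # entries whose texts were dropped

including_dropped = sum(live_lengths) / (len(live_lengths) + dropped)
ignoring_dropped = sum(live_lengths) / len(live_lengths)

print(including_dropped)   # 50.0  -- "ridiculously small" mean
print(ignoring_dropped)    # 1000.0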
Re: [HACKERS] Less than ideal error reporting in pg_stat_statements
Peter Geoghegan writes:
> On Sun, Oct 4, 2015 at 1:01 PM, Tom Lane wrote:
>> Hm. The problem I've got with this is that then mean_query_len means
>> something significantly different after entry_dealloc than it does
>> after gc_texts.
>>
>> I'd be okay with changing *both* of those functions to ignore sticky
>> entries in the calculation, if that seems reasonable to you.
>
> That seems perfectly reasonable, yes. Should I leave that to you?

After a closer look I decided that wasn't reasonable at all. Discounting sticky texts would then mean that after a GC cycle, we might still think the query texts file is bloated and issue another GC request, which of course would not shrink the file, so that we'd be doing GC cycles every time we added a new query. The mean query len recorded by gc_qtexts() *has to* match the mean length of what it actually put in the file, not the mean length of what we might wish it would put in the file.

By the same token, I'm back to believing that it's fundamentally bogus for entry_dealloc() to compute mean_query_len that way. The most likely result of that is useless GC cycles. The only thing that will actually free memory when you've got a lot of dead sticky entries is getting rid of the sticky hashtable entries, and the only way to do that is to wait for entry_dealloc() to get rid of 'em. You were unenthused about making that removal more aggressive, which is fine, but you can't have it both ways.

It does strike me that when we do get rid of the sticky entries, cleanup of the texts file might lag a bit longer than it needs to because mean_query_len is computed before not after deleting whatever entries we're going to delete. On average, that shouldn't matter ... but if we are tossing a bunch of dead sticky entries, maybe they would have a higher mean length than the rest? Not sure about it. I put a comment about this into entry_dealloc() for the moment, but we can revisit whether it is worth adding code/cycles to get a more up-to-date mean length.

Anyway, I've committed the aspects of this that we're agreed on.

			regards, tom lane
[HACKERS] ALTER TABLE behind-the-scenes effects' CONTEXT
Hi,

In the past I've found the error message in cases such as this somewhat less helpful than it could be:

=# CREATE TABLE qqq (a int);
=# CREATE UNIQUE INDEX IF NOT EXISTS qqq_a_idx ON qqq(a);
=# ALTER TABLE qqq ALTER COLUMN a TYPE json USING NULL;
ERROR:  data type json has no default operator class for access method "btree"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.

The attached patch adds a CONTEXT line to index and constraint rebuilds, e.g.:

CONTEXT:  while rebuilding index qqq_a_idx

Any feedback welcome.

.m

*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 82,87 ****
--- 82,88 ----
  #include "storage/smgr.h"
  #include "utils/acl.h"
  #include "utils/builtins.h"
+ #include "utils/elog.h"
  #include "utils/fmgroids.h"
  #include "utils/inval.h"
  #include "utils/lsyscache.h"
***************
*** 309,314 **** static void ATController(AlterTableStmt *parsetree,
--- 310,316 ----
  			Relation rel, List *cmds, bool recurse, LOCKMODE lockmode);
  static void ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
  		  bool recurse, bool recursing, LOCKMODE lockmode);
+ static void ATRewriteSubcommandErrorCallback(void *arg);
  static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode);
  static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
  		  AlterTableCmd *cmd, LOCKMODE lockmode);
***************
*** 3373,3378 **** ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
--- 3375,3399 ----
  	tab->subcmds[pass] = lappend(tab->subcmds[pass], cmd);
  }

+ static void
+ ATRewriteSubcommandErrorCallback(void *arg)
+ {
+ 	AlterTableCmd *subcmd = (AlterTableCmd *) arg;
+
+ 	switch (subcmd->subtype)
+ 	{
+ 		case AT_ReAddIndex:
+ 			errcontext("while rebuilding index %s", subcmd->name);
+ 			break;
+ 		case AT_ReAddConstraint:
+ 			errcontext("while rebuilding constraint %s", subcmd->name);
+ 			break;
+ 		default:
+ 			/* keep compiler quiet */
+ 			(void) 0;
+ 	}
+ }
+
  /*
   * ATRewriteCatalogs
   *
***************
*** 3402,3407 **** ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode)
--- 3423,3429 ----
  		List	   *subcmds = tab->subcmds[pass];
  		Relation	rel;
  		ListCell   *lcmd;
+ 		ErrorContextCallback errcallback;

  		if (subcmds == NIL)
  			continue;
***************
*** 3411,3418 **** ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode)
  		 */
  		rel = relation_open(tab->relid, NoLock);

  		foreach(lcmd, subcmds)
! 			ATExecCmd(wqueue, tab, rel, (AlterTableCmd *) lfirst(lcmd), lockmode);

  		/*
  		 * After the ALTER TYPE pass, do cleanup work (this is not done in
--- 3433,3451 ----
  		 */
  		rel = relation_open(tab->relid, NoLock);

+ 		errcallback.callback = ATRewriteSubcommandErrorCallback;
+ 		errcallback.previous = error_context_stack;
+ 		error_context_stack = &errcallback;
+
  		foreach(lcmd, subcmds)
! 		{
! 			AlterTableCmd *subcmd = (AlterTableCmd *) lfirst(lcmd);
!
! 			errcallback.arg = subcmd;
! 			ATExecCmd(wqueue, tab, rel, subcmd, lockmode);
! 		}
!
! 		error_context_stack = errcallback.previous;

  		/*
  		 * After the ALTER TYPE pass, do cleanup work (this is not done in
***************
*** 8682,8687 **** ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
--- 8715,8721 ----
  				newcmd = makeNode(AlterTableCmd);
  				newcmd->subtype = AT_ReAddIndex;
+ 				newcmd->name = stmt->idxname;
  				newcmd->def = (Node *) stmt;
  				tab->subcmds[AT_PASS_OLD_INDEX] =
  					lappend(tab->subcmds[AT_PASS_OLD_INDEX], newcmd);
***************
*** 8712,8717 **** ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
--- 8746,8752 ----
  					RelationRelationId, 0);
  				cmd->subtype = AT_ReAddIndex;
+ 				cmd->name = indstmt->idxname;
  				tab->subcmds[AT_PASS_OLD_INDEX
Re: [HACKERS] Less than ideal error reporting in pg_stat_statements
On Sun, Oct 4, 2015 at 1:12 PM, Peter Geoghegan wrote:
> On Sun, Oct 4, 2015 at 1:01 PM, Tom Lane wrote:
>> Ah, right, sorry. I meant to make its result match what gc_texts would
>> get, by not falsely counting entries with dropped texts. That's not
>> what you have in your patch but it seems like an easy enough fix.
>
> I'm trying to make mean_query_len representative of *useful* entry
> query length. I guess I don't have that within gc_texts in my patch,
> but I do have it within entry_dealloc (up to and including considering
> dropped texts), which FWIW is far more important.

To be clear: I wasn't sure why you thought I falsely count entries with dropped texts within entry_dealloc(). I suppose my sense was that dropped texts ought not to make garbage collection occur too frequently, which could also be a problem. Garbage collection ought to occur when the size of the query text file becomes excessive relative to useful entries. I was worried about the thrashing risk from dropped-text entries.

Maybe we could, as an alternative, not forget the original size of dropped query texts, relying only on their offset to indicate the text is invalid. Dropped query texts would then not be special in that sense, which seems like a good thing all around.
--
Peter Geoghegan
Re: [HACKERS] Less than ideal error reporting in pg_stat_statements
Peter Geoghegan writes:
> On Sun, Oct 4, 2015 at 1:01 PM, Tom Lane wrote:
>> Hm. The problem I've got with this is that then mean_query_len means
>> something significantly different after entry_dealloc than it does
>> after gc_texts.
>>
>> I'd be okay with changing *both* of those functions to ignore sticky
>> entries in the calculation, if that seems reasonable to you.
>
> That seems perfectly reasonable, yes. Should I leave that to you?

Sure, I can take it.

			regards, tom lane
Re: [HACKERS] Less than ideal error reporting in pg_stat_statements
On Sun, Oct 4, 2015 at 1:01 PM, Tom Lane wrote:
> Ah, right, sorry. I meant to make its result match what gc_texts would
> get, by not falsely counting entries with dropped texts. That's not
> what you have in your patch but it seems like an easy enough fix.

I'm trying to make mean_query_len representative of *useful* entry query length. I guess I don't have that within gc_texts in my patch, but I do have it within entry_dealloc (up to and including considering dropped texts), which FWIW is far more important.

>> I'd be quite happy if you did everything listed, and also left the
>> extra discrimination against sticky entries within entry_dealloc in --
>> consider what happens when a huge malloc() ends up swapping with an
>> exclusive lock held, and consider that repeated, failed data
>> integration transactions are implicated in this in a big way when a
>> problem appears in the wild. A big part of the problem here was that
>> garbage collection did not run often enough.
>
> Hm. The problem I've got with this is that then mean_query_len means
> something significantly different after entry_dealloc than it does
> after gc_texts.
>
> I'd be okay with changing *both* of those functions to ignore sticky
> entries in the calculation, if that seems reasonable to you.

That seems perfectly reasonable, yes. Should I leave that to you?

>> In other words, I'd be fine with *not* doing the query size filter
>> thing for now, since that is something that seems like an extra
>> defense and not core to the problem. I was kind of ambivalent about
>> doing that part myself, actually.
>
> Agreed on that part.

We're in full agreement on what needs to happen for the next point release, then. Excellent.
--
Peter Geoghegan
Re: [HACKERS] Potential GIN vacuum bug
On Thu, Sep 3, 2015 at 10:42 AM, Jeff Janes wrote: > On Mon, Aug 31, 2015 at 12:10 AM, Jeff Janes wrote: > >> On Sun, Aug 30, 2015 at 3:57 PM, Tom Lane wrote: >> >>> Jeff Janes writes: >>> > On Sun, Aug 30, 2015 at 11:11 AM, Tom Lane wrote: >>> >> > But we would still have to deal with the >>> > fact that unconditional acquire attempt by the backends will cause a >>> vacuum >>> > to cancel itself, which is undesirable. >>> >>> Good point. >>> >>> > If we define a new namespace for >>> > this lock (like the relation extension lock has its own namespace) then >>> > perhaps the cancellation code could be made to not cancel on that >>> > condition. But that too seems like a lot of work to backpatch. >>> >>> We could possibly teach the autocancel logic to distinguish this lock >>> type >>> from others without using a new namespace. That seems a bit klugy, but >>> maybe better than adding a new namespace. (For example, there are >>> probably only a couple of modes in which we take page-level locks at >>> present. Choosing a currently unused, but self-exclusive, mode for >>> taking >>> such a lock might serve.) >>> >> >> Like the attached? (The conditional variant for user backends was >> unceremoniously yanked out.) >> > > A problem here is that now we have the user backends waiting on vacuum to > do the clean up, but vacuum is using throttled IO and so taking its sweet > time at it. Under the old code, the user backend could pass up the vacuum > while it was sleeping. > > Maybe we could have the vacuum detect when someone is waiting on it, and > temporarily suspend throttling and just run at full speed. I don't believe > there is any precedence for that, but I do think there are other places > where such a technique could be useful. That is kind of a scary change to > backpatch. > > I am running out of ideas. > Teodor published a patch in another thread: http://www.postgresql.org/message-id/56041b26.2040...@sigaev.ru but I thought it would be best to discuss it here. It is similar to my most recent patch. He removes the parts of the code that anticipates concurrent clean up, and replaces them with asserts, which I was too lazy to do until we have a final design. He uses a different lock mode (ExclusiveLock, rather than ShareUpdateExclusiveLock) when heavy-locking the metapage. It doesn't make a difference, as long as it is self-exclusive and no one else uses it in a way that causes false sharing (which is currently the case--the only other user of PageLocks is the hash index code) He always does conditional locks in regular backends. That means he doesn't have to hack the lmgr to prevent vacuum from canceling itself, the way I did. It also means there is not the "priority inversion" I mention above, where a user backend blocks on vacuum, but vacuum is intentionally throttling itself. On the other hand, using conditional locks for normal backends does mean that the size of the pending list can increase without limit, as there is nothing to throttle the user backends from adding tuples faster than they are cleaned up. Perhaps worse, it can pin down a vacuum worker without limit, as it keeps finding more pages have been added by the time it finished the prior set of pages. I actually do see this on my (not very realistic) testing. I think that for correctness, vacuum only needs to clean the part of the pending list which existed as of the time vacuum started. So as long as it gets that far, it can just be done even if more pages have since been added. 
I'm not sure of the best way to implement that; I guess you'd remember the blkno of the tail page from when you started, and set a flag once you truncated away a page with that same blkno. That would solve the issue of pinning down a vacuum worker for an unlimited amount of time, but would not solve the unlimited growth of the pending list. Cheers, Jeff
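A minimal sketch of that stopping rule, for concreteness. The helpers next_pending_block(), process_pending_page() and truncate_pending_page() are hypothetical placeholders, not code from any posted patch; only the tail-block bookkeeping is the point here:

/* Tail of the pending list as of the start of this cleanup pass. */
BlockNumber tail_at_start = metadata->tail;   /* read while holding the metapage lock */
bool        reached_original_tail = false;

while (!reached_original_tail)
{
    BlockNumber blkno = next_pending_block();       /* hypothetical */

    if (blkno == InvalidBlockNumber)
        break;                  /* list emptied before we reached the old tail */

    process_pending_page(blkno);                    /* hypothetical: move entries into the tree */

    /*
     * Anything beyond the original tail was appended after cleanup began;
     * leave it for a later pass instead of chasing newly added pages forever.
     */
    if (blkno == tail_at_start)
        reached_original_tail = true;

    truncate_pending_page(blkno);                   /* hypothetical */
}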
Re: [HACKERS] check fails on Fedora 23
On 10/04/2015 12:52 PM, Pavel Stehule wrote: Isn't this arguably a Fedora regression? What did they change in F23 to make it fail? I note that F23 is still in Beta. It is working on F22, so it looks like a regression in some Fedora component. Can somebody repeat the check on FC23? Yes, I have reproduced it. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Less than ideal error reporting in pg_stat_statements
Peter Geoghegan writes: > I'm not clear on what you actually propose to do to "make > entry_dealloc's recomputation of mean_query_len sane", but I think you > are talking about something distinct from what I've proposed Ah, right, sorry. I meant to make its result match what gc_texts would get, by not falsely counting entries with dropped texts. That's not what you have in your patch but it seems like an easy enough fix. > I'd be quite happy if you did everything listed, and also left the > extra discrimination against sticky entries within entry_dealloc in -- > consider what happens when a huge malloc() ends up swapping with an > exclusive lock held, and consider that repeated, failed data > integration transactions are implicated in this in a big way when a > problem appears in the wild. A big part of the problem here was that > garbage collection did not run often enough. Hm. The problem I've got with this is that then mean_query_len means something significantly different after entry_dealloc than it does after gc_texts. I'd be okay with changing *both* of those functions to ignore sticky entries in the calculation, if that seems reasonable to you. > In other words, I'd be fine with *not* doing the query size filter > thing for now, since that is something that seems like an extra > defense and not core to the problem. I was kind of ambivalent about > doing that part myself, actually. Agreed on that part. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Less than ideal error reporting in pg_stat_statements
On Sun, Oct 4, 2015 at 9:27 AM, Tom Lane wrote: > Hm. I'm unconvinced by the aspects of this that involve using > mean_query_len as a filter on which texts will be accepted. While that's > not necessarily bad in the abstract, there are way too many implementation > artifacts here that will result in random-seeming decisions about whether > to normalize. There are already plausible race conditions that can make query text normalization not occur. I think that's much more likely in practice to cause a failure to normalize than anything proposed here; I've personally observed such things in the wild a few times already. Also, note that mean_query_len is not used directly there -- I decided on Max(ASSUMED_LENGTH_INIT, mean_query_len) instead. mean_query_len is just for cases with very large query texts. > * mean_query_len only gets recomputed in entry_dealloc(), which is only > run if we exceed pgss_max, and gc_qtexts(), which is only run if we decide > the query texts file is more than 50% bloat. So there could be quite a > long startup transient before the value gets off its initialization > minimum, and I'm suspicious that there might be plausible use-cases where > it never does. So it's not so much "restrict to a multiple of the mean > query len" as "restrict to some number that might once upon a time have > had some relation to the mean query len, or maybe not". ASSUMED_LENGTH_INIT * 5 is a pretty conservative lower bound, I'd say. mean_query_len is only really used for cases where query texts are much longer on average. So in order for that to be a problem, you'd have to have what are, in an absolute sense, very large query texts. I think I noticed no more than a handful of changes in the regression tests, for example. > * One could expect that after changing mean_query_len, the population of > query texts would change character as a result of the filter behavior > changing, so that convergence to stable behavior over the long haul is > not exactly self-evident. FWIW, I think that there is a feedback loop today, and that in problem cases that was what allowed it to get out of hand. > * As you've got it here, entry_dealloc() and gc_qtexts() don't compute > mean_query_len the same way, because only one of them discriminates > against sticky entries. So the value would bounce about rather randomly > based on which one had run last. entry_dealloc() will naturally run far more frequently than gc_qtexts(). That said, it would be better if they matched. > * I'm not exactly convinced that sticky entries should be ignored for > this purpose anyway. I think that data integration transactions that fail repeatedly are strongly implicated here in practice. That's behind the query size filter thing that you may also take issue with, as well as this. > Taking a step back, ISTM the real issue you're fighting here is lots of > orphaned sticky entries, but the patch doesn't do anything directly to fix > that problem. I wonder if we should do something like making > entry_dealloc() and/or gc_qtexts() aggressively remove sticky entries, > or at least those with "large" texts. Sticky entries are (almost by definition) always aggressively removed, and I hesitate to give certain ones a lower usage_count to begin with, which is the only way to directly be more aggressive that might work better. 
> I think the aspects of this patch that are reasonably uncontroversial are > increasing the allowed malloc attempt size in gc_qtexts, flushing the > query text file on malloc failure, fixing the missing cleanup steps after > a gc failure, and making entry_dealloc's recomputation of mean_query_len > sane (which I'll define for the moment as "the same as gc_qtexts would > get"). Since we're hard against a release deadline, I propose to commit > just those changes, and we can consider the idea of a query size filter > and/or redefining mean_query_len at leisure. I'm not clear on what you actually propose to do to "make entry_dealloc's recomputation of mean_query_len sane", but I think you are talking about something distinct from what I've proposed based on your separate remarks about entry_dealloc and the extra discrimination against sticky entries there (vis-a-vis calculating mean query length). I can't decide exactly what you mean, though: neither entry_dealloc nor gc_qtexts care about orphaned query texts in my patch (or in master). Please clarify. I'd be quite happy if you did everything listed, and also left the extra discrimination against sticky entries within entry_dealloc in -- consider what happens when a huge malloc() ends up swapping with an exclusive lock held, and consider that repeated, failed data integration transactions are implicated in this in a big way when a problem appears in the wild. A big part of the problem here was that garbage collection did not run often enough. In other words, I'd be fine with *not* doing the query size filter thing for now, since that is something that seems like an extra defense and not core to the problem. I was kind of ambivalent about doing that part myself, actually.
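For concreteness, the query size filter being set aside here amounts to a test of roughly this shape. This is a sketch only: the multiplier and the exact combination with mean_query_len are illustrative assumptions, not the posted patch; ASSUMED_LENGTH_INIT is the existing 1024-byte constant in pg_stat_statements.c.

/*
 * Sketch: treat a query text as an outlier, and handle it specially (for
 * example, skip normalization), when it is several times larger than the
 * typical entry.
 */
static bool
query_text_is_outlier(Size query_len, Size mean_query_len)
{
    Size    limit = Max(ASSUMED_LENGTH_INIT * 5, mean_query_len * 5);

    return query_len > limit;
}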
Re: [HACKERS] about fsync in CLOG buffer write
On 2015-10-04 12:14:05 -0700, Jeff Janes wrote: > My (naive) expectation is that no additional locking is needed. > > Once we decide to consult the clog, we already know the transaction is no > longer in progress, so it can't be in-flight to change that clog entry we > care about because it was required to have done that already. Other xids on the same page can still be in progress, and those might concurrently need to be written. > Once we have verified (under existing locking) that the relevant page is > already not in memory, we know it can't be dirty in memory. If someone > pulls it into memory after we observe it to be not there, it doesn't matter > to us as whatever transaction they are about to change can't be the one we > care about. A concurrent process might have read the page from disk before our write, i.e. it now holds an unmodified copy of the page, and its future writes will then overwrite the entry we wrote directly. I think there are a bunch of related issues. Such things are currently prevented by the IO locks in slru.c. > Is there a chance that, if we read a byte from the kernel when someone is > in the process of writing adjacent bytes (or writing the same byte, with > changes only to bits in it which we don't care about), the kernel will > deliver us something which is neither the old value nor the new value, but > some monstrosity? Depends on the granularity of the write/read and the OS IIRC. I don't think it's worth investing time and complexity to bypass SLRU in certain cases. We should rather rewrite the thing completely. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] about fsync in CLOG buffer write
On Sat, Sep 12, 2015 at 5:21 PM, Andres Freund wrote: > On September 12, 2015 5:18:28 PM PDT, Jeff Janes > wrote: > >On Wed, Sep 2, 2015 at 5:32 AM, Andres Freund > >wrote: > > > >> On 2015-09-10 19:39:59 +0800, 张广舟(明虚) wrote: > >> > We found there is a fsync call when CLOG buffer > >> > is written out in SlruPhysicalWritePage(). It is often called when > >a > >> backend > >> > needs to check transaction status with SimpleLruReadPage(). > >> > >> That's when there's not enough buffers available some other, and your > >> case dirty, needs to be written out. > >> > > > >Why bother to find a place to store the page in shared memory at all? > >If > >we just want to read it, and it isn't already in shared memory, then > >why > >not just ask the kernel for the specific byte we need? The byte we > >want to > >read can't differ between shared memory and kernel, because it doesn't > >exist in shared memory. > > I doubt that'd help - the next access would be more expensive, and we'd > need to have a more complex locking regime. These pages aren't necessarily > read only at that point. > My (naive) expectation is that no additional locking is needed. Once we decide to consult the clog, we already know the transaction is no longer in progress, so it can't be in-flight to change that clog entry we care about because it was required to have done that already. Once we have verified (under existing locking) that the relevant page is already not in memory, we know it can't be dirty in memory. If someone pulls it into memory after we observe it to be not there, it doesn't matter to us as whatever transaction they are about to change can't be the one we care about. Perhaps someone will want the same page later so that they can write to it and so will have to pull it in. But we have to play the odds, and the odds are that a page already dirty in memory is more likely to be needed to be written to in the near future, than another page which was not already dirty and is only needed with read intent. If we are wrong, all that happens is someone later on has to do the same work that we would have had to do anyway, at no greater cost than we if did it now. If we are right, we avoid an fsync to make room for new page, and then later on avoid someone else having to shove out the page we brought in (or a different one) only to replace it with the same page we just wrote, fsynced, and shoved out. Is there a chance that, if we read a byte from the kernel when someone is in the process of writing adjacent bytes (or writing the same byte, with changes only to bits in it which we don't care about), the kernel will deliver us something which is neither the old value nor the new value, but some monstrosity? Cheers, Jeff
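For illustration, "ask the kernel for the specific byte we need" could look roughly like this. It is a sketch of the idea only: the locking and torn-read questions discussed above are exactly what it ignores, the constants are re-declared here because the real ones are private to clog.c and slru.c, and pg_clog/%04X is the current on-disk naming.

#include "postgres.h"
#include <fcntl.h>
#include <unistd.h>
#include "access/clog.h"
#include "storage/fd.h"

#define CLOG_BITS_PER_XACT      2
#define CLOG_XACTS_PER_BYTE     4
#define CLOG_XACTS_PER_PAGE     (BLCKSZ * CLOG_XACTS_PER_BYTE)
#define CLOG_XACT_BITMASK       ((1 << CLOG_BITS_PER_XACT) - 1)
#define SLRU_PAGES_PER_SEGMENT  32

/* Read one transaction's status bits straight from the clog segment file. */
static XidStatus
clog_status_from_file(TransactionId xid)
{
    int         pageno = xid / CLOG_XACTS_PER_PAGE;
    int         segno = pageno / SLRU_PAGES_PER_SEGMENT;
    int         rpageno = pageno % SLRU_PAGES_PER_SEGMENT;
    int         byteno = (xid % CLOG_XACTS_PER_PAGE) / CLOG_XACTS_PER_BYTE;
    int         bshift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT;
    off_t       offset = (off_t) rpageno * BLCKSZ + byteno;
    char        path[MAXPGPATH];
    char        status_byte;
    int         fd;

    snprintf(path, MAXPGPATH, "pg_clog/%04X", segno);
    fd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
    if (fd < 0 || pread(fd, &status_byte, 1, offset) != 1)
        elog(ERROR, "could not read transaction status from \"%s\"", path);
    CloseTransientFile(fd);

    return (status_byte >> bshift) & CLOG_XACT_BITMASK;
}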
Re: [HACKERS] DBT-3 with SF=20 got failed
Tomas Vondra writes: > Anyway, I think you're right we're going in circles here. I think we > both presented all the arguments we had and we still disagree. I'm not > going to continue with this - I'm unlikely to win an argument against > two committers if that didn't happen until now. Thanks for the > discussion though. Since we're hard up against the 9.5beta1 deadline, I've made an executive decision to commit just the minimal change, which I view as being to constrain the array size to MaxAllocSize where it has been all along. I found a second rather serious bug in the new hash code while doing that --- it failed to ensure nbuckets was actually a power of 2 --- which did not improve my opinion of it one bit. It's clear from this discussion that there's room for further improvement in the hashtable sizing behavior, but I think that's 9.6 material at this point. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
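For reference, the two constraints behind the minimal change are that the bucket pointer array has to fit under MaxAllocSize and that nbuckets has to be a power of two (so bucket numbers can be taken from hash bits with a mask). One way to satisfy both at once, as a sketch rather than the committed code; dbuckets stands for the planner's bucket-count estimate:

#include "postgres.h"
#include "executor/hashjoin.h"
#include "utils/memutils.h"

static int
choose_nbuckets(double dbuckets)
{
    Size    max_pointers = MaxAllocSize / sizeof(HashJoinTuple);
    int     nbuckets = 1024;            /* modest power-of-two floor */

    while ((Size) nbuckets * 2 <= max_pointers &&
           (double) nbuckets * 2.0 <= dbuckets)
        nbuckets *= 2;

    Assert((nbuckets & (nbuckets - 1)) == 0);       /* power of two */
    Assert((Size) nbuckets <= max_pointers);        /* fits in a palloc chunk */
    return nbuckets;
}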
Re: [HACKERS] check fails on Fedora 23
> Isn't this arguably a Fedora regression? What did they change in F23 to >> make it fail? I note that F23 is still in Beta. >> > It is working on F22, so it looks like a regression in some Fedora component. Can somebody repeat the check on FC23? Regards Pavel
Re: [HACKERS] Odd query execution behavior with extended protocol
Shay Rojansky writes: >> Try adding a sync before the second execute. > I tried inserting a Sync right before the second Execute, this caused an > error with the message 'portal "MQ1" does not exist'. > This seems like problematic behavior on its own, regardless of my issues > here (Sync shouldn't be causing an implicit close of the portal, should > it?). Sync results in closing the transaction, if you've not explicitly executed a BEGIN. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Odd query execution behavior with extended protocol
Shay Rojansky writes: >> To my mind there is not a lot of value in performing Bind until you >> are ready to do Execute. The only reason the operations are separated >> in the protocol is so that you can do multiple Executes with a row limit >> on each one, to retrieve a large query result in chunks. > So you would suggest changing my message chain to send Bind right after > Execute, right? This would yield the following messages: > P1/P2/D1/B1/E1/D2/B2/E2/S (rather than the current > P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S) > This would mean that I would switch to using named statements and the > unnamed portal, rather than the current unnamed statement > and named portals. If I recall correctly, I was under the impression that > there are some PostgreSQL performance benefits to using the > unnamed statement over named statements, although I admit I can't find any > documentation backing that. Can you confirm that the two > are equivalent performance-wise? Hmm. I do not recall exactly what performance optimizations apply to those two cases; they're probably not "equivalent", though I do not think the difference is major in either case. TBH I was a bit surprised on reading your message to hear that the system would take that sequence at all; it's not obvious that it should be allowed to replace a statement, named or not, while there's an open portal that depends on it. I think you might have more issues with lifespans, since portals go away at commit whereas named statements don't. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Less than ideal error reporting in pg_stat_statements
Peter Geoghegan writes: > Attached, revised patch deals with the issues around removing the > query text file when garbage collection encounters trouble. There is > no reason to be optimistic about any error within gc_qtexts() not > recurring during a future garbage collection. OOM might be an > exception, but probably not, since gc_qtexts() is reached when a new > entry is created with a new query text, which in general makes OOM > progressively more likely. Hm. I'm unconvinced by the aspects of this that involve using mean_query_len as a filter on which texts will be accepted. While that's not necessarily bad in the abstract, there are way too many implementation artifacts here that will result in random-seeming decisions about whether to normalize. For instance: * mean_query_len only gets recomputed in entry_dealloc(), which is only run if we exceed pgss_max, and gc_qtexts(), which is only run if we decide the query texts file is more than 50% bloat. So there could be quite a long startup transient before the value gets off its initialization minimum, and I'm suspicious that there might be plausible use-cases where it never does. So it's not so much "restrict to a multiple of the mean query len" as "restrict to some number that might once upon a time have had some relation to the mean query len, or maybe not". * One could expect that after changing mean_query_len, the population of query texts would change character as a result of the filter behavior changing, so that convergence to stable behavior over the long haul is not exactly self-evident. * As you've got it here, entry_dealloc() and gc_qtexts() don't compute mean_query_len the same way, because only one of them discriminates against sticky entries. So the value would bounce about rather randomly based on which one had run last. * I'm not exactly convinced that sticky entries should be ignored for this purpose anyway. Taking a step back, ISTM the real issue you're fighting here is lots of orphaned sticky entries, but the patch doesn't do anything directly to fix that problem. I wonder if we should do something like making entry_dealloc() and/or gc_qtexts() aggressively remove sticky entries, or at least those with "large" texts. I think the aspects of this patch that are reasonably uncontroversial are increasing the allowed malloc attempt size in gc_qtexts, flushing the query text file on malloc failure, fixing the missing cleanup steps after a gc failure, and making entry_dealloc's recomputation of mean_query_len sane (which I'll define for the moment as "the same as gc_qtexts would get"). Since we're hard against a release deadline, I propose to commit just those changes, and we can consider the idea of a query size filter and/or redefining mean_query_len at leisure. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] check fails on Fedora 23
2015-10-04 17:52 GMT+02:00 Andrew Dunstan : > > > On 10/04/2015 11:35 AM, Pavel Stehule wrote: > >> >> >> >> > fails on assert >> >> Works for me ... what locale/collation are you running in? >> >> >> LANG=cs_CZ.UTF-8 >> >> >> it depends on locale - it is working with C or en_US.UTF-8, but >> doesn't work with Czech locale >> >> >> and fails with Hungarian locales too >> >> >> >> > > Isn't this arguably a Fedora regression? What did they change in F23 to > make it fail? I note that F23 is still in Beta. > Hard to say what can be wrong: * locale * gcc * glibc Regards Pavel > > cheers > > andrew >
Re: [HACKERS] check fails on Fedora 23
On 10/04/2015 11:35 AM, Pavel Stehule wrote: > fails on assert Works for me ... what locale/collation are you running in? LANG=cs_CZ.UTF-8 it depends on locale - it is working with C or en_US.UTF-8, but doesn't work with Czech locale and fails with Hungarian locales too Isn't this arguably a Fedora regression? What did they change in F23 to make it fail? I note that F23 is still in Beta. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Odd query execution behavior with extended protocol
> > I'm fairly sure that the query snapshot is established at Bind time, > which means that this SELECT will run with a snapshot that indeed > does not see the effects of the UPDATE. > > To my mind there is not a lot of value in performing Bind until you > are ready to do Execute. The only reason the operations are separated > in the protocol is so that you can do multiple Executes with a row limit > on each one, to retrieve a large query result in chunks. > So you would suggest changing my message chain to send each Bind immediately before its Execute, right? This would yield the following messages: P1/P2/D1/B1/E1/D2/B2/E2/S (rather than the current P1/D1/B1/P2/D2/B2/E1/C1/E2/C2/S) This would mean that I would switch to using named statements and the unnamed portal, rather than the current unnamed statement and named portals. I was under the impression that there are some PostgreSQL performance benefits to using the unnamed statement over named statements, although I admit I can't find any documentation backing that. Can you confirm that the two are equivalent performance-wise? Shay
Re: [HACKERS] Odd query execution behavior with extended protocol
> > Try adding a sync before the second execute. > I tried inserting a Sync right before the second Execute; this caused an error with the message 'portal "MQ1" does not exist'. This seems like problematic behavior on its own, regardless of my issues here (Sync shouldn't be causing an implicit close of the portal, should it?).
Re: [HACKERS] check fails on Fedora 23
>>> > fails on assert >>> >>> Works for me ... what locale/collation are you running in? >>> >> >> LANG=cs_CZ.UTF-8 >> > > it depends on locale - it is working with C or en_US.UTF-8, but doesn't > work with Czech locale > and fails with Hungarian locales too > > Pavel > > >> >> Regards >> >> Pavel >> >> >> >>> >>> regards, tom lane >>> >> >> >
Re: [HACKERS] check fails on Fedora 23
2015-10-04 17:07 GMT+02:00 Pavel Stehule : > > > 2015-10-04 16:37 GMT+02:00 Tom Lane : > >> Pavel Stehule writes: >> > I am testing PostgreSQL (master) on Fedora 23. The query >> >> > ELECT p1.oid, p1.proname, p2.oid, p2.proname >> > FROM pg_proc AS p1, pg_proc AS p2 >> > WHERE p1.oid < p2.oid AND >> > p1.prosrc = p2.prosrc AND >> > p1.prolang = 12 AND p2.prolang = 12 AND >> > (p1.proisagg = false OR p2.proisagg = false) AND >> > (p1.prolang != p2.prolang OR >> > p1.proisagg != p2.proisagg OR >> > p1.prosecdef != p2.prosecdef OR >> > p1.proleakproof != p2.proleakproof OR >> > p1.proisstrict != p2.proisstrict OR >> > p1.proretset != p2.proretset OR >> > p1.provolatile != p2.provolatile OR >> > p1.pronargs != p2.pronargs); >> >> > fails on assert >> >> Works for me ... what locale/collation are you running in? >> > > LANG=cs_CZ.UTF-8 > it depends on locale - it is working with C or en_US.UTF-8, but doesn't work with Czech locale Pavel > > Regards > > Pavel > > > >> >> regards, tom lane >> > >
Re: [HACKERS] check fails on Fedora 23
2015-10-04 16:37 GMT+02:00 Tom Lane : > Pavel Stehule writes: > > I am testing PostgreSQL (master) on Fedora 23. The query > > > ELECT p1.oid, p1.proname, p2.oid, p2.proname > > FROM pg_proc AS p1, pg_proc AS p2 > > WHERE p1.oid < p2.oid AND > > p1.prosrc = p2.prosrc AND > > p1.prolang = 12 AND p2.prolang = 12 AND > > (p1.proisagg = false OR p2.proisagg = false) AND > > (p1.prolang != p2.prolang OR > > p1.proisagg != p2.proisagg OR > > p1.prosecdef != p2.prosecdef OR > > p1.proleakproof != p2.proleakproof OR > > p1.proisstrict != p2.proisstrict OR > > p1.proretset != p2.proretset OR > > p1.provolatile != p2.provolatile OR > > p1.pronargs != p2.pronargs); > > > fails on assert > > Works for me ... what locale/collation are you running in? > LANG=cs_CZ.UTF-8 Regards Pavel > > regards, tom lane >
Re: [HACKERS] Odd query execution behavior with extended protocol
Shay Rojansky writes: > Npgsql supports sending multiple SQL statements in a single packet via the > extended protocol. This works fine, but when the second query SELECTs a > value modified by the first's UPDATE, I'm getting a result as if the UPDATE > hasn't yet occurred. > The exact messages send by Npgsql are: > Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed > Describe (statement=unnamed) > Bind (statement=unnamed, portal=MQ0) > Parse (SELECT * FROM data WHERE id=1), statement=unnamed > Describe (statement=unnamed) > Bind (statement=unnamed, portal=MQ1) > Execute (portal=MQ0) > Close (portal=MQ0) > Execute (portal=MQ1) > Close (portal=MQ1) > Sync I'm fairly sure that the query snapshot is established at Bind time, which means that this SELECT will run with a snapshot that indeed does not see the effects of the UPDATE. To my mind there is not a lot of value in performing Bind until you are ready to do Execute. The only reason the operations are separated in the protocol is so that you can do multiple Executes with a row limit on each one, to retrieve a large query result in chunks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
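That ordering is what libpq itself produces when each statement is prepared up front and then executed one at a time: PQexecPrepared sends Bind, Describe, Execute and Sync together, so the SELECT's portal is bound (and its snapshot taken) only after the UPDATE has already run. A minimal sketch follows; the connection string and error handling are placeholders, and this trades away the single-packet batching Npgsql is doing.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=test");      /* placeholder connstr */
    const char *params[1] = {"1"};
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /* Parse both statements up front, as named prepared statements. */
    PQclear(PQprepare(conn, "upd", "UPDATE data SET name='foo' WHERE id=$1", 1, NULL));
    PQclear(PQprepare(conn, "sel", "SELECT name FROM data WHERE id=$1", 1, NULL));

    /* Bind and Execute of the UPDATE happen together, here. */
    PQclear(PQexecPrepared(conn, "upd", 1, params, NULL, NULL, 0));

    /* The SELECT's portal is bound only now, so its snapshot sees the UPDATE. */
    res = PQexecPrepared(conn, "sel", 1, params, NULL, NULL, 0);
    if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
        printf("name = %s\n", PQgetvalue(res, 0, 0));
    PQclear(res);

    PQfinish(conn);
    return 0;
}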
Re: [HACKERS] check fails on Fedora 23
Pavel Stehule writes: > I am testing PostgreSQL (master) on Fedora 23. The query > ELECT p1.oid, p1.proname, p2.oid, p2.proname > FROM pg_proc AS p1, pg_proc AS p2 > WHERE p1.oid < p2.oid AND > p1.prosrc = p2.prosrc AND > p1.prolang = 12 AND p2.prolang = 12 AND > (p1.proisagg = false OR p2.proisagg = false) AND > (p1.prolang != p2.prolang OR > p1.proisagg != p2.proisagg OR > p1.prosecdef != p2.prosecdef OR > p1.proleakproof != p2.proleakproof OR > p1.proisstrict != p2.proisstrict OR > p1.proretset != p2.proretset OR > p1.provolatile != p2.provolatile OR > p1.pronargs != p2.pronargs); > fails on assert Works for me ... what locale/collation are you running in? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] row_security GUC, BYPASSRLS
* Noah Misch (n...@leadboat.com) wrote: > On Mon, Sep 28, 2015 at 05:13:56PM -0400, Stephen Frost wrote: > > * Noah Misch (n...@leadboat.com) wrote: > > > In schema reviews, I will raise a red flag for use of this feature; the > > > best > > > designs will instead use additional roles. I forecast that PostgreSQL > > > would > > > fare better with no owner-constrained-by-RLS capability. Even so, others > > > want > > > it, and FORCE ROW SECURITY would deliver it with an acceptable risk > > > profile. > > > > I've attached a patch to implement it. It's not fully polished but it's > > sufficient for comment, I believe. Additional comments, documentation > > and regression tests are to be added, if we have agreement on the > > grammer and implementation approach. > > This patch has FORCE ROW LEVEL SECURITY take precedence over row_security=off, > which thwarts pg_dump use of row_security=off to ensure dump completeness. Fixed. > Should this be a table-level flag, or should it be a policy-level flag? A > policy-level flag is more powerful. If nobody really anticipates using that > power, this table-level flag works for me. table-level seems the right level to me and no one is calling for policy-level. Further, policy-level could be added later if there ends up being significant interest later. > > > SECURITY_ROW_LEVEL_DISABLED could have been okay. I removed an incomplete > > > implementation (e.g. didn't affect CASCADE constraints). Writing a full > > > one > > > would be a mammoth job, and for what? Setting the wrong SELECT policies > > > can > > > disrupt _any_ application logic; no foreign keys or FORCE ROW SECURITY > > > need be > > > involved. Protecting just foreign keys brings some value, but it will not > > > materially reduce the vigilance demanded of RLS policy authors and > > > reviewers. > > > > I have a hard time with this. We're not talking about the application > > logic in this case, we're talking about the guarantees which the > > database is making to the user, be it an application or an individual. > > If disabling policies has an effect, table owners must be feeding conflicting > requirements into the system. Violating policies during referential integrity > queries is not, in general, less serious than violating referential integrity > itself. Rules and triggers pose the same threat, and we let those break > referential integrity. I think the ideal in all of these cases is rather to > detect the conflict and raise an error. SECURITY_ROW_LEVEL_DISABLED and > SECURITY_NOFORCE_RLS send policies in a third direction, neither the beaten > path of rules/triggers nor the ideal. The agreement between the user and the system with regard to permissions and referential integrity is that referential integrity takes priority over permissions. Prior to FORCE and SECURITY_NOFORCE_RLS that is true for RLS. I don't believe it makes sense that adding FORCE would change that agreement, nor do I agree that the ideal would be for the system to throw errors when the permissions system would deny access during RI checks. While I appreciate that rules and triggers can break RI, the way RLS works is consistent with the ACL system and FORCE should be consistent with the ACL system and normal/non-FORCE RLS with regard to referential integrity. > > I've included a patch (the second in the set attached) which adds a > > SECURITY_NOFORCE_RLS bit which has a much tighter scope- it only applies > > after the regular owner check is done. 
This reduces the risk of it > > being set mistakenly dramatically, I believe. > > Yes, that's far safer than SECURITY_ROW_LEVEL_DISABLED. I assume the final > design will let table owners completely bypass FORCE ROW LEVEL SECURITY under > "SET row_security = off". If so, SECURITY_NOFORCE_RLS poses negligible risk. I've made that change. > Functions differing only in s/ = true/ = false/? ATExecEnableDisableTrigger() > is a better model for this. I've changed that to be one function. As an independent patch, I'll do the same for ATExecEnable/DisableRowSecurity. Apologies about the timing, I had intended to get this done yesterday. Barring further concerns, I'll push the attached later today with the necessary catversion bump. Thanks! Stephen From 810c8f6ea717303a537b1c80337f98d3ad282645 Mon Sep 17 00:00:00 2001 From: Stephen Frost Date: Mon, 28 Sep 2015 11:28:15 -0400 Subject: [PATCH] ALTER TABLE .. FORCE ROW LEVEL SECURITY To allow users to force RLS to always be applied, even for table owners, add ALTER TABLE .. FORCE ROW LEVEL SECURITY. row_security=off overrides FORCE ROW LEVEL SECURITY, to ensure pg_dump output is complete (by default). Also add SECURITY_NOFORCE_RLS context to avoid data corruption when ALTER TABLE .. FORCE ROW SECURITY is being used. The SECURITY_NOFORCE_RLS security context is used only during referential integrity checks and is only considered in check_enable_rls() after we have already checked that the current u
Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
On 10/04/2015 04:29 PM, Andres Freund wrote: > On October 4, 2015 3:27:00 PM GMT+02:00, Amir Rohan > wrote: > >> Perhaps it would help a little if you posted the latest patch here as >> well? So that at least the app picks it up again. > > You can add additional threads in the cf app. > Done, thank you. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
On October 4, 2015 3:27:00 PM GMT+02:00, Amir Rohan wrote: >Perhaps it would help a little if you posted the latest patch here as >well? So that at least the app picks it up again. You can add additional threads in the cf app. -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Rework access method interface
On Sat, Oct 3, 2015 at 5:07 PM, Petr Jelinek wrote: > On 2015-10-03 08:27, Amit Kapila wrote: > >> On Fri, Oct 2, 2015 at 8:14 PM, Alexander Korotkov >> mailto:a.korot...@postgrespro.ru>> wrote: >> > >> > >> > I agree about staying with one SQL-visible function. >> > > Okay, this does not necessarily mean there should be only one validation > function in the C struct though. I wonder if it would be more future proof > to name the C interface as something else than the current generic > amvalidate. Especially considering that it basically only does opclass > validation at the moment (It's IMHO saner in terms of API evolution to > expand the struct with more validator functions in the future compared to > adding arguments to the existing function). > > I also agree with you that adding more arguments in future might not be a good idea for exposed API. I don't know how much improvement we can get if we use structure and then keep on adding more members to it based on future need, but atleast that way it will be less prone to breakage. I think adding multiple validator functions is another option, but that also doesn't sound like a good way as it can pose difficulty in understanding the right version of API to be used. > >> Few assorted comments: >> >> 1. >> + * Get IndexAmRoutine structure from access method oid. >> + */ >> + IndexAmRoutine * >> + GetIndexAmRoutine(Oid >> amoid) >> ... >> + if (!RegProcedureIsValid >> (amhandler)) >> + elog(ERROR, "invalid %u regproc", amhandler); >> >> I have noticed that currently, the above kind of error is reported >> slightly >> differently as in below code: >> if (!RegProcedureIsValid(procOid)) \ >> elog(ERROR, "invalid %s regproc", CppAsString >> (pname)); \ >> >> If you feel it is better to do the way as it is in current code, then you >> can change accordingly. >> > > It's completely different use-case from existing code. And tbh I think it > should have completely different and more informative error message > something in the style of "index access method %s does not have a handler" > (see for example GetFdwRoutineByServerId or transformRangeTableSample how > this is handled for similar cases currently). > > makes sense to me, but in that case isn't it better to use ereport (as used in GetFdwRoutineByServerId()) rather than elog? > This however brings another comment - I think it would be better if the > GetIndexAmRoutine would be split into two interfaces. The GetIndexAmRoutine > itself would accept the amhandler Oid and should just do the > OidFunctionCall and then check the result is not NULL and possibly that it > is an IndexAmRoutine node. And then all the > > (IndexAmRoutine*)DatumGetPointer(!OidFunctionCall0(accessMethodForm->amhandler)); > calls in the code should be replaced with calls to the GetIndexAmRoutine > instead. > > The other routine (let's call it GetIndexAmRoutineByAmId for example) > would get IndexAmRoutine from amoid by looking up catalog, doing that > validation of amhandler Oid/regproc and calling the GetIndexAmRoutine. > > +1, I think that will make this API's design closer to what we have for corresponding FDW API. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
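A sketch of the two-function split being suggested. Everything here leans on the WIP patch: IndexAmRoutine, the node tag assumed by the IsA() check, and the pg_am.amhandler column do not exist in released servers, so treat the details as provisional.

#include "postgres.h"
#include "access/htup_details.h"
#include "catalog/pg_am.h"
#include "fmgr.h"
#include "utils/syscache.h"
/* IndexAmRoutine itself would come from the WIP patch's header. */

/* Call the handler function and sanity-check what it returns. */
IndexAmRoutine *
GetIndexAmRoutine(Oid amhandler)
{
    Datum           datum = OidFunctionCall0(amhandler);
    IndexAmRoutine *routine = (IndexAmRoutine *) DatumGetPointer(datum);

    if (routine == NULL || !IsA(routine, IndexAmRoutine))
        elog(ERROR, "index access method handler function %u did not return an IndexAmRoutine struct",
             amhandler);
    return routine;
}

/* Look up pg_am by OID, validate amhandler, then hand off to the above. */
IndexAmRoutine *
GetIndexAmRoutineByAmId(Oid amoid)
{
    HeapTuple   tuple;
    Form_pg_am  amform;
    Oid         amhandler;

    tuple = SearchSysCache1(AMOID, ObjectIdGetDatum(amoid));
    if (!HeapTupleIsValid(tuple))
        elog(ERROR, "cache lookup failed for access method %u", amoid);
    amform = (Form_pg_am) GETSTRUCT(tuple);

    amhandler = amform->amhandler;      /* column added by the WIP patch */
    if (!RegProcedureIsValid(amhandler))
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("index access method \"%s\" does not have a handler",
                        NameStr(amform->amname))));
    ReleaseSysCache(tuple);

    return GetIndexAmRoutine(amhandler);
}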
[HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
On 10/02/2015 03:33 PM, Michael Paquier wrote: > > Michael, I'm afraid my email bungling has damaged your thread. I didn't include an "In-reply-To" header when I posted: trinity-b4a8035d-59af-4c42-a37e-258f0f28e44a-1443795007012@3capp-mailcom-lxa08. And we subsequently had our discussion over there instead of here, where the commitfest app is tracking it. https://commitfest.postgresql.org/6/197/ Perhaps it would help a little if you posted the latest patch here as well? So that at least the app picks it up again. Apologies for my ML n00bness, Amir -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Odd query execution behavior with extended protocol
On October 4, 2015 2:50:10 PM GMT+02:00, Shay Rojansky wrote: >> >> > Npgsql supports sending multiple SQL statements in a single packet >via >> the extended protocol. This works fine, but when the second query >SELECTs a >> value modified by the first's UPDATE, I'm getting a result as if the >> > UPDATE hasn't yet occurred. >> >> Looks like the first updating statement is not committed, assuming >that >> the two statements run in different transactions. >> > >I did try to prefix the message chain with an explicit transaction >BEGIN >(with the several different isolation levels) without a difference in >behavior. > >> The exact messages send by Npgsql are: >> > >> > Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed >> > Describe (statement=unnamed) >> > Bind (statement=unnamed, portal=MQ0) >> > Parse (SELECT * FROM data WHERE id=1), statement=unnamed >> > Describe (statement=unnamed) >> > Bind (statement=unnamed, portal=MQ1) >> > Execute (portal=MQ0) >> > Close (portal=MQ0) >> > Execute (portal=MQ1) >> > Close (portal=MQ1) >> > Sync >> >> I never used Npgsql so I don't know if there is something missing >there. >> Would you need an explicit commit before closing MQ0? >> > >I guess this is exactly my question to PostgreSQL... But unless I'm >misunderstanding the transaction semantics I shouldn't need to commit >the >first UPDATE in order to see its effect in the second SELECT... Try adding a sync before the second execute. Andres --- Please excuse brevity and formatting - I am writing this on my mobile phone. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Odd query execution behavior with extended protocol
> > > Npgsql supports sending multiple SQL statements in a single packet via > the extended protocol. This works fine, but when the second query SELECTs a > value modified by the first's UPDATE, I'm getting a result as if the > > UPDATE hasn't yet occurred. > > Looks like the first updating statement is not committed, assuming that > the two statements run in different transactions. > I did try to prefix the message chain with an explicit transaction BEGIN (with the several different isolation levels) without a difference in behavior. > The exact messages send by Npgsql are: > > > > Parse (UPDATE data SET name='foo' WHERE id=1), statement=unnamed > > Describe (statement=unnamed) > > Bind (statement=unnamed, portal=MQ0) > > Parse (SELECT * FROM data WHERE id=1), statement=unnamed > > Describe (statement=unnamed) > > Bind (statement=unnamed, portal=MQ1) > > Execute (portal=MQ0) > > Close (portal=MQ0) > > Execute (portal=MQ1) > > Close (portal=MQ1) > > Sync > > I never used Npgsql so I don't know if there is something missing there. > Would you need an explicit commit before closing MQ0? > I guess this is exactly my question to PostgreSQL... But unless I'm misunderstanding the transaction semantics I shouldn't need to commit the first UPDATE in order to see its effect in the second SELECT... Also I am not in clear what "statement=unnamed" means, but it is used > twice. Is it possible that the update is overwritten with select before it > executes? > statement=unnamed means that the destination statement is the unnamed prepared statement (as described in http://www.postgresql.org/docs/current/static/protocol-message-formats.html). Right after the Parse I bind the unnamed statement which I just parsed to cursor MQ0. In other words, Npgsql first parses the two queries and binds them to portals MQ0 and MQ1, and only then executes both portals BTW: Do you see the change after update in your DB if you look into it with > another tool (e.g. psql)? > That's a good suggestion, I'll try to check it out, thanks!
[HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
On Sat, Oct 3, 2015 at 10:47 PM, Michael Paquier wrote: > On Sat, Oct 3, 2015 at 9:50 PM, Amir Rohan wrote: Block until recovery is finished, before testing. eliminate races, and avoid the stupid sleep(3) I used. >>> >>> TODO > > Well. I just recalled this item in the list of things you mentioned. I > marked it but forgot to address it. It sounds right that we may want > something using pg_isready in a loop as a node in recovery would > reject connections. I just hacked up an updated version with the following things: - Optional argument for stop_node to define the stop mode of pg_ctl - Addition of wait_for_node where pg_isready is used to wait until a node is ready to accept queries - Addition of a local lookup variable to track the last port assigned. This accelerates get_free_port. Regards, -- Michael diff --git a/src/bin/pg_rewind/RewindTest.pm b/src/bin/pg_rewind/RewindTest.pm index a4c1737..ea219d7 100644 --- a/src/bin/pg_rewind/RewindTest.pm +++ b/src/bin/pg_rewind/RewindTest.pm @@ -125,38 +125,6 @@ sub check_query } } -# Run a query once a second, until it returns 't' (i.e. SQL boolean true). -sub poll_query_until -{ - my ($query, $connstr) = @_; - - my $max_attempts = 30; - my $attempts = 0; - my ($stdout, $stderr); - - while ($attempts < $max_attempts) - { - my $cmd = [ 'psql', '-At', '-c', "$query", '-d', "$connstr" ]; - my $result = run $cmd, '>', \$stdout, '2>', \$stderr; - - chomp($stdout); - $stdout =~ s/\r//g if $Config{osname} eq 'msys'; - if ($stdout eq "t") - { - return 1; - } - - # Wait a second before retrying. - sleep 1; - $attempts++; - } - - # The query result didn't change in 30 seconds. Give up. Print the stderr - # from the last attempt, hopefully that's useful for debugging. - diag $stderr; - return 0; -} - sub append_to_file { my ($filename, $str) = @_; diff --git a/src/test/Makefile b/src/test/Makefile index b713c2c..d6e51eb 100644 --- a/src/test/Makefile +++ b/src/test/Makefile @@ -17,7 +17,7 @@ SUBDIRS = regress isolation modules # We don't build or execute examples/, locale/, or thread/ by default, # but we do want "make clean" etc to recurse into them. Likewise for ssl/, # because the SSL test suite is not secure to run on a multi-user system. -ALWAYS_SUBDIRS = examples locale thread ssl +ALWAYS_SUBDIRS = examples locale thread ssl recovery # We want to recurse to all subdirs for all standard targets, except that # installcheck and install should not recurse into the subdirectory "modules". diff --git a/src/test/perl/RecoveryTest.pm b/src/test/perl/RecoveryTest.pm new file mode 100644 index 000..aa8998c --- /dev/null +++ b/src/test/perl/RecoveryTest.pm @@ -0,0 +1,401 @@ +package RecoveryTest; + +# Set of common routines for recovery regression tests for a PostgreSQL +# cluster. This includes global variables and methods that can be used +# by the various set of tests present to set up cluster nodes and +# configure them according to the test scenario wanted. +# +# Cluster nodes can be freely created using initdb or using the existing +# base backup of another node, with minimum configuration done when the +# node is created for the first time like having a proper port number. +# It is then up to the test to decide what to do with the newly-created +# node. 
+# +# Environment configuration of each node is available through a set +# of global variables provided by this package, hashed depending on the +# port number of a node: +# - connstr_nodes connection string to connect to this node +# - datadir_nodes to get the data folder of a given node +# - archive_nodes for the location of the WAL archives of a node +# - backup_nodes for the location of base backups of a node +# - applname_nodes, application_name to use for a standby +# +# Nodes are identified by their port number, which should be unique +# for each node of the cluster as it is run locally. + +use Cwd; +use TestLib; +use Test::More; + +use Archive::Tar; +use IPC::Run qw(run start); + +use Exporter 'import'; + +our @EXPORT = qw( + %connstr_nodes + %datadir_nodes + %backup_nodes + %archive_nodes + %applname_nodes + + append_to_file + backup_node + disable_node + dump_node_info + enable_archiving + enable_node + enable_restoring + enable_streaming + get_free_port + init_node + init_node_from_backup + make_master + make_warm_standby + make_hot_standby + restart_node + start_node + stop_node + teardown_node +); + +# Global variables for node data +%datadir_nodes = {}; # PGDATA folders +%backup_nodes = {}; # Backup base folder +%archive_nodes = {}; # Archive base folder +%connstr_nodes = {}; # Connection strings +%applname_nodes = {}; # application_name used for standbys + +# Tracking of last port value assigned to accelerate free port lookup. +# XXX: Should this part use PG_VERSION_NUM? +my $last_port_assigned = 90600 % 16384 + 49152; + +# Database used for each connection attempt via psql +$ENV{PGDATABASE} = "postgres"; + +# Tracker of active nodes +my @active_nodes = ()
Re: [HACKERS] check fails on Fedora 23
#15 0x00469376 in main (argc=8, argv=0x16a45e0) at main.c:223 >> >> Linux yen 4.2.1-300.fc23.x86_64+debug #1 SMP Mon Sep 21 21:58:30 UTC 2015 >> x86_64 x86_64 x86_64 GNU/Linux >> gcc (GCC) 5.1.1 20150618 (Red Hat 5.1.1-4) >> >> Postgres 9.4.4 is working well >> > > configured with defaults - only --enable-cassert Regards Pavel
Re: [HACKERS] check fails on Fedora 23
2015-10-04 10:50 GMT+02:00 Pavel Stehule : > Hi > > I am testing PostgreSQL (master) on Fedora 23. The query > > ELECT p1.oid, p1.proname, p2.oid, p2.proname > FROM pg_proc AS p1, pg_proc AS p2 > WHERE p1.oid < p2.oid AND > p1.prosrc = p2.prosrc AND > p1.prolang = 12 AND p2.prolang = 12 AND > (p1.proisagg = false OR p2.proisagg = false) AND > (p1.prolang != p2.prolang OR > p1.proisagg != p2.proisagg OR > p1.prosecdef != p2.prosecdef OR > p1.proleakproof != p2.proleakproof OR > p1.proisstrict != p2.proisstrict OR > p1.proretset != p2.proretset OR > p1.provolatile != p2.provolatile OR > p1.pronargs != p2.pronargs); > > fails on assert > > Program terminated with signal SIGABRT, Aborted. > #0 0x7f3e1dfe5a98 in __GI_raise (sig=sig@entry=6) at > ../sysdeps/unix/sysv/linux/raise.c:55 > 55 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); > (gdb) bt > #0 0x7f3e1dfe5a98 in __GI_raise (sig=sig@entry=6) at > ../sysdeps/unix/sysv/linux/raise.c:55 > #1 0x7f3e1dfe769a in __GI_abort () at abort.c:89 > #2 0x007c5401 in ExceptionalCondition > (conditionName=conditionName@entry=0x935157 "!(compareResult < 0)", > errorType=errorType@entry=0x802217 "FailedAssertion", > fileName=fileName@entry=0x935147 "nodeMergejoin.c", > lineNumber=lineNumber@entry=942) at assert.c:54 > #3 0x005eba9f in ExecMergeJoin (node=node@entry=0x175f120) at > nodeMergejoin.c:942 > #4 0x005d3958 in ExecProcNode (node=node@entry=0x175f120) at > execProcnode.c:480 > #5 0x005cfe87 in ExecutePlan (dest=0x177d1e0, > direction=, numberTuples=0, sendTuples=, > operation=CMD_SELECT, planstate=0x175f120, estate=0x175f008) at > execMain.c:1562 > #6 standard_ExecutorRun (queryDesc=0x16c7e88, direction=, > count=0) at execMain.c:342 > #7 0x006dd038 in PortalRunSelect (portal=portal@entry=0x16bed38, > forward=forward@entry=1 '\001', count=0, > count@entry=9223372036854775807, dest=dest@entry=0x177d1e0) at > pquery.c:942 > #8 0x006de57e in PortalRun (portal=portal@entry=0x16bed38, > count=count@entry=9223372036854775807, > isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x177d1e0, > altdest=altdest@entry=0x177d1e0, > completionTag=completionTag@entry=0x7ffe4f8236f0 "") at pquery.c:786 > #9 0x006db29b in exec_simple_query ( > query_string=0x1715318 "SELECT p1.oid, p1.proname, p2.oid, > p2.proname\nFROM pg_proc AS p1, pg_proc AS p2\nWHERE p1.oid < p2.oid > AND\np1.prosrc = p2.prosrc AND\np1.prolang = 12 AND p2.prolang = 12 > AND\n(p1.proisagg = f"...) at postgres.c:1105 > #10 PostgresMain (argc=, argv=argv@entry=0x16a57a0, > dbname=0x16a5500 "regression", username=) > at postgres.c:4033 > #11 0x0046810f in BackendRun (port=0x16c5f50) at postmaster.c:4204 > #12 BackendStartup (port=0x16c5f50) at postmaster.c:3880 > #13 ServerLoop () at postmaster.c:1683 > #14 0x0067e98b in PostmasterMain (argc=argc@entry=8, > argv=argv@entry=0x16a45e0) at postmaster.c:1292 > #15 0x00469376 in main (argc=8, argv=0x16a45e0) at main.c:223 > > Linux yen 4.2.1-300.fc23.x86_64+debug #1 SMP Mon Sep 21 21:58:30 UTC 2015 > x86_64 x86_64 x86_64 GNU/Linux > gcc (GCC) 5.1.1 20150618 (Red Hat 5.1.1-4) > > Postgres 9.4.4 is working well > git bisect shows 4ea51cdfe85ceef8afabceb03c446574daa0ac23 is the first bad commit commit 4ea51cdfe85ceef8afabceb03c446574daa0ac23 Author: Robert Haas Date: Mon Jan 19 15:20:31 2015 -0500 Use abbreviated keys for faster sorting of text datums. 
This commit extends the SortSupport infrastructure to allow operator classes the option to provide abbreviated representations of Datums; in the case of text, we abbreviate by taking the first few characters of the strxfrm() blob. If the abbreviated comparison is insufficent to resolve the comparison, we fall back on the normal comparator. This can be much faster than the old way of doing sorting if the first few bytes of the string are usually sufficient to resolve the comparison. There is the potential for a performance regression if all of the strings to be sorted are identical for the first 8+ characters and differ only in later positions; therefore, the SortSupport machinery now provides an infrastructure to abort the use of abbreviation if it appears that abbreviation is producing comparatively few distinct keys. HyperLogLog, a streaming cardinality estimator, is included in this commit and used to make that determination for text. Peter Geoghegan, reviewed by me. > > Regards > > Pavel > > > > >
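One way to narrow down whether this is the C library rather than the new sorting code (an assumption about the cause, not a confirmed diagnosis): abbreviated keys depend on comparisons of strxfrm() blobs never contradicting strcoll(). A small standalone probe that can be run under the failing locale, e.g. LC_ALL=cs_CZ.UTF-8 ./collcheck ch h:

/* gcc -O2 -o collcheck collcheck.c */
#include <stdio.h>
#include <string.h>
#include <locale.h>

static int sign(int x) { return (x > 0) - (x < 0); }

int
main(int argc, char **argv)
{
    char    xa[8192], xb[8192];

    if (argc != 3 || setlocale(LC_COLLATE, "") == NULL)
    {
        fprintf(stderr, "usage: LC_ALL=<locale> %s str1 str2\n", argv[0]);
        return 1;
    }

    if (strxfrm(xa, argv[1], sizeof(xa)) >= sizeof(xa) ||
        strxfrm(xb, argv[2], sizeof(xb)) >= sizeof(xb))
    {
        fprintf(stderr, "buffer too small\n");
        return 1;
    }

    if (sign(strcoll(argv[1], argv[2])) != sign(strcmp(xa, xb)))
    {
        printf("strcoll() and strxfrm() disagree for \"%s\" vs \"%s\"\n",
               argv[1], argv[2]);
        return 2;
    }

    printf("consistent\n");
    return 0;
}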
[HACKERS] check fails on Fedora 23
Hi I am testing PostgreSQL (master) on Fedora 23. The query ELECT p1.oid, p1.proname, p2.oid, p2.proname FROM pg_proc AS p1, pg_proc AS p2 WHERE p1.oid < p2.oid AND p1.prosrc = p2.prosrc AND p1.prolang = 12 AND p2.prolang = 12 AND (p1.proisagg = false OR p2.proisagg = false) AND (p1.prolang != p2.prolang OR p1.proisagg != p2.proisagg OR p1.prosecdef != p2.prosecdef OR p1.proleakproof != p2.proleakproof OR p1.proisstrict != p2.proisstrict OR p1.proretset != p2.proretset OR p1.provolatile != p2.provolatile OR p1.pronargs != p2.pronargs); fails on assert Program terminated with signal SIGABRT, Aborted. #0 0x7f3e1dfe5a98 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55 55 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); (gdb) bt #0 0x7f3e1dfe5a98 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55 #1 0x7f3e1dfe769a in __GI_abort () at abort.c:89 #2 0x007c5401 in ExceptionalCondition (conditionName=conditionName@entry=0x935157 "!(compareResult < 0)", errorType=errorType@entry=0x802217 "FailedAssertion", fileName=fileName@entry=0x935147 "nodeMergejoin.c", lineNumber=lineNumber@entry=942) at assert.c:54 #3 0x005eba9f in ExecMergeJoin (node=node@entry=0x175f120) at nodeMergejoin.c:942 #4 0x005d3958 in ExecProcNode (node=node@entry=0x175f120) at execProcnode.c:480 #5 0x005cfe87 in ExecutePlan (dest=0x177d1e0, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, planstate=0x175f120, estate=0x175f008) at execMain.c:1562 #6 standard_ExecutorRun (queryDesc=0x16c7e88, direction=, count=0) at execMain.c:342 #7 0x006dd038 in PortalRunSelect (portal=portal@entry=0x16bed38, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x177d1e0) at pquery.c:942 #8 0x006de57e in PortalRun (portal=portal@entry=0x16bed38, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x177d1e0, altdest=altdest@entry=0x177d1e0, completionTag=completionTag@entry=0x7ffe4f8236f0 "") at pquery.c:786 #9 0x006db29b in exec_simple_query ( query_string=0x1715318 "SELECT p1.oid, p1.proname, p2.oid, p2.proname\nFROM pg_proc AS p1, pg_proc AS p2\nWHERE p1.oid < p2.oid AND\np1.prosrc = p2.prosrc AND\np1.prolang = 12 AND p2.prolang = 12 AND\n(p1.proisagg = f"...) at postgres.c:1105 #10 PostgresMain (argc=, argv=argv@entry=0x16a57a0, dbname=0x16a5500 "regression", username=) at postgres.c:4033 #11 0x0046810f in BackendRun (port=0x16c5f50) at postmaster.c:4204 #12 BackendStartup (port=0x16c5f50) at postmaster.c:3880 #13 ServerLoop () at postmaster.c:1683 #14 0x0067e98b in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x16a45e0) at postmaster.c:1292 #15 0x00469376 in main (argc=8, argv=0x16a45e0) at main.c:223 Linux yen 4.2.1-300.fc23.x86_64+debug #1 SMP Mon Sep 21 21:58:30 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux gcc (GCC) 5.1.1 20150618 (Red Hat 5.1.1-4) Postgres 9.4.4 is working well Regards Pavel