Re: [HACKERS] WAL replay bugs
On Thu, Apr 17, 2014 at 10:33 AM, Tom Lane wrote: >> Any objections to changing those two? > > Not here. I've always suspected #2 was going to bite us someday anyway. +1 -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] assertion failure 9.3.4
On 04/17/2014 09:04 PM, Peter Geoghegan wrote: On Thu, Apr 17, 2014 at 7:15 AM, Andrew Dunstan wrote: track_activity_query_size = 10240 shared_preload_libraries = 'auto_explain,pg_stat_statements' As you can see, auto_explain's log_min_duration hasn't been set, so it shouldn't be doing anything very much, I should think. track_activity_query_size is 10240? Not 1024? yes. I normally find 1024 is far too small. cheers andrew
Re: [HACKERS] DISCARD ALL (Again)
On Thursday, April 17, 2014, Joshua D. Drake wrote: > > On 04/17/2014 07:07 PM, David G Johnston wrote: > >> >> On 04/17/2014 05:24 PM, Tom Lane wrote: >> > On the whole I'm not sure this is something we ought to get into. >> > If you really need a fresh session, maybe you should start a >> > fresh session. >> >> >> Isn't the whole point to avoid the reconnection overhead, especially for >> connection poolers? DISCARD ALL shouldn't cause any cleanup that >> wouldn't otherwise occur when a session disconnects. True global data >> (not just session global) should be excluded. >> > > The GD is global to the session only (Like temp tables). Yes. Tom's response makes it sound like the proposal is to throw away the entire language environment for the whole server (thus needing super user privilege) so I'm pointing out that what we are discussing is not that invasive. > > >> A better wording of the promise would be: "discard all" leaves the >> session in the same state it would be in if the underlying connection >> were dropped and re-established. >> > > Except that it doesn't. > > But is this what you intend it to mean, by implementing these features, or are you thinking something different? David J.
Re: [HACKERS] DISCARD ALL (Again)
On 04/17/2014 07:07 PM, David G Johnston wrote: On 04/17/2014 05:24 PM, Tom Lane wrote: > On the whole I'm not sure this is something we ought to get into. > If you really need a fresh session, maybe you should start a > fresh session. Isn't the whole point to avoid the reconnection overhead, especially for connection poolers? DISCARD ALL shouldn't cause any cleanup that wouldn't otherwise occur when a session disconnects. True global data (not just session global) should be excluded. The GD is global to the session only (Like temp tables). A better wording of the promise would be: "discard all" leaves the session in the same state it would be in if the underlying connection were dropped and re-established. Except that it doesn't. JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc Political Correctness is for cowards.
Re: [HACKERS] DISCARD ALL (Again)
* David G Johnston (david.g.johns...@gmail.com) wrote: > > On 04/17/2014 05:24 PM, Tom Lane wrote: > > > On the whole I'm not sure this is something we ought to get into. > > > If you really need a fresh session, maybe you should start a > > > fresh session. > > > Isn't the whole point to avoid the reconnection overhead, especially for > connection poolers? DISCARD ALL shouldn't cause any cleanup that wouldn't > otherwise occur when a session disconnects. True global data (not just > session global) should be excluded. Err, this is per-backend global data, so it will get cleaned up on session disconnect. > A better wording of the promise would be: "discard all" leaves the session > in the same state it would be in if the underlying connection were dropped > and re-established. This is not accurate. Thanks, Stephen
Re: [HACKERS] DISCARD ALL (Again)
> On 04/17/2014 05:24 PM, Tom Lane wrote: > > On the whole I'm not sure this is something we ought to get into. > > If you really need a fresh session, maybe you should start a > > fresh session. > > Isn't the whole point to avoid the reconnection overhead, especially for connection poolers? DISCARD ALL shouldn't cause any cleanup that wouldn't otherwise occur when a session disconnects. True global data (not just session global) should be excluded. A better wording of the promise would be: "discard all" leaves the session in the same state it would be in if the underlying connection were dropped and re-established. David J.
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 6:50 PM, Greg Stark wrote: > On Thu, Apr 17, 2014 at 4:48 PM, Peter Geoghegan wrote: >> Although I will add that not caching highly useful inner pages for the >> medium term, because that index isn't being used at all for 5 minutes >> probably is very bad. Using the 4,828 buffers that it would take to >> store all the inner pages (as in my large primary index example) to go >> store something else is probably penny wise and pound foolish. > > But there could easily be 20 unused indexes for every 1 index that is > being used. Sure, but then there might not be. Obviously there is a trade-off to be made between recency and frequency. One interesting observation in the LRU-K paper is that for their test case, a pure LFU actually works very well, despite, as the authors acknowledge, being a terrible algorithm in the real world. That's because their test case is so simple, and concerns only one table/index, with a uniform distribution. -- Peter Geoghegan
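To make the LFU observation concrete, here is a toy sketch in Python. It is purely illustrative and is not PostgreSQL's buffer manager: it shows why frequency-based eviction protects constantly touched pages (such as B-tree inner pages) under a simple uniform workload, while hinting at the real-world failure mode the LRU-K authors acknowledge — stale entries with historically high counts never age out.

```python
# Minimal pure-LFU cache sketch (illustration only, not PostgreSQL code).
class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}    # key -> value
        self.counts = {}  # key -> access frequency

    def get(self, key):
        if key in self.data:
            self.counts[key] += 1
            return self.data[key]
        return None

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Evict the least-frequently-used entry; a hot inner page
            # with a large count is effectively never the victim.
            victim = min(self.counts, key=self.counts.get)
            del self.data[victim], self.counts[victim]
        self.data[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1

cache = LFUCache(2)
cache.put("inner-page", "hot")
for _ in range(100):          # inner pages are touched constantly
    cache.get("inner-page")
cache.put("leaf-1", "cold")
cache.put("leaf-2", "cold")   # evicts leaf-1, not the hot inner page
assert cache.get("inner-page") == "hot"
assert cache.get("leaf-1") is None
```

The flip side, of course, is that once the workload shifts, the old high-count pages linger — which is why pure LFU is a poor real-world policy.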
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 4:48 PM, Peter Geoghegan wrote: > Although I will add that not caching highly useful inner pages for the > medium term, because that index isn't being used at all for 5 minutes > probably is very bad. Using the 4,828 buffers that it would take to > store all the inner pages (as in my large primary index example) to go > store something else is probably penny wise and pound foolish. But there could easily be 20 unused indexes for every 1 index that is being used.
Re: [HACKERS] [bug fix] pg_ctl always uses the same event source
On Sat, Apr 12, 2014 at 1:21 PM, MauMau wrote: > Hello, Amit san, Tom san, > > I'm sorry for my late response. I've just caught up with the discussion. > I'm almost convinced. > > Please find attached the revised patch. I'd like to follow the idea of > adding a switch to pg_ctl. The newly added "-e event_source" sets the > event source name for pg_ctl to use. When -e is used with pg_ctl register, > it will be added to the command line for Windows service (pg_ctl > runservice). Currently the -e option is accepted with all the options that can be provided to pg_ctl. Shouldn't we accept it only with the service-related options, since that is the only time it will be used? Basically, write_stderr() will write to the event log only in the case of a service. Another minor point: you have forgotten to remove the declaration below: + static void get_config_value(const char *name, char *buf, int buf_size); Sorry for the delayed response; I am not sure that I will be able to complete the review of the patch in the next few days, as I will be on vacation. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] How can we make beta testing better?
On Thu, Apr 17, 2014 at 5:26 PM, Jeff Janes wrote: > A lot of the bugs that turned up are not the kind I would expect to have > been found in most beta testing done by non-hacking users. Weren't they > mostly around rare race conditions, crash recovery, and freezing? Actually I was struck by how the bugs in 9.3 were the kinds of bugs that should have turned up pretty quickly by user testing. They certainly turned up pretty quickly after users put their production applications on it. They *didn't* require rare race conditions, just certain patterns of workloads for long enough to reliably reproduce. They were specifically *not* the kinds of bugs that regression testing would have found. Regression testing only finds bugs you anticipate and think to put in the specification of correct behaviour. If you had thought of these problems you would have tested them manually, and in any case you would have seen the omissions immediately on inspecting the code. Crash recovery and freezing aren't rare things once you have hot standbys everywhere and run 24x7 applications (or load tests) on your systems. We could make freezing more frequent by having a mode that bumps the xid by a few million randomly. That would still be pretty hit and miss whether it happens to wrap around in any particular state. -- greg
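For context on why wraparound is worth forcing in tests: 32-bit transaction IDs compare circularly, modulo 2^32. The following sketch mirrors the idea behind TransactionIdPrecedes (interpret the difference as a signed 32-bit value); it is an illustration, not PostgreSQL's actual source.

```python
# Circular comparison of 32-bit xids (illustrative sketch).
def xid_precedes(a, b):
    diff = (a - b) & 0xFFFFFFFF
    if diff & 0x80000000:       # sign bit set: negative in two's complement
        diff -= 1 << 32
    return diff < 0

assert xid_precedes(100, 200)
# After advancing roughly 2 billion xids, "100" flips to being in the
# future relative to the current xid -- the wraparound hazard that
# freezing exists to prevent:
assert not xid_precedes(100, (100 + 2**31 + 5) & 0xFFFFFFFF)
```

Randomly bumping the xid counter by a few million, as suggested above, simply makes this flip happen often enough to be exercised during a beta cycle.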
Re: [HACKERS] Buildfarm "master-next" branch?
On 04/17/2014 10:38 PM, Tom Lane wrote: > IMO the best single thing that could happen for the buildfarm is if > we had more critters (at least twice as many) running a wider variety of > platforms, compilers, and configuration options than there are today. > More frequent runs would come out of that automatically. I'll be bringing up a new Windows buildfarm member once I've got a current project knocked off. It's a pretty fast dedicated Windows Server 2012 box with a wide range of SDKs on it that can do 32-bit and 64-bit builds. Should help a little. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] assertion failure 9.3.4
On Thu, Apr 17, 2014 at 7:15 AM, Andrew Dunstan wrote: > track_activity_query_size = 10240 > shared_preload_libraries = 'auto_explain,pg_stat_statements' > > As you can see, auto_explain's log_min_duration hasn't been set, so it > shouldn't be doing anything very much, I should think. track_activity_query_size is 10240? Not 1024? -- Peter Geoghegan
Re: [HACKERS] DISCARD ALL (Again)
"Joshua D. Drake" writes: > On 04/17/2014 05:24 PM, Tom Lane wrote: >> 1. The things that DISCARD ALL resets are explicitly enumerated in its >> documentation page; it is not an open-ended promise to clean up anything >> anybody happens to think of. > Actually, it is. Unless we change the wording in the documentation Well, we definitely need to tweak the short description of DISCARD ALL. But the long description a bit further down the page is quite clear. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DISCARD ALL (Again)
On 04/17/2014 05:24 PM, Tom Lane wrote: "Joshua D. Drake" writes: Does it seem reasonable based on the docs: DISCARD ALL: Releases all temporary resources associated with the current session and resets the session to its initial state. That we should also release the GD? There are a couple of reasons why this isn't anywhere near as open-and-shut as you seem to think: 1. The things that DISCARD ALL resets are explicitly enumerated in its documentation page; it is not an open-ended promise to clean up anything anybody happens to think of. Actually, it is. Unless we change the wording in the documentation, because right now DISCARD ALL states that it will return the session back to its initial state. It does not do that. [... snipping technical details for those who would be better at arguing than I ... ] On the whole I'm not sure this is something we ought to get into. If you really need a fresh session, maybe you should start a fresh session. This isn't nearly as simple as it sounds. That said, the workaround we came up with was to have server_reset_query in pg_bouncer call a function called discard_gd() that explicitly discards the GD for that session. But if we are going to get into it, I think the permissions aspect needs some careful thought. Agreed. Sincerely, JD
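The discard_gd() workaround mentioned here might look roughly like the following. This is a sketch of the semantics only: GD below is a stand-in dict for PL/Python's per-session global dictionary (which is a plain dict on the Python side), and the real function would be installed with CREATE FUNCTION ... LANGUAGE plpythonu and invoked from the pooler's server_reset_query. Nothing here has been tested against a live server.

```python
# Stand-in for PL/Python's session-global dictionary.
GD = {}

def cache_plan(name, plan):
    # A typical use of GD: memoize a prepared plan across function calls.
    GD.setdefault(name, plan)
    return GD[name]

def discard_gd():
    # Body of the hypothetical workaround function: DISCARD ALL does not
    # touch GD, so the pooler must clear it explicitly.
    GD.clear()

cache_plan("fetch_queue", "<prepared plan>")
assert "fetch_queue" in GD
discard_gd()          # what server_reset_query would invoke
assert GD == {}
```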
Re: [HACKERS] DISCARD ALL (Again)
"Joshua D. Drake" writes: > Does it seem reasonable based on the docs: > DISCARD ALL: > Releases all temporary resources associated with the current session and > resets the session to its initial state. > That we should also release the GD? There are a couple of reasons why this isn't anywhere near as open-and-shut as you seem to think: 1. The things that DISCARD ALL resets are explicitly enumerated in its documentation page; it is not an open-ended promise to clean up anything anybody happens to think of. In particular, I would object to adding any such functionality that doesn't correspond to a more specific form of DISCARD, RESET, or whatever. 2. While I'm no Python expert, I believe GD is just a specific instance of a general capability for global state in Python. Are we going to promise that any and all user-created data inside Python goes away? What about other PLs? Will users thank us if this suddenly starts happening? 3. Since PL/Python is a superuser-only language, destroying either all global state or just GD should require superuser privileges. It will definitely break things to make DISCARD ALL require superuser privs. It's not very much more attractive to say that it only destroys Python state if issued by a superuser. We could in fact implement #2, I imagine, by destroying and recreating the entire language interpreter. So I could imagine implementing a "DISCARD INTERPRETERS" kind of command that would zap the current interpreter(s) for whichever PL languages happened to feel like cooperating with the command. But I'm not at all sure about the superuser-privileges aspect of that. More: IIRC, in at least some of the PLs, the interpreters are SQL-user-specific. Should a non-superuser session user have the ability to destroy state in interpreters running for other user IDs (presumably, state created by SECURITY DEFINER functions)? I'd guess not. On the whole I'm not sure this is something we ought to get into. 
If you really need a fresh session, maybe you should start a fresh session. But if we are going to get into it, I think the permissions aspect needs some careful thought. regards, tom lane
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Fri, Apr 18, 2014 at 4:29 AM, Fabrízio de Royes Mello wrote: > > On Thu, Apr 17, 2014 at 12:46 PM, Bruce Momjian wrote: >> >> On Thu, Apr 17, 2014 at 11:44:37AM -0400, Tom Lane wrote: >> > Bruce Momjian writes: >> > > The idea is that we only need quotes when there are odd characters in >> > > the identifier. We do that right now in some places, though I can't >> > > find them in pg_dump. I know psql does that, see quote_ident(). >> > >> > I think our general style rule is that identifiers embedded in messages >> > are always double-quoted. There's an exception for type names, but >> > not otherwise. You're confusing the message case with printing SQL. >> >> OK. I was unclear if a status _display_ was a message like an error >> message. >> > > The attached patch fixes the missing double-quoting in the "dumping contents of > table.." message and adds the schema name to other messages: > - "reading indexes for table \"%s\".\"%s\"\n" > - "reading foreign key constraints for table \"%s\".\"%s\"\n" > - "reading triggers for table \"%s\".\"%s\"\n" > - "finding the columns and types of table \"%s\".\"%s\"\n" > - "finding default expressions of table \"%s\".\"%s\"\n" > - "finding check constraints for table \"%s\".\"%s\"\n" Cool additions. There may be a more elegant way to check if namespace is NULL, but I couldn't come up with one myself. So the patch may be fine. -- Michael
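For readers unfamiliar with the quote-when-needed behavior Bruce refers to, a rough Python sketch of the idea behind psql's quote_ident(): quote only when the identifier is not a plain lower-case name. Keyword handling is omitted, and this is an illustration, not psql's actual code.

```python
import re

def quote_ident_if_needed(name):
    # A plain lower-case identifier needs no quoting; anything else is
    # wrapped in double quotes, with embedded quotes doubled.
    # (Real quote_ident also quotes reserved keywords -- omitted here.)
    if re.fullmatch(r"[a-z_][a-z0-9_$]*", name):
        return name
    return '"' + name.replace('"', '""') + '"'

assert quote_ident_if_needed("accounts") == "accounts"
assert quote_ident_if_needed("Weird name") == '"Weird name"'
```

The thread's conclusion, though, is that identifiers in pg_dump's *messages* are always double-quoted regardless, since that is the project's message style rule.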
Re: [HACKERS] assertion failure 9.3.4
All, So I have encountered a 2nd report of this issue, or of an issue which sounds very similar: - corruption in two "queue" tables - the tables are written in a high-concurrency, lock-contested environment - user uses SELECT FOR UPDATE with these tables. - pg_stat_statements.so is loaded, but the extension is not installed - four rows were added to the queue tables, while not being added to the PK index. This allowed duplicate PKs to be added. Currently the user is testing not loading the pg_stat_statements.so to see if the problem goes away. They have a destruction test environment, so we should be able to confirm/deny in a couple days. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] How can we make beta testing better?
On 04/15/2014 09:53 PM, Rod Taylor wrote: > A documented beta test process/toolset which does the following would help: > 1) Enables full query logging > 2) Creates a replica of a production DB, record $TIME when it stops. > 3) Allow user to make changes (upgrade to 9.4, change hardware, change > kernel settings, ...) > 4) Plays queries from the CSV logs starting from $TIME mimicking actual > timing and transaction boundaries > > If Pg can make it easy to duplicate activities currently going on in > production inside another environment, I would be pleased to fire a couple > billion queries through it over the next few weeks. > > #4 should include reporting useful to the project, such as a sampling of > queries which performed significantly worse and a few relative performance > stats for overall execution time. So we have some software we've been procrastinating on OSS'ing, which does: 1) Takes full query CSV logs from a running postgres instance 2) Runs them against a target instance in parallel 3) Records response times for all queries tsung and pgreplay also do this, but have some limitations which make them impractical for a general set of logs to replay. What it would need is: A) scripting around coordinated backups B) Scripting for single-command runs, including changing pg.conf to record data. C) tools to *analyze* the output data, including error messages. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
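The replay idea described in this thread — read timestamped query logs and re-issue each query at its original relative offset — could be prototyped along these lines. The two-column CSV layout and the stub executor are hypothetical (the real PostgreSQL csvlog format has many more columns, and a real tool would execute over libpq and handle transaction boundaries); this is a sketch of the pacing logic only.

```python
import csv
import io
import time

def replay(log_file, execute, speedup=1.0):
    """Dispatch each logged query at its original relative offset,
    recording per-query response times."""
    rows = [(float(ts), query) for ts, query in csv.reader(log_file)]
    start_wall = time.monotonic()
    t0 = rows[0][0]
    timings = []
    for ts, query in rows:
        # Sleep until this query's original offset (scaled by speedup).
        due = (ts - t0) / speedup
        delay = due - (time.monotonic() - start_wall)
        if delay > 0:
            time.sleep(delay)
        began = time.monotonic()
        execute(query)
        timings.append((query, time.monotonic() - began))
    return timings

# Tiny synthetic log; execute is a stub instead of a real connection.
log = io.StringIO("0.0,SELECT 1\n0.05,SELECT 2\n")
timings = replay(log, execute=lambda q: None, speedup=10.0)
print([q for q, _ in timings])  # -> ['SELECT 1', 'SELECT 2']
```

The analysis step (item C above) would then compare these timings against a baseline run to flag queries that regressed.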
Re: [HACKERS] DISCARD ALL (Again)
On Thu, Apr 17, 2014 at 6:51 PM, Alvaro Herrera wrote: > > It does sound like a legitimate feature request to me. I don't remember if > we honored the request to add resetting of cached sequences, though; if > we didn't, this one is probably going to be tough too. > +1 > Another point is that to implement this I think there will need to be > another per-PL entry point to discard session data; are we okay with > that? Since this probably means a new column in pg_language, we > couldn't even consider the idea of back-patching. Unless we add a hook, > which is registered in the PL's _PG_init()? > This week I had some similar trouble using "dblink" and "pgbouncer". As expected, "DISCARD ALL" doesn't clear the extension's resources. I was wondering whether it is possible for every extension to register its own "discard" procedure, so that "DISCARD ALL" can execute all registered extension cleanup procedures. Does that make sense? > Are we going to backpatch a doc change that says "releases all temporary > resources, except for plpython's and plperl's GD"? Surely not ... > Maybe this doc could read: "releases all temporary resources, except for extensions" Greetings, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog sobre TI: http://fabriziomello.blogspot.com >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] DISCARD ALL (Again)
Alvaro Herrera-9 wrote: > Are we going to backpatch a doc change that says "releases all temporary > resources, except for plpython's and plperl's GD"? Surely not ... GD = Global Dictionary I don't see why something like the following wouldn't have value. "For those languages that make use of a Global Dictionary (or other global) construct to maintain state across function calls, the content of these global constructs may not be reset upon a call to DISCARD ALL - please see your language's documentation for specifics." Then, more attention should be paid within the language sections both to the constructs in general and also their limitations in the face of DISCARD ALL. "Note, the session-level GD construct is not informed of the execution of DISCARD ALL and thus is not cleared when such is executed." It would be helpful to note any work-around options as well. It sounds like, generally, PostgreSQL should promise to notify each language/session of a pending DISCARD ALL and then leave it up to the language to define how it will respond. The documentation should reflect this promise/action division-of-responsibilities. David J.
Re: [HACKERS] DISCARD ALL (Again)
Joshua D. Drake wrote: > > On 04/17/2014 02:17 PM, Josh Berkus wrote: > > > >On 04/17/2014 01:44 PM, Joshua D. Drake wrote: > >>Does it seem reasonable based on the docs: > >> > >>DISCARD ALL: > >> > >>Releases all temporary resources associated with the current session and > >>resets the session to its initial state. > >> > >>That we should also release the GD? > > > >It does, but that's a feature request, not a bug. Same with pl/perl. > > > > Well I would argue it is a documentation bug/error and that yes, we > have a legitimate feature request for DISCARD ALL to clear the GD. It does sound like a legitimate feature request to me. I don't remember if we honored the request to add resetting of cached sequences, though; if we didn't, this one is probably going to be tough too. Another point is that to implement this I think there will need to be another per-PL entry point to discard session data; are we okay with that? Since this probably means a new column in pg_language, we couldn't even consider the idea of back-patching. Unless we add a hook, which is registered in the PL's _PG_init()? Are we going to backpatch a doc change that says "releases all temporary resources, except for plpython's and plperl's GD"? Surely not ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] DISCARD ALL (Again)
On 04/17/2014 02:17 PM, Josh Berkus wrote: On 04/17/2014 01:44 PM, Joshua D. Drake wrote: Does it seem reasonable based on the docs: DISCARD ALL: Releases all temporary resources associated with the current session and resets the session to its initial state. That we should also release the GD? It does, but that's a feature request, not a bug. Same with pl/perl. Well I would argue it is a documentation bug/error and that yes, we have a legitimate feature request for DISCARD ALL to clear the GD. Sincerely, JD
Re: [HACKERS] How can we make beta testing better?
On 04/17/14 15:16, Merlin Moncure wrote: On Tue, Apr 15, 2014 at 4:47 PM, Josh Berkus wrote: Hackers, I think 9.3 has given us evidence that our users aren't giving new versions of PostgreSQL substantial beta testing, or if they are, they aren't sharing the results with us. How can we make beta testing better and more effective? How can we get more users to actually throw serious workloads at new versions and share the results? I've tried a couple of things over the last two years and they haven't worked all that well. Since we're about to go into another beta testing period, we need something new. Ideas? I've seen lots of bugs reported and fixed in the beta period over the years. My take is that it's basically unrealistic to expect volunteer beta testers to replace bona fide regression testing. I think it's a pretty fair statement that we've had some QC issues in the general area of replication technologies. What this is indicating to me is that replication needs substantially more coverage in 'make check'. Since I'm wishing for things, it would be nice to see an expansion of the buildfarm so that we could [optionally] run various performance tests as well as various replication scenarios. Then we could go back to users and say, please donate 'repeatable tests and machines to run them on' and reap the long term value. Not at all making light out of any of this...it's a huge project. The problem with testing replication is that it doesn't fit well into our standard regression testing. There are way too many moving parts as well as dependencies on the underlying OS and network topology. You will discover a ton of race conditions once you actually move from testing with multiple postmasters (so you can kill one) on the same box to using multiple virtual machines and instead of completely severing a network connection using some packet shaping/filtering to introduce packet loss, limited bandwidth, async routing and so on.
At least that is my experience from throwing that sort of sh*t at Slony at full speed. Not trying to discourage anyone from trying. Just saying that it doesn't fit into our existing regression test framework. Jan -- Jan Wieck Senior Software Engineer http://slony.info
Re: [HACKERS] How can we make beta testing better?
On Tue, Apr 15, 2014 at 2:47 PM, Josh Berkus wrote: > Hackers, > > I think 9.3 has given us evidence that our users aren't giving new > versions of PostgreSQL substantial beta testing, or if they are, they > aren't sharing the results with us. > A lot of the bugs that turned up are not the kind I would expect to have been found in most beta testing done by non-hacking users. Weren't they mostly around rare race conditions, crash recovery, and freezing? > > How can we make beta testing better and more effective? How can we get > more users to actually throw serious workloads at new versions and share > the results? > If we are interested in positive results as well as negative, we should change https://wiki.postgresql.org/wiki/HowToBetaTest "pgsql-hackers: bugs, questions, and successful test reports are welcome here if you are already subscribed to pgsql-hackers. Note that pgsql-hackers is a high-traffic mailing list with a lot of development discussion." So successful reports are welcome, provided that you are willing to subscribe to a list that generates tons of noise you won't understand. That doesn't sound all that welcoming. (I already am subscribed, but I still usually don't report successful tests, because "yeah, I did a bunch of stuff, and nothing failed in an obvious way" just doesn't sound very useful, and it is hard to get motivated to write up an exhaustive description of a test that doesn't prove anything anyway--maybe if I did for a few more hours, it would have found a problem.) If we want to know how much beta testing is really going on, perhaps we could do a survey asking people whether they did any beta testing, and if so whether they reported the results. Otherwise it would be hard to distinguish "We aren't doing enough testing" from "We do lots of testing, but it isn't strenuous enough to find the problems, or is testing the wrong aspects of the system". Cheers, Jeff
Re: [HACKERS] Patch: iff -> if
On 04/17/2014 12:33 PM, Nicolas Barbier wrote: 2014-04-17 Michael Paquier : Is there no equivalent in German? For example in French there is "ssi". gdw (genau dann, wenn) More likely that you see \equiv or: \leftrightarrow Regards, -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
Re: [HACKERS] DISCARD ALL (Again)
On 04/17/2014 01:44 PM, Joshua D. Drake wrote: > Does it seem reasonable based on the docs: > > DISCARD ALL: > > Releases all temporary resources associated with the current session and > resets the session to its initial state. > > That we should also release the GD? It does, but that's a feature request, not a bug. Same with pl/perl. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] assertion in 9.4 with wal_level=logical
On 04/17/2014 04:33 PM, Andres Freund wrote: Hi, On 2014-04-17 16:23:54 -0400, Steve Singer wrote: With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4) I am getting an assertion when doing a truncate via SPI when I have wal_level=logical. Stack trace is below. I am just replicating a table with normal slony (2.2) I don't need to establish any replication slots to get this. Uh, that's somewhat nasty... You probably only get that because of slony's habit of share locking catalogs. Could that be? Yes, slony does a select from pg_catalog and pg_namespace in the stored function just before doing the truncate. For now, to circumvent the problem you could just revert 4a170ee9e0ebd7021cb1190fabd5b0cbe2effb8e. I'll look into fixing it properly. Greetings, Andres Freund
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 1:33 PM, Peter Geoghegan wrote: > I can't imagine that this is much of a problem in practice. Although I will add that not caching highly useful inner pages for the medium term, just because that index hasn't been used at all for 5 minutes, probably is very bad. Using the 4,828 buffers that it would take to store all the inner pages (as in my large primary index example) to store something else instead is probably penny wise and pound foolish. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
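For readers following the thread, the replacement policy under discussion can be sketched as a toy model: each buffer carries a usage_count that is bumped on access and capped (BM_MAX_USAGE_COUNT is 5 in PostgreSQL), and the sweeping clock hand decrements counts until it finds a buffer at zero to evict. The class below is an illustrative simplification, not the actual freelist.c implementation; the page labels are hypothetical.

```python
class ClockSweep:
    """Toy model of clock-sweep buffer replacement with capped usage counts."""

    MAX_USAGE = 5  # PostgreSQL's BM_MAX_USAGE_COUNT

    def __init__(self, nbuffers):
        self.pages = [None] * nbuffers  # page held by each buffer slot
        self.usage = [0] * nbuffers     # per-slot usage_count
        self.hand = 0                   # position of the clock hand
        self.index = {}                 # page -> slot, for hit lookups

    def access(self, page):
        slot = self.index.get(page)
        if slot is not None:            # cache hit: bump usage_count, capped
            self.usage[slot] = min(self.usage[slot] + 1, self.MAX_USAGE)
            return slot
        slot = self._evict()            # cache miss: run the clock sweep
        old = self.pages[slot]
        if old is not None:
            del self.index[old]
        self.pages[slot] = page
        self.usage[slot] = 1
        self.index[page] = slot
        return slot

    def _evict(self):
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % len(self.pages)
            if self.pages[slot] is None or self.usage[slot] == 0:
                return slot             # free or cooled-down: evict here
            self.usage[slot] -= 1       # otherwise give it another lap


cache = ClockSweep(nbuffers=4)
for page in ["root", "root", "leaf1", "leaf2", "root"]:
    cache.access(page)
# "root" has been touched three times, so its usage_count is higher than
# the leaves' and it survives more sweep laps before becoming evictable.
```

Note that this also illustrates the thread's complaint: a burst of misses decrements everything the hand passes, so even a capped-out inner page can be swept away during a large sequential read of leaf pages.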
Re: [HACKERS] assertion in 9.4 with wal_level=logical
On 2014-04-17 17:40:01 -0300, Alvaro Herrera wrote: > For once, this looks more like a problem in logical decoding, which is > trying to assert about the tuple being updated; the assertion failing is > the one added a week ago about not palloc'ing in a critical section. It's this (older) assertion in HeapTupleHeaderGetCmax(): Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup))); That can allocate memory if xmax is a multixact... Does anybody have a better idea to solve this than adding a CritSectionCount == 0 && in there? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 1:39 PM, Andres Freund wrote: > On 2014-04-17 13:33:27 -0700, Peter Geoghegan wrote: >> Just over 99.6% of pages (leaving aside the meta page) in the big 10 >> GB pgbench_accounts_pkey index are leaf pages. > > That's a rather nice number. I knew it was big, but I'd have guessed > it'd be a percent lower. Yes, it's usually past 99.5% for int4. It's really bad if it's as low as 96%, and I think that often points to what are arguably bad indexing choices, like indexing text columns that have long text strings. > Do you happen to have the same stat handy for a sensibly wide text or > numeric real world index? It'd be interesting to see what the worst case > there is. Yes, as it happens I do: http://www.postgresql.org/message-id/CAM3SWZTcXrdDZSpA11qZXiyo4_jtxwjaNdZpnY54yjzq7d64=a...@mail.gmail.com I was working off my Mouse Genome database, which is actually real-world data used by medical researchers, stored in a PostgreSQL database by those researchers and made available for the benefit of other medical researchers. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] DISCARD ALL (Again)
Hello, Over at my quaint establishment we have been working on some plpython work that makes use of GD. We wrote this code with the assumption (per the docs) that when you issued a DISCARD ALL, the GD would be cleared. Apparently this is not the case. The docs themselves are clearly wrong, and this has been brought up before multiple times (that DISCARD ALL doesn't do what people think it does). Does it seem reasonable based on the docs: DISCARD ALL: Releases all temporary resources associated with the current session and resets the session to its initial state. That we should also release the GD? Sincerely, Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc Political Correctness is for cowards. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
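To make the gap concrete, here is a toy model (plain Python, nothing to do with the server's actual implementation) of the session state that DISCARD ALL is documented to reset. The `clear_gd` flag models the behavior being requested in this thread, which PostgreSQL does not currently implement; the attribute names are hypothetical.

```python
class Session:
    """Toy model of per-session state affected by DISCARD ALL."""

    def __init__(self):
        self.prepared = {}        # prepared statements (cleared, as by DEALLOCATE ALL)
        self.temp_tables = set()  # temporary tables (dropped, as by DISCARD TEMP)
        self.settings = {}        # session-level SET values (reset, as by RESET ALL)
        self.plpython_gd = {}     # PL/Python's session-global GD dictionary

    def discard_all(self, clear_gd=False):
        """Reset session state. With clear_gd=False this mirrors the
        observed behavior: GD survives. clear_gd=True is the proposal."""
        self.prepared.clear()
        self.temp_tables.clear()
        self.settings.clear()
        if clear_gd:
            self.plpython_gd.clear()
```

The point of the thread is that the documented promise ("resets the session to its initial state") corresponds to `clear_gd=True`, while the implementation behaves like `clear_gd=False`.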
Re: [HACKERS] assertion in 9.4 with wal_level=logical
Steve Singer wrote: > With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4) > > I am getting an assertion when doing a truncate via SPI when I have > wal_level=logical. > > Stack trace is below. > > I am just replicating a table with normal slony (2.2) I don't need > to establish any replication slots to get this. For once, this looks more like a problem in logical decoding, which is trying to assert about the tuple being updated; the assertion failing is the one added a week ago about not palloc'ing in a critical section. Andres told me on IM that there's another weird thing going on (which is how the catalog tuple gets a multixact in the first place) which is that Slony does a SELECT FOR SHARE on the catalog tuple. One simple approach would be to just disable that particular assert when in a critical section. -- Álvaro Herrera   http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 2014-04-17 13:33:27 -0700, Peter Geoghegan wrote: > Just over 99.6% of pages (leaving aside the meta page) in the big 10 > GB pgbench_accounts_pkey index are leaf pages. That's a rather nice number. I knew it was big, but I'd have guessed it'd be a percent lower. Do you happen to have the same stat handy for a sensibly wide text or numeric real world index? It'd be interesting to see what the worst case there is. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] assertion in 9.4 with wal_level=logical
Hi, On 2014-04-17 16:23:54 -0400, Steve Singer wrote: > With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4) > > I am getting an assertion when doing a truncate via SPI when I have > wal_level=logical. > > Stack trace is below. > > I am just replicating a table with normal slony (2.2) I don't need to > establish any replication slots to get this. Uh, that's somewhat nasty... You probably only get that because of slony's habit of share locking catalogs. Could that be? For now, to circumvent the problem you could just revert 4a170ee9e0ebd7021cb1190fabd5b0cbe2effb8e. I'll look into fixing it properly. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 8:10 AM, Greg Stark wrote: > I don't think "common sense" is compelling. I think you need to pin > down exactly what it is about btree intermediate pages that the LRU > isn't capturing and not just argue they're more useful. The LRU is > already capturing which pages are more heavily used than others so you > need to identify what it is that makes index pages *even more* useful > than their frequency and recency of access indicates. Not just that > they're more useful than an average page. See example 1.1 within the LRU-K paper. > So what I think is missing is that indexes are always accessed from > the root down to the leaf. So the most recent page accessed will > always be the leaf. And in whatever chain of pages was used to reach > the last leaf page the least recently accessed will always be the > root. But we'll need the root page again on the subsequent descent > even if it's to reach the same leaf page we kept in ram in preference > to it. I can't imagine that this is much of a problem in practice. 
Consider the break-down of pages within indexes when pgbench scale is 5,000, as in my original benchmark:

[local] pg@pgbench=# with tots as (
    SELECT count(*) c, type, relname
    from (select relname, relpages, generate_series(1, relpages - 1) i
          from pg_class c
          join pg_namespace n on c.relnamespace = n.oid
          where relkind = 'i' and nspname = 'public') r,
         lateral (select * from bt_page_stats(relname, i)) u
    group by relname, type)
select tots.relname,
       relpages - 1 as non_meta_pages,
       c,
       c/sum(c) over(partition by tots.relname) as prop_of_index,
       type
from tots
join pg_class c on c.relname = tots.relname
order by 2 desc, 1, type;

        relname        | non_meta_pages |    c    |       prop_of_index        | type
-----------------------+----------------+---------+----------------------------+------
 pgbench_accounts_pkey |        1370950 |    4828 |     0.00352164557423684307 | i
 pgbench_accounts_pkey |        1370950 | 1366121 |     0.99647762500455888253 | l
 pgbench_accounts_pkey |        1370950 |       1 | 0.000000729421204274408257 | r
 pgbench_tellers_pkey  |            274 |     273 |     0.99635036496350364964 | l
 pgbench_tellers_pkey  |            274 |       1 |     0.00364963503649635036 | r
 pgbench_branches_pkey |             28 |      27 |     0.96428571428571428571 | l
 pgbench_branches_pkey |             28 |       1 |     0.03571428571428571429 | r
(7 rows)

Time: 14562.297 ms

Just over 99.6% of pages (leaving aside the meta page) in the big 10 GB pgbench_accounts_pkey index are leaf pages. The inner pages and root page are at an enormous advantage. In this example, the other indexes don't even have what would be separately classified as an inner page (and not a root page) at all, because it's perfectly sufficient to only have a root page to get to any one of, say, 273 leaf pages (in the case of pgbench_tellers_pkey here). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
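The arithmetic behind that 99.6% figure is easy to reproduce: each level of the tree is smaller than the one below it by roughly the fanout factor. The sketch below is a back-of-the-envelope model, not a measurement; the fanout of ~285 is merely inferred from the 1,366,121 leaf pages versus 4,828 non-leaf pages reported above.

```python
import math

def btree_page_breakdown(n_leaf, fanout):
    """Estimate how many inner/root pages a B-tree needs to address
    n_leaf leaf pages, assuming a uniform fanout per inner page."""
    levels = [n_leaf]
    while levels[-1] > 1:
        # each level above needs one downlink per page below it
        levels.append(math.ceil(levels[-1] / fanout))
    n_upper = sum(levels[1:])            # inner pages plus the single root
    leaf_share = n_leaf / (n_leaf + n_upper)
    return n_upper, leaf_share

n_upper, leaf_share = btree_page_breakdown(n_leaf=1366121, fanout=285)
# n_upper lands close to the 4,828 non-leaf pages seen in bt_page_stats,
# and leaf_share comes out just over 0.996, matching the table above.
```

The same model shows why pgbench_tellers_pkey needs no separate inner level at all: with 273 leaves and a fanout in the hundreds, a single root page suffices.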
[HACKERS] assertion in 9.4 with wal_level=logical
With master/9.4 from today (52e757420fa98a76015c2c88432db94269f3e8f4) I am getting an assertion when doing a truncate via SPI when I have wal_level=logical. Stack trace is below. I am just replicating a table with normal slony (2.2) I don't need to establish any replication slots to get this.

(gdb) where
#0  0x7fc9b4f58295 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x7fc9b4f5b438 in __GI_abort () at abort.c:90
#2  0x007a10f7 in ExceptionalCondition (
    conditionName=conditionName@entry=0x955d90 "!(CritSectionCount == 0 || (CurrentMemoryContext) == ErrorContext || (MyAuxProcType == CheckpointerProcess))",
    errorType=errorType@entry=0x7da7b0 "FailedAssertion",
    fileName=fileName@entry=0x955a2e "mcxt.c", lineNumber=lineNumber@entry=670) at assert.c:54
#3  0x007c3090 in palloc (size=16) at mcxt.c:670
#4  0x004dd83f in mXactCacheGetById (members=0x7fff679a3d18, multi=58) at multixact.c:1411
#5  GetMultiXactIdMembers (multi=58, members=members@entry=0x7fff679a3d18, allow_old=allow_old@entry=0 '\000') at multixact.c:1080
#6  0x0049e43f in MultiXactIdGetUpdateXid (xmax=, t_infomask=) at heapam.c:6042
#7  0x004a1ccc in HeapTupleGetUpdateXid (tuple=) at heapam.c:6083
#8  0x007cf7d9 in HeapTupleHeaderGetCmax (tup=tup@entry=0x7fc9ac838e38) at combocid.c:122
#9  0x0049eb98 in log_heap_new_cid (relation=relation@entry=0x7fc9b5a67dc0, tup=tup@entry=0x7fff679a3ea0) at heapam.c:7047
#10 0x004a48a5 in heap_update (relation=relation@entry=0x7fc9b5a67dc0, otid=otid@entry=0x2678c6c, newtup=newtup@entry=0x2678c68, cid=26, crosscheck=crosscheck@entry=0x0, wait=wait@entry=1 '\001', hufd=hufd@entry=0x7fff679a4080, lockmode=lockmode@entry=0x7fff679a407c) at heapam.c:3734
#11 0x004a5842 in simple_heap_update (relation=relation@entry=0x7fc9b5a67dc0, otid=otid@entry=0x2678c6c, tup=tup@entry=0x2678c68) at heapam.c:4010
#12 0x00797cf7 in RelationSetNewRelfilenode (relation=relation@entry=0x7fc9ab270b68, freezeXid=19459, minmulti=minmulti@entry=58) at relcache.c:2956
#13 0x0059ddde in ExecuteTruncate (stmt=0x3a, stmt@entry=0x2678a58) at tablecmds.c:1187
#14 0x006c3870 in standard_ProcessUtility (parsetree=0x2678a58, queryString=, context=, params=0x0, dest=, completionTag=) at utility.c:515
#15 0x005e79d9 in _SPI_execute_plan (plan=plan@entry=0x7fff679a4320, paramLI=paramLI@entry=0x0, snapshot=snapshot@entry=0x0, crosscheck_snapshot=crosscheck_snapshot@entry=0x0, read_only=read_only@entry=0 '\000', fire_triggers=fire_triggers@entry=1 '\001', tcount=tcount@entry=0) at spi.c:2171
#16 0x005e83c1 in SPI_execute (src=src@entry=0x25bde90 "truncate only \"disorder\".\"do_restock\"", read_only=0 '\000', tcount=tcount@entry=0) at spi.c:386

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New windows compiler warning from 585bca39
David Rowley writes: > I've attached a tiny patch that fixes a new compiler warning on the windows > build... Applied, thanks. > Perhaps the #ifndef could be placed in a nicer spot in the patch, but the > attached should at least describe where the problem lies... Yeah, I thought it better to make a separate declaration to wrap in #ifndef. pgindent is probably going to insist on adding some vertical whitespace around the #if, and that'll look horrid if it's just in the middle of a list of variables. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thursday, April 17, 2014, Merlin Moncure wrote: > > no -- I got you. My point was, that's a pure guess unless you base it > on evidence recorded on the page itself. Without that evidence, > (which requires writing) the operating system is in a better place to make > that guess so it's probably better to defer that decision. > Well, we'd only need that info to be stored in the buffer cache somehow -- it wouldn't have to go to disk or cause more I/O, of course. My thinking was that we could track it with the existing counter too, avoiding even that small amount of locking to write to the buffer page. Thanks, Stephen
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 2:28 PM, Stephen Frost wrote: > > > On Thursday, April 17, 2014, Merlin Moncure wrote: >> >> yeah -- the thing is, we are already too spendy on >> supplemental write i/o (hint bits, visible bits, freezing, etc) and >> likely not worth it to throw something else on the pile unless the >> page is already dirty; the medium term trend in storage is that read >> vs write performance is becoming increasingly asymmetric, particularly >> on the random side so it's very unlikely to balance out. > > Guess I wasn't clear but I was thinking to read the page in, not do any > writing, and do it in an asynchronous way to the process doing the evicting. no -- I got you. My point was, that's a pure guess unless you base it on evidence recorded on the page itself. Without that evidence (which requires writing), the operating system is in a better place to make that guess, so it's probably better to defer that decision. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Thu, Apr 17, 2014 at 12:46 PM, Bruce Momjian wrote: > On Thu, Apr 17, 2014 at 11:44:37AM -0400, Tom Lane wrote: > > Bruce Momjian writes: > > > The idea is that we only need quotes when there are odd characters in > > > the identifier. We do that right now in some places, though I can't > > > find them in pg_dump. I know psql does that, see quote_ident(). > > > > I think our general style rule is that identifiers embedded in messages > > are always double-quoted. There's an exception for type names, but > > not otherwise. You're confusing the message case with printing SQL. > > OK. I was unclear if a status _display_ was a message like an error > message. The attached patch fixes the missing double quotes in the "dumping contents of table..." message and adds the schema name to other messages: - "reading indexes for table \"%s\".\"%s\"\n" - "reading foreign key constraints for table \"%s\".\"%s\"\n" - "reading triggers for table \"%s\".\"%s\"\n" - "finding the columns and types of table \"%s\".\"%s\"\n" - "finding default expressions of table \"%s\".\"%s\"\n" - "finding check constraints for table \"%s\".\"%s\"\n" Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog sobre TI: http://fabriziomello.blogspot.com >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 9464540..7f73e8d 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -706,8 +706,8 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 			_becomeOwner(AH, te);
 			_selectOutputSchema(AH, te->namespace);
 
-			ahlog(AH, 1, "processing data for table \"%s\"\n",
-				  te->tag);
+			ahlog(AH, 1, "processing data for table \"%s\".\"%s\"\n",
+				  AH->currSchema, te->tag);
 
 			/*
 			 * In parallel restore, if we created the table earlier in
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a6c0428..78ec5bf 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1400,7 +1400,17 @@ dumpTableData_copy(Archive *fout, void *dcontext)
 	const char *column_list;
 
 	if (g_verbose)
-		write_msg(NULL, "dumping contents of table %s\n", classname);
+	{
+		/* Print namespace information if available */
+		if (tbinfo->dobj.namespace != NULL &&
+			tbinfo->dobj.namespace->dobj.name != NULL)
+			write_msg(NULL, "dumping contents of table \"%s\".\"%s\"\n",
+					  tbinfo->dobj.namespace->dobj.name,
+					  classname);
+		else
+			write_msg(NULL, "dumping contents of table \"%s\"\n",
+					  classname);
+	}
 
 	/*
 	 * Make sure we are in proper schema.  We will qualify the table name
@@ -4974,8 +4984,17 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			continue;
 
 		if (g_verbose)
-			write_msg(NULL, "reading indexes for table \"%s\"\n",
-					  tbinfo->dobj.name);
+		{
+			/* Print namespace information if available */
+			if (tbinfo->dobj.namespace != NULL &&
+				tbinfo->dobj.namespace->dobj.name != NULL)
+				write_msg(NULL, "reading indexes for table \"%s\".\"%s\"\n",
+						  tbinfo->dobj.namespace->dobj.name,
+						  tbinfo->dobj.name);
+			else
+				write_msg(NULL, "reading indexes for table \"%s\"\n",
+						  tbinfo->dobj.name);
+		}
 
 		/* Make sure we are in proper schema so indexdef is right */
 		selectSourceSchema(fout, tbinfo->dobj.namespace->dobj.name);
@@ -5340,8 +5359,17 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables)
 			continue;
 
 		if (g_verbose)
-			write_msg(NULL, "reading foreign key constraints for table \"%s\"\n",
-					  tbinfo->dobj.name);
+		{
+			/* Print namespace information if available */
+			if (tbinfo->dobj.namespace != NULL &&
+				tbinfo->dobj.namespace->dobj.name != NULL)
+				write_msg(NULL, "reading foreign key constraints for table \"%s\".\"%s\"\n",
+						  tbinfo->dobj.namespace->dobj.name,
+						  tbinfo->dobj.name);
+			else
+				write_msg(NULL, "reading foreign key constraints for table \"%s\"\n",
+						  tbinfo->dobj.name);
+		}
 
 		/*
 		 * select table schema to ensure constraint expr is qualified if
@@ -5678,8 +5706,17 @@ getTriggers(Archive *fout, TableInfo tblinfo[], int numTables)
 			continue;
 
 		if (g_verbose)
-			write_msg(NULL, "reading triggers for table \"%s\"\n",
-					  tbinfo->dobj.name);
+		{
+			/* Print namespace information if available */
+			if (tbinfo->dobj.namespace != NULL &&
+				tbinfo->dobj.namespace->dobj.name != NULL)
+				write_msg(NULL, "reading triggers for table \"%s\".\"%s\"\n",
+						  tbinfo->dobj.namespace->dobj.name,
+						  tbinfo->dobj.name);
+			else
+				write_msg(NULL, "reading triggers for table \"%s\"\n",
+						  tbinfo->dobj.name);
+		}
 
 		/*
 		 * select table schema to ensure regproc name is qualified if needed
@@ -6291,8 +6328,17 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 		 * the o
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thursday, April 17, 2014, Merlin Moncure wrote: > > yeah -- the thing is, we are already too spendy already on > supplemental write i/o (hint bits, visible bits, freezing, etc) and > likely not worth it to throw something else on the pile unless the > page is already dirty; the medium term trend in storage is that read > vs write performance is becoming increasingly asymmetric, particularly > on the random side so it's very unlikely to balance out. > Guess I wasn't clear but I was thinking to read the page in, not do any writing, and do it in a asynchronous way to the process doing the evicting. Thanks, Stephen
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 2:16 PM, Stephen Frost wrote: > * Merlin Moncure (mmonc...@gmail.com) wrote: >> I don't think this would work unless we would keep some kind of >> tracking information on the page itself which seems not worth a write >> operation to do (maybe if the page is dirtied it could be snuck in >> there though...). IOW, it would only make sense to do this if we knew >> that this page was likely to be read in again. This might be true in >> general on particular workloads but is probably a pretty flimsy >> assumption without supporting evidence; probably better to let the O/S >> deal with it. > > The trouble is that we're ending up "hiding" the information from the OS > about the frequency of utilization of that page. You have a good point > and we wouldn't want to do this for pages that are just accessed once or > similar, but perhaps just mark a page that's reached the 'max' as having > been 'hot' and then, for those pages, advise the OS that while we're > under pressure and need to push this page out, it was once pretty hotly > used and therefore we may want it again soon. > > For pages that never reach the 'max' level, we wouldn't do anything on > the assumption that those were only temporarily needed. yeah -- the thing is, we are already too spendy on supplemental write i/o (hint bits, visible bits, freezing, etc) and likely not worth it to throw something else on the pile unless the page is already dirty; the medium term trend in storage is that read vs write performance is becoming increasingly asymmetric, particularly on the random side so it's very unlikely to balance out. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How can we make beta testing better?
On Tue, Apr 15, 2014 at 4:47 PM, Josh Berkus wrote: > Hackers, > > I think 9.3 has given us evidence that our users aren't giving new > versions of PostgreSQL substantial beta testing, or if they are, they > aren't sharing the results with us. > > How can we make beta testing better and more effective? How can we get > more users to actually throw serious workloads at new versions and share > the results? > > I've tried a couple of things over the last two years and they haven't > worked all that well. Since we're about to go into another beta testing > period, we need something new. Ideas? I've seen lots of bugs reported and fixed in the beta period over the years. My take is that it's basically unrealistic to expect volunteer beta testers to replace bona fide regression testing. I think it's a pretty fair statement that we've had some QC issues in the general area of replication technologies. What this is indicating to me is that replication needs substantially more coverage in 'make check'. Since I'm wishing for things, it would be nice to see an expansion of the buildfarm so that we could [optionally] run various performance tests as well as various replication scenarios. Then we could go back to users and say, please donate 'repeatable tests and machines to run them on' and reap the long term value. Not at all making light out of any of this...it's a huge project. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > I wonder if it would help to actually tell the OS to read in buffers > > that we're *evicting*... On the general notion that if the OS already > > has them buffered then it's almost a no-op, and if it doesn't and it's > > actually a 'hot' buffer that we're gonna need again shortly, the OS will > > have it. > > But if it's actually gone cold, you're just forcing unnecessary read I/O, > not to mention possibly causing something slightly warmer to be lost from > kernel cache. Certainly possible- see the email I just sent about another thought around this. Obviously, none of these thoughts are really fully formed solutions and are, instead, just speculation and ideas. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
* Merlin Moncure (mmonc...@gmail.com) wrote: > I don't think this would work unless we would keep some kind of > tracking information on the page itself which seems not worth a write > operation to do (maybe if the page is dirtied it could be snuck in > there though...). IOW, it would only make sense to do this if we knew > that this page was likely to be read in again. This might be true in > general on particular workloads but is probably a pretty flimsy > assumption without supporting evidence; probably better to let the O/S > deal with it. The trouble is that we're ending up "hiding" the information from the OS about the frequency of utilization of that page. You have a good point and we wouldn't want to do this for pages that are just accessed once or similar, but perhaps just mark a page that's reached the 'max' as having been 'hot' and then, for those pages, advise the OS that while we're under pressure and need to push this page out, it was once pretty hotly used and therefore we may want it again soon. For pages that never reach the 'max' level, we wouldn't do anything on the assumption that those were only temporarily needed. Just some thoughts. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
Stephen Frost writes: > I wonder if it would help to actually tell the OS to read in buffers > that we're *evicting*... On the general notion that if the OS already > has them buffered then it's almost a no-op, and if it doesn't and it's > actually a 'hot' buffer that we're gonna need again shortly, the OS will > have it. But if it's actually gone cold, you're just forcing unnecessary read I/O, not to mention possibly causing something slightly warmer to be lost from kernel cache. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 2:00 PM, Stephen Frost wrote: > * Merlin Moncure (mmonc...@gmail.com) wrote: >> I doubt that's necessary though -- if the postgres caching algorithm >> improves such that there is a better tendency for hot pages to stay in >> s_b, Eventually the O/S will deschedule the page for something else >> that needs it. In other words, otherwise preventable double >> buffering is really a measurement of bad eviction policy because it >> manifests in volatility of frequency accessed pages. > > I wonder if it would help to actually tell the OS to read in buffers > that we're *evicting*... On the general notion that if the OS already > has them buffered then it's almost a no-op, and if it doesn't and it's > actually a 'hot' buffer that we're gonna need again shortly, the OS will > have it. > > In other words, try to make the OS more like a secondary cache to ours > by encouraging it to cache things we're evicting. I don't think this would work unless we would keep some kind of tracking information on the page itself which seems not worth a write operation to do (maybe if the page is dirtied it could be snuck in there though...). IOW, it would only make sense to do this if we knew that this page was likely to be read in again. This might be true in general on particular workloads but is probably a pretty flimsy assumption without supporting evidence; probably better to let the O/S deal with it. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
* Merlin Moncure (mmonc...@gmail.com) wrote: > I doubt that's necessary though -- if the postgres caching algorithm > improves such that there is a better tendency for hot pages to stay in > s_b, Eventually the O/S will deschedule the page for something else > that needs it. In other words, otherwise preventable double > buffering is really a measurement of bad eviction policy because it > manifests in volatility of frequency accessed pages. I wonder if it would help to actually tell the OS to read in buffers that we're *evicting*... On the general notion that if the OS already has them buffered then it's almost a no-op, and if it doesn't and it's actually a 'hot' buffer that we're gonna need again shortly, the OS will have it. In other words, try to make the OS more like a secondary cache to ours by encouraging it to cache things we're evicting. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 11:53 AM, Merlin Moncure wrote: > No. but if you were very judicious, maybe you could hint the o/s > (posix_fadvise) about pages that are likely to stay hot that you don't > need them. Mitsumasa KONDO wrote a patch like that. I don't think the results were that promising, but things change quickly. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 1:48 PM, Andres Freund wrote: > On 2014-04-17 21:44:47 +0300, Heikki Linnakangas wrote: >> On 04/17/2014 09:38 PM, Stephen Frost wrote: >> >* Greg Stark (st...@mit.edu) wrote: >> >>On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost wrote: >> >>>Ehhh. No. If it's a hot page that we've been holding in *our* cache >> >>>long enough, the kernel will happily evict it as 'cold' from *its* >> >>>cache, leading to... >> >> >> >>This is a whole nother problem. >> >> >> >>It is worrisome that we could be benchmarking the page replacement >> >>algorithm in Postgres and choose a page replacement algorithm that >> >>chooses pages that performs well because it tends to evict pages that >> >>are in the OS cache. And then one day (hopefully not too far off) >> >>we'll fix the double buffering problem and end up with a strange >> >>choice of page replacement algorithm. >> > >> >That's certainly possible but I don't see the double buffering problem >> >going away any time particularly soon and, even if it does, it's likely >> >to either a) mean we're just using the kernel's cache (eg: something w/ >> >mmap, etc), or b) will involve so many other changes that this will end >> >up getting changed anyway. In any case, while I think we should >> >document any such cache management system we employ as having this risk, >> >I don't think we should worry about it terribly much. >> >> Note that if we somehow come up with a page replacement algorithm that tends >> to evict pages that are in the OS cache, we have effectively solved the >> double buffering problem. When a page is cached in both caches, evicting it >> from one of them eliminates the double buffering. Granted, you might prefer >> to evict it from the OS cache instead, and such an algorithm could be bad in >> other ways. But if a page replacement algorithm happens avoid double >> buffering, that's a genuine merit for that algorithm. 
> > I don't think it's a good idea to try to synchronize algorithms with the > OSs. There's so much change about the caching logic in e.g. linux that > it won't stay effective for very long. No. but if you were very judicious, maybe you could hint the o/s (posix_fadvise) about pages that are likely to stay hot that you don't need them. I doubt that's necessary though -- if the postgres caching algorithm improves such that there is a better tendency for hot pages to stay in s_b, Eventually the O/S will deschedule the page for something else that needs it. In other words, otherwise preventable double buffering is really a measurement of bad eviction policy because it manifests in volatility of frequency accessed pages. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 04/17/2014 09:38 PM, Stephen Frost wrote: * Greg Stark (st...@mit.edu) wrote: On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost wrote: Ehhh. No. If it's a hot page that we've been holding in *our* cache long enough, the kernel will happily evict it as 'cold' from *its* cache, leading to... This is a whole nother problem. It is worrisome that we could be benchmarking the page replacement algorithm in Postgres and choose a page replacement algorithm that chooses pages that performs well because it tends to evict pages that are in the OS cache. And then one day (hopefully not too far off) we'll fix the double buffering problem and end up with a strange choice of page replacement algorithm. That's certainly possible but I don't see the double buffering problem going away any time particularly soon and, even if it does, it's likely to either a) mean we're just using the kernel's cache (eg: something w/ mmap, etc), or b) will involve so many other changes that this will end up getting changed anyway. In any case, while I think we should document any such cache management system we employ as having this risk, I don't think we should worry about it terribly much. Note that if we somehow come up with a page replacement algorithm that tends to evict pages that are in the OS cache, we have effectively solved the double buffering problem. When a page is cached in both caches, evicting it from one of them eliminates the double buffering. Granted, you might prefer to evict it from the OS cache instead, and such an algorithm could be bad in other ways. But if a page replacement algorithm happens avoid double buffering, that's a genuine merit for that algorithm. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
* Andres Freund (and...@2ndquadrant.com) wrote: > > Note that if we somehow come up with a page replacement algorithm that tends > > to evict pages that are in the OS cache, we have effectively solved the > > double buffering problem. When a page is cached in both caches, evicting it > > from one of them eliminates the double buffering. Granted, you might prefer > > to evict it from the OS cache instead, and such an algorithm could be bad in > > other ways. But if a page replacement algorithm happens avoid double > > buffering, that's a genuine merit for that algorithm. > > I don't think it's a good idea to try to synchronize algorithms with the > OSs. There's so much change about the caching logic in e.g. linux that > it won't stay effective for very long. There's also more than one OS... Thanks, Stephen
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 2014-04-17 21:44:47 +0300, Heikki Linnakangas wrote: > On 04/17/2014 09:38 PM, Stephen Frost wrote: > >* Greg Stark (st...@mit.edu) wrote: > >>On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost wrote: > >>>Ehhh. No. If it's a hot page that we've been holding in *our* cache > >>>long enough, the kernel will happily evict it as 'cold' from *its* > >>>cache, leading to... > >> > >>This is a whole nother problem. > >> > >>It is worrisome that we could be benchmarking the page replacement > >>algorithm in Postgres and choose a page replacement algorithm that > >>chooses pages that performs well because it tends to evict pages that > >>are in the OS cache. And then one day (hopefully not too far off) > >>we'll fix the double buffering problem and end up with a strange > >>choice of page replacement algorithm. > > > >That's certainly possible but I don't see the double buffering problem > >going away any time particularly soon and, even if it does, it's likely > >to either a) mean we're just using the kernel's cache (eg: something w/ > >mmap, etc), or b) will involve so many other changes that this will end > >up getting changed anyway. In any case, while I think we should > >document any such cache management system we employ as having this risk, > >I don't think we should worry about it terribly much. > > Note that if we somehow come up with a page replacement algorithm that tends > to evict pages that are in the OS cache, we have effectively solved the > double buffering problem. When a page is cached in both caches, evicting it > from one of them eliminates the double buffering. Granted, you might prefer > to evict it from the OS cache instead, and such an algorithm could be bad in > other ways. But if a page replacement algorithm happens avoid double > buffering, that's a genuine merit for that algorithm. I don't think it's a good idea to try to synchronize algorithms with the OSs. There's so much change about the caching logic in e.g. 
linux that it won't stay effective for very long. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Thu, Apr 17, 2014 at 9:52 AM, Bruce Momjian wrote: > Where are we on this? My hope is that I can get agreement on a way forward during pgCon. Or, at the very least, explain the issues as I see them in a relatively accessible and succinct way to those interested. -- Peter Geoghegan
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
* Greg Stark (st...@mit.edu) wrote: > On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost wrote: > > Ehhh. No. If it's a hot page that we've been holding in *our* cache > > long enough, the kernel will happily evict it as 'cold' from *its* > > cache, leading to... > > This is a whole nother problem. > > It is worrisome that we could be benchmarking the page replacement > algorithm in Postgres and choose a page replacement algorithm that > chooses pages that performs well because it tends to evict pages that > are in the OS cache. And then one day (hopefully not too far off) > we'll fix the double buffering problem and end up with a strange > choice of page replacement algorithm. That's certainly possible but I don't see the double buffering problem going away any time particularly soon and, even if it does, it's likely to either a) mean we're just using the kernel's cache (eg: something w/ mmap, etc), or b) will involve so many other changes that this will end up getting changed anyway. In any case, while I think we should document any such cache management system we employ as having this risk, I don't think we should worry about it terribly much. > It also means that every benchmark is super sensitive to the how large > a fraction of system memory Postgres is managing. If A benchmark of a > page replacement algorithm with 3GB shared buffers might perform well > compared to others on a system with 8GB or 32GB total RAM but actually > be choosing pages very poorly in normal terms and perform terribly on > a system with 4GB total ram. I'm not following you here- benchmarks are already sensitive to how much of the system's memory PG is managing (and how much ends up being *dedicated* to PG's cache and therefore unavailable for other work). 
> Ideally what I would like to see is instrumentation of Postgres's > buffer pinning so we can generate various test loads and then just run > the different algorithms on them and measure precisely how many page > evictions it's causing and when how often it's choosing pages that > need to be read in soon after and so on. We shouldn't have to run > Postgres to get these counts at all, just run the algorithm as we read > through a text file (or database table) listing the pages being > accessed. Go for it. I'd love to see that also. Thanks, Stephen
Re: [HACKERS] How can we make beta testing better?
On 04/17/2014 05:39 AM, Greg Stark wrote: > On Wed, Apr 16, 2014 at 12:53 AM, Rod Taylor wrote: >> 4) Plays queries from the CSV logs starting from $TIME mimicking actual >> timing and transaction boundaries > > This ^^ > > But I recall a number of previous attempts including plugins for > general load testing systems, what happened to them? > > Honestly if you really want to load test properly though what you > really want to do is deploy a copy of your entire application and feed > it requests simulating user traffic. That results in more accurate > representation and gives you data that's easier to act on. Software is available which can do this. The problem is getting the workload in the first place. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 9:21 AM, Stephen Frost wrote: > * Robert Haas (robertmh...@gmail.com) wrote: >> several orders of magnitude more often. That's clearly bad. On >> systems that are not too heavily loaded it doesn't matter too much >> because we just fault the page right back in from the OS pagecache. > > Ehhh. No. If it's a hot page that we've been holding in *our* cache > long enough, the kernel will happily evict it as 'cold' from *its* > cache, leading to... > >> But I've done pgbench runs where such decisions lead to long stalls, >> because the page has to be brought back in from disk, and there's a >> long I/O queue; or maybe just because the kernel thinks PostgreSQL is >> issuing too many I/O requests and makes some of them wait to cool >> things down. > > Exactly this. Yes, I believe that's why this is so effective. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL replay bugs
Heikki Linnakangas writes: > Two things that are not bugs, but I'd like to change just to make this > tool easier to maintain, and to generally clean things up: > 1. When creating a sequence, we first use simple_heap_insert() to insert > the sequence tuple, which creates a WAL record. Then we write a new > sequence RM WAL record about the same thing. The reason is that the WAL > record written by regular heap_insert is bogus for a sequence tuple. > After replaying just the heap insertion, but not the other record, the > page doesn't have the magic value indicating that it's a sequence, i.e. > it's broken as a sequence page. That's OK because we only do this when > creating a new sequence, so if we crash between those two records, the > whole relation is not visible to anyone. Nevertheless, I'd like to fix > that by using PageAddItem directly to insert the tuple, instead of > simple_heap_insert. We have to override the xmin field of the tuple > anyway, and we don't need any of the other services like finding the > insert location, toasting, visibility map or freespace map updates, that > simple_heap_insert() provides. > 2. _bt_restore_page, when restoring a B-tree page split record. It adds > tuples to the page in reverse order compared to how it's done in master. > There is a comment noting that, and it asks "Is it worth changing just > on general principles?". Yes, I think it is. > Any objections to changing those two? Not here. I've always suspected #2 was going to bite us someday anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL replay bugs
On 04/08/2014 06:41 AM, Michael Paquier wrote: On Tue, Apr 8, 2014 at 3:16 AM, Heikki Linnakangas wrote: I've been playing with a little hack that records a before and after image of every page modification that is WAL-logged, and writes the images to a file along with the LSN of the corresponding WAL record. I set up a master-standby replication with that hack in place in both servers, and ran the regression suite. Then I compared the after images after every WAL record, as written on master, and as replayed by the standby. Assuming that adding some dedicated hooks in the core able to do actions before and after a page modification occur is not *that* costly (well I imagine that it is not acceptable in terms of performance), could it be possible to get that in the shape of a extension that could be used to test WAL record consistency? This may be an idea to think about... Yeah, working on it. It can live as a patch set if nothing else. This has been very fruitful, I just committed another fix for a bug I found with this earlier today. There are quite a few things that cause differences between master and standby. We have hint bits in many places, unused space that isn't zeroed etc. Two things that are not bugs, but I'd like to change just to make this tool easier to maintain, and to generally clean things up: 1. When creating a sequence, we first use simple_heap_insert() to insert the sequence tuple, which creates a WAL record. Then we write a new sequence RM WAL record about the same thing. The reason is that the WAL record written by regular heap_insert is bogus for a sequence tuple. After replaying just the heap insertion, but not the other record, the page doesn't have the magic value indicating that it's a sequence, i.e. it's broken as a sequence page. That's OK because we only do this when creating a new sequence, so if we crash between those two records, the whole relation is not visible to anyone. 
Nevertheless, I'd like to fix that by using PageAddItem directly to insert the tuple, instead of simple_heap_insert. We have to override the xmin field of the tuple anyway, and we don't need any of the other services like finding the insert location, toasting, visibility map or freespace map updates, that simple_heap_insert() provides. 2. _bt_restore_page, when restoring a B-tree page split record. It adds tuples to the page in reverse order compared to how it's done in master. There is a comment noting that, and it asks "Is it worth changing just on general principles?". Yes, I think it is. Any objections to changing those two? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 12:21 PM, Stephen Frost wrote: > Ehhh. No. If it's a hot page that we've been holding in *our* cache > long enough, the kernel will happily evict it as 'cold' from *its* > cache, leading to... This is a whole nother problem. It is worrisome that we could be benchmarking the page replacement algorithm in Postgres and choose a page replacement algorithm that chooses pages that performs well because it tends to evict pages that are in the OS cache. And then one day (hopefully not too far off) we'll fix the double buffering problem and end up with a strange choice of page replacement algorithm. It also means that every benchmark is super sensitive to the how large a fraction of system memory Postgres is managing. If A benchmark of a page replacement algorithm with 3GB shared buffers might perform well compared to others on a system with 8GB or 32GB total RAM but actually be choosing pages very poorly in normal terms and perform terribly on a system with 4GB total ram. Ideally what I would like to see is instrumentation of Postgres's buffer pinning so we can generate various test loads and then just run the different algorithms on them and measure precisely how many page evictions it's causing and when how often it's choosing pages that need to be read in soon after and so on. We shouldn't have to run Postgres to get these counts at all, just run the algorithm as we read through a text file (or database table) listing the pages being accessed. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Feb 10, 2014 at 06:40:30PM +, Peter Geoghegan wrote: > On Sun, Jan 19, 2014 at 2:17 AM, Peter Geoghegan wrote: > > I'm just throwing an error when locking the tuple returns > > HeapTupleInvisible, and the xmin of the tuple is our xid. > > I would like some feedback on this point. We need to consider how > exactly to avoid updating the same tuple inserted by our command. > Updating a tuple we inserted cannot be allowed to happen, not least > because to do so causes livelock. > > A related consideration that I raised in mid to late January that > hasn't been commented on is avoiding updating the same tuple twice, > and where we come down on that with respect to where our > responsibility to the user starts and ends. For example, SQL MERGE > officially forbids this, but MySQL's INSERT...ON DUPLICATE KEY UPDATE > seems not to, probably due to implementation considerations. Where are we on this? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix memset usage in pgcrypto
On Mon, Mar 31, 2014 at 09:03:41PM -0400, Bruce Momjian wrote: > On Thu, Dec 26, 2013 at 03:42:12PM +0200, Marko Kreen wrote: > > http://www.viva64.com/en/b/0227/ reported that on-stack memset()s > > might be optimized away by compilers. Fix it. > > > > * Replace memset() with px_memset() > > * Add px_memset to copy_crlf() > > * ADd px_memset to pgp-s2k.c > > Where are we on this patch? Seems it needs backpatching too. Patch backpatched through 8.4. Thanks. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
* Robert Haas (robertmh...@gmail.com) wrote: > several orders of magnitude more often. That's clearly bad. On > systems that are not too heavily loaded it doesn't matter too much > because we just fault the page right back in from the OS pagecache. Ehhh. No. If it's a hot page that we've been holding in *our* cache long enough, the kernel will happily evict it as 'cold' from *its* cache, leading to... > But I've done pgbench runs where such decisions lead to long stalls, > because the page has to be brought back in from disk, and there's a > long I/O queue; or maybe just because the kernel thinks PostgreSQL is > issuing too many I/O requests and makes some of them wait to cool > things down. Exactly this. > Of course, the overhead of repeated clock sweeps to push down the > usage counts isn't a great thing either. I'm not saying that isn't a > problem. But I think bad decisions about what to evict are also a > problem. Using a bit more CPU here and there, particularly if it's done in a background worker, or ideally multiple background workers (for each buffer pool) would be much better than evicting a hot page that isn't in the kernel's buffer either 'cause we've held on to it long enough that the kernel thinks it's cold. Thanks, Stephen
Re: [HACKERS] shouldn't we log permission errors when accessing the configured trigger file?
On Wed, Apr 16, 2014 at 06:55:14PM -0400, Bruce Momjian wrote: > > Seems reasonable. It could lead to quite a bit of log spam, I > > suppose, but the way things are now could be pretty mystifying if > > you've located your trigger file somewhere outside $PGDATA, and a > > parent directory is lacking permissions. > > > > > > +1. Since it actually indicates something that's quite broken (since with > > that > > you can never make the trigger work until you fix it), the log spam seems > > like > > it would be appropriate. (Logspam is never nice, but a single log line is > > also > > very easy to miss - this should log enough that you wouldn't) > > I have developed the attached patch to address this issue. Patch applied. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Thu, Apr 17, 2014 at 11:44:37AM -0400, Tom Lane wrote: > Bruce Momjian writes: > > The idea is that we only need quotes when there are odd characters in > > the identifier. We do that right now in some places, though I can't > > find them in pg_dump. I know psql does that, see quote_ident(). > > I think our general style rule is that identifiers embedded in messages > are always double-quoted. There's an exception for type names, but > not otherwise. You're confusing the message case with printing SQL. OK. I was unclear if a status _display_ was a message like an error message. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Verbose output of pg_dump not show schema name
Bruce Momjian writes: > The idea is that we only need quotes when there are odd characters in > the identifier. We do that right now in some places, though I can't > find them in pg_dump. I know psql does that, see quote_ident(). I think our general style rule is that identifiers embedded in messages are always double-quoted. There's an exception for type names, but not otherwise. You're confusing the message case with printing SQL. regards, tom lane
Re: [HACKERS] [GENERAL] pg_upgrade & tablespaces
On Wed, Apr 16, 2014 at 01:49:20PM -0400, Bruce Momjian wrote: > On Sun, Jan 12, 2014 at 11:04:41PM -0500, Bruce Momjian wrote: > > > In the pgsql_old installation you have symlinks pointing back to the > > > current default location. As well pg_tablespace points back to > > > /usr/local/pgsql/data/ The issue is that there is not actually > > > anything there in the way of a tablespace. So when pg_upgrade runs > > > it tries to upgrade from /usr/local/pgsql/data/tblspc_dir to > > > /usr/local/pgsql/data/tblspc_dir where the first directory either > > > does not exist. or if the user went ahead and created the directory > > > in the new installation, is empty. What is really wanted is to > > > upgrade from /usr/local/pgsql_old/data/tblspc_dir to > > > /usr/local/pgsql/data/tblspc_dir. Right now the only way that > > > happens is with user intervention. > > > > Right, it points to _nothing_ in the _new_ cluster. Perhaps the > > simplest approach would be to check all the pg_tablespace locations to > > see if they point at real directories. If not, we would have to have > > the user update pg_tablespace and the symlinks. :-( Actually, even in > > 9.2+, those symlinks are going to point at the same "nothing". That > > would support checking the symlinks in all versions. > > I have developed the attached patch which checks all tablespaces to make > sure the directories exist. I plan to backpatch this. > > The reason we haven't seen this bug reported more frequently is that a > _database_ defined in a non-existent tablespace directory already throws > an backend error, so this check is only necessary where tables/indexes > (not databases) are defined in non-existant tablespace directories. Patch applied and backpatched to 9.3. I beefed up the C comment to explain how this can happen: Check that the tablespace path exists and is a directory. Effectively, this is checking only for tables/indexes in non-existent tablespace directories. 
Databases located in non-existent tablespaces already throw a backend error. Non-existent tablespace directories can occur when a data directory that contains user tablespaces is moved as part of pg_upgrade preparation and the symbolic links are not updated. Thanks for the report and debugging. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Thu, Apr 17, 2014 at 12:07:39PM -0300, Fabrízio de Royes Mello wrote: > > Can you get that to _conditionally_ double-quote the strings? > > Sorry, I didn't understand what you means? Your idea is to check if the > namespace is available and then don't show the double-quote, is that? The idea is that we only need quotes when there are odd characters in the identifier. We do that right now in some places, though I can't find them in pg_dump. I know psql does that, see quote_ident(). > > In fact, > > maybe we don't even need the double-quotes. How do we double-quote > > other places? > > > > Checking that more deeply I found some other places that show the table name > and all of them are double-quoted. OK. > Just the "dumping contents of table.." message isn't double-quoted: > > $ grep 'table %s' src/bin/pg_dump/*.c > src/bin/pg_dump/pg_dump.c: write_msg(NULL, "dumping contents of > table %s\n", > > > So maybe we must double-quote of all string, i.e. "public.foo", including the > missing bellow. No, I think double-quoting each part is the correct way. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Tue, Apr 15, 2014 at 7:30 PM, Peter Geoghegan wrote: > Frankly, there doesn't need to be any research on this, because it's > just common sense that probabilistically, leaf pages are much more > useful than heap pages in servicing index scan queries if we assume a > uniform distribution. If we don't assume that, then they're still more > useful on average. I don't think "common sense" is compelling. I think you need to pin down exactly what it is about btree intermediate pages that the LRU isn't capturing and not just argue they're more useful. The LRU is already capturing which pages are more heavily used than others so you need to identify what it is that makes index pages *even more* useful than their frequency and recency of access indicates. Not just that they're more useful than an average page. So what I think is missing is that indexes are always accessed from the root down to the leaf. So the most recent page accessed will always be the leaf. And in whatever chain of pages was used to reach the last leaf page the least recently accessed will always be the root. But we'll need the root page again on the subsequent descent even if it's to reach the same leaf page we kept in ram in preference to it. Now it doesn't *always* make sense to keep an intermediate page over leaf pages. Imagine an index that we always do full traversals of. We'll always descend from the root down the left-most pages and then follow the right pointers across. All the other intermediate pages will be cold. If we do an occasional descent probing for other keys those leaf pages shouldn't be cached since they won't be needed again for the common full index traversals and the next occasional probe will probably be looking for different keys. But if we're often probing for the same keys the last thing we want to do is throw away one of the intermediate pages for those keys when we could throw away a leaf page. But that's what would happen in a strict LRU. 
It's almost like what we would really want to do is mark the pages as least recently used in the opposite order from the order they're actually accessed when descending. Or perhaps bump the usage count to max+1 when it's an intermediate page so that it takes one extra cycle of decrementing before it's considered old compared to a leaf page. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Thu, Apr 17, 2014 at 11:36 AM, Bruce Momjian wrote: > > On Thu, Apr 17, 2014 at 11:29:03AM -0300, Fabrízio de Royes Mello wrote: > > > > This database have a lot of different schemas with same structure and if I > > > > need do view the status of dump I don't know what schema the table are dump > > > > from. > > > Yes this may be helpful. The attached quick'n dirty patch implements it. > > > > > > > Very nice... thanks!!! > > > > I add schema name do the following messages too: > > > > pg_restore: processing data for table "public"."bar" > > pg_restore: processing data for table "public"."foo" > > pg_restore: processing data for table "s1"."bar" > > pg_restore: processing data for table "s1"."foo" > > pg_restore: processing data for table "s2"."bar" > > pg_restore: processing data for table "s2"."foo" > > pg_restore: processing data for table "s3"."bar" > > pg_restore: processing data for table "s3"."foo" > > Can you get that to _conditionally_ double-quote the strings? Sorry, I didn't understand what you means? Your idea is to check if the namespace is available and then don't show the double-quote, is that? > In fact, > maybe we don't even need the double-quotes. How do we double-quote > other places? > Checking that more deeply I found some other places that show the table name and all of them are double-quoted. 
$ grep 'table \\\"%s' src/bin/pg_dump/*.c src/bin/pg_dump/common.c:write_msg(NULL, "failed sanity check, parent OID %u of table \"%s\" (OID %u) not found\n", src/bin/pg_dump/pg_backup_archiver.c:ahlog(AH, 1, "processing data for table \"%s\".\"%s\"\n", src/bin/pg_dump/pg_backup_archiver.c:ahlog(AH, 1, "table \"%s\" could not be created, will not restore its data\n", src/bin/pg_dump/pg_backup_db.c:warn_or_exit_horribly(AH, modulename, "COPY failed for table \"%s\": %s", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "Dumping the contents of table \"%s\" failed: PQgetCopyData() failed.\n", classname); src/bin/pg_dump/pg_dump.c:write_msg(NULL, "Dumping the contents of table \"%s\" failed: PQgetResult() failed.\n", classname); src/bin/pg_dump/pg_dump.c:write_msg(NULL, "WARNING: owner of table \"%s\" appears to be invalid\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "reading indexes for table \"%s\"\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "reading foreign key constraints for table \"%s\"\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "reading triggers for table \"%s\"\n", src/bin/pg_dump/pg_dump.c:exit_horribly(NULL, "query produced null referenced table name for foreign key trigger \"%s\" on table \"%s\" (OID of table: %u)\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "finding the columns and types of table \"%s\".\"%s\"\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "finding the columns and types of table \"%s\"\n", src/bin/pg_dump/pg_dump.c: "invalid column numbering in table \"%s\"\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "finding default expressions of table \"%s\"\n", src/bin/pg_dump/pg_dump.c: "invalid adnum value %d for table \"%s\"\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, "finding check constraints for table \"%s\"\n", src/bin/pg_dump/pg_dump.c:write_msg(NULL, ngettext("expected %d check constraint on table \"%s\" but found %d\n", src/bin/pg_dump/pg_dump.c: "expected %d check constraints on table \"%s\" but found %d\n", 
src/bin/pg_dump/pg_dump.c:exit_horribly(NULL, "invalid column number %d for table \"%s\"\n",
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "invalid argument string (%s) for trigger \"%s\" on table \"%s\"\n",
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "query to get rule \"%s\" for table \"%s\" failed: wrong number of rows returned\n",

Just the "dumping contents of table.." message isn't double-quoted:

$ grep 'table %s' src/bin/pg_dump/*.c
src/bin/pg_dump/pg_dump.c:write_msg(NULL, "dumping contents of table %s\n",

So maybe we should double-quote all the strings, i.e. "public.foo", including the one currently missing them.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 10:18 AM, Robert Haas wrote: > Because all the usage counts are the same, the eviction at > this point is completely indiscriminate. We're just as likely to kick > out a btree root page or a visibility map page as we are to kick out a > random heap page, even though the former have probably been accessed > several orders of magnitude more often. That's clearly bad. That's not clear at all. In that circumstance regardless of what page you evict you're incurring precisely one page fault i/o when the page is read back in. Incurring that i/o is bad but it's unavoidable and it's the same badness regardless of what page it's for. The only way to prefer one page over another is if one page won't be needed for long enough for the page to be useful for caching this new buffer (or mixture of buffers) for multiple accesses. If you can't do that then it doesn't matter which buffer you use since it'll just be evicted to read back in the original page again. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 2014-04-17 10:48:15 -0400, Bruce Momjian wrote: > On Thu, Apr 17, 2014 at 10:40:40AM -0400, Robert Haas wrote: > > That can happen, but the real problem I was trying to get at is that > > when all the buffers get up to max usage count, they all appear > > equally important. But in reality they're not. So when we do start > > evicting those long-resident buffers, it's essentially random which > > one we kick out. > > True. Ideally we would have some way to know that _all_ the buffers had > reached the maximum and kick off a sweep to decrement them all. I am > unclear how we would do that. One odd idea would be to have a global > counter that is incremented everytime a buffer goes from 4 to 5 (max) > --- when the counter equals 50% of all buffers, do a clock sweep. Of > course, then the counter becomes a bottleneck. I have my doubts that we'll make the current scheme, where buffer reclaim essentially is O(NBuffers), work much better. Especially as CPU cache effects make such large, high frequency, accesses really expensive. I think we need more drastic changes. I am *not* suggesting that we do that, but I believe it'd be possible to implement a full LRU and be faster than today in scenarios with nontrivial amounts of shared buffers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 10:48 AM, Bruce Momjian wrote: >> > I understand now. If there is no memory pressure, every buffer gets the >> > max usage count, and when a new buffer comes in, it isn't the max so it >> > is swiftly removed until the clock sweep has time to decrement the old >> > buffers. Decaying buffers when there is no memory pressure creates >> > additional overhead and gets into timing issues of when to decay. >> >> That can happen, but the real problem I was trying to get at is that >> when all the buffers get up to max usage count, they all appear >> equally important. But in reality they're not. So when we do start >> evicting those long-resident buffers, it's essentially random which >> one we kick out. > > True. Ideally we would have some way to know that _all_ the buffers had > reached the maximum and kick off a sweep to decrement them all. I am > unclear how we would do that. One odd idea would be to have a global > counter that is incremented everytime a buffer goes from 4 to 5 (max) > --- when the counter equals 50% of all buffers, do a clock sweep. Of > course, then the counter becomes a bottleneck. Yeah, I think that's the right general line of thinking. But it doesn't have to be as coarse-grained as "do a whole clock sweep". It can be, you know, for every buffer that gets incremented from 4 to 5, run the clock sweep far enough to decrement the usage count of some other buffer by one. That's similar to your idea but you can do it a bit at a time rather than having to make a complete pass over shared_buffers all at once. Your other point, that the counter can become the bottleneck, is quite right also and a major problem in this area. I don't know how to solve it right at the moment, but I'm hopeful that there may be a way. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 10:40:40AM -0400, Robert Haas wrote: > On Thu, Apr 17, 2014 at 10:32 AM, Bruce Momjian wrote: > > On Thu, Apr 17, 2014 at 10:18:43AM -0400, Robert Haas wrote: > >> I also believe this to be the case on first principles and my own > >> experiments. Suppose you have a workload that fits inside > >> shared_buffers. All of the usage counts will converge to 5. Then, > >> somebody accesses a table that is not cached, so something's got to be > >> evicted. Because all the usage counts are the same, the eviction at > >> this point is completely indiscriminate. We're just as likely to kick > >> out a btree root page or a visibility map page as we are to kick out a > >> random heap page, even though the former have probably been accessed > >> several orders of magnitude more often. That's clearly bad. On > >> systems that are not too heavily loaded it doesn't matter too much > >> because we just fault the page right back in from the OS pagecache. > >> But I've done pgbench runs where such decisions lead to long stalls, > >> because the page has to be brought back in from disk, and there's a > >> long I/O queue; or maybe just because the kernel thinks PostgreSQL is > >> issuing too many I/O requests and makes some of them wait to cool > >> things down. > > > > I understand now. If there is no memory pressure, every buffer gets the > > max usage count, and when a new buffer comes in, it isn't the max so it > > is swiftly removed until the clock sweep has time to decrement the old > > buffers. Decaying buffers when there is no memory pressure creates > > additional overhead and gets into timing issues of when to decay. > > That can happen, but the real problem I was trying to get at is that > when all the buffers get up to max usage count, they all appear > equally important. But in reality they're not. So when we do start > evicting those long-resident buffers, it's essentially random which > one we kick out. True. 
Ideally we would have some way to know that _all_ the buffers had reached the maximum and kick off a sweep to decrement them all. I am unclear how we would do that. One odd idea would be to have a global counter that is incremented every time a buffer goes from 4 to 5 (max) --- when the counter equals 50% of all buffers, do a clock sweep. Of course, then the counter becomes a bottleneck.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 10:32 AM, Bruce Momjian wrote: > On Thu, Apr 17, 2014 at 10:18:43AM -0400, Robert Haas wrote: >> I also believe this to be the case on first principles and my own >> experiments. Suppose you have a workload that fits inside >> shared_buffers. All of the usage counts will converge to 5. Then, >> somebody accesses a table that is not cached, so something's got to be >> evicted. Because all the usage counts are the same, the eviction at >> this point is completely indiscriminate. We're just as likely to kick >> out a btree root page or a visibility map page as we are to kick out a >> random heap page, even though the former have probably been accessed >> several orders of magnitude more often. That's clearly bad. On >> systems that are not too heavily loaded it doesn't matter too much >> because we just fault the page right back in from the OS pagecache. >> But I've done pgbench runs where such decisions lead to long stalls, >> because the page has to be brought back in from disk, and there's a >> long I/O queue; or maybe just because the kernel thinks PostgreSQL is >> issuing too many I/O requests and makes some of them wait to cool >> things down. > > I understand now. If there is no memory pressure, every buffer gets the > max usage count, and when a new buffer comes in, it isn't the max so it > is swiftly removed until the clock sweep has time to decrement the old > buffers. Decaying buffers when there is no memory pressure creates > additional overhead and gets into timing issues of when to decay. That can happen, but the real problem I was trying to get at is that when all the buffers get up to max usage count, they all appear equally important. But in reality they're not. So when we do start evicting those long-resident buffers, it's essentially random which one we kick out. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buildfarm "master-next" branch?
Andrew Dunstan writes: > On 04/17/2014 09:17 AM, Robert Haas wrote: >> In terms of improving the buildfarm infrastructure, the thing I would >> most like to have is more frequent runs. IMO the best single thing that could happen for the buildfarm is if we had more critters (at least twice as many) running a wider variety of platforms, compilers, and configuration options than there are today. More frequent runs would come out of that automatically. >> ... But that would require more resources for the >> buildfarm machines, which are provided on a strictly volunteer basis, >> so it's hard to see how to arrange that. I don't think we've tried hard lately to get people to sign up. Maybe we should ask the -advocacy crew to do something. >> But the ability to easily spin up temporary branches for testing would >> also be great. Unfortunately, I suspect that only a minority of the >> buildfarm owners would choose to participate, which would make it less >> useful, but if we could solve that problem I'd be all in favor of it. > ... Of course, all this would be done in my copious spare time *cough*. I'm > not sure this would be the best use of it. I agree that this would not be worth the effort needed to make it happen. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Thu, Apr 17, 2014 at 11:29:03AM -0300, Fabrízio de Royes Mello wrote: > > > This database have a lot of different schemas with same structure and if I > > > need do view the status of dump I don't know what schema the table are > > > dump > > > from. > > Yes this may be helpful. The attached quick'n dirty patch implements it. > > > > Very nice... thanks!!! > > I add schema name do the following messages too: > > pg_restore: processing data for table "public"."bar" > pg_restore: processing data for table "public"."foo" > pg_restore: processing data for table "s1"."bar" > pg_restore: processing data for table "s1"."foo" > pg_restore: processing data for table "s2"."bar" > pg_restore: processing data for table "s2"."foo" > pg_restore: processing data for table "s3"."bar" > pg_restore: processing data for table "s3"."foo" Can you get that to _conditionally_ double-quote the strings? In fact, maybe we don't even need the double-quotes. How do we double-quote other places? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 10:18:43AM -0400, Robert Haas wrote: > I also believe this to be the case on first principles and my own > experiments. Suppose you have a workload that fits inside > shared_buffers. All of the usage counts will converge to 5. Then, > somebody accesses a table that is not cached, so something's got to be > evicted. Because all the usage counts are the same, the eviction at > this point is completely indiscriminate. We're just as likely to kick > out a btree root page or a visibility map page as we are to kick out a > random heap page, even though the former have probably been accessed > several orders of magnitude more often. That's clearly bad. On > systems that are not too heavily loaded it doesn't matter too much > because we just fault the page right back in from the OS pagecache. > But I've done pgbench runs where such decisions lead to long stalls, > because the page has to be brought back in from disk, and there's a > long I/O queue; or maybe just because the kernel thinks PostgreSQL is > issuing too many I/O requests and makes some of them wait to cool > things down. I understand now. If there is no memory pressure, every buffer gets the max usage count, and when a new buffer comes in, it isn't the max so it is swiftly removed until the clock sweep has time to decrement the old buffers. Decaying buffers when there is no memory pressure creates additional overhead and gets into timing issues of when to decay. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Verbose output of pg_dump not show schema name
On Thu, Apr 17, 2014 at 2:14 AM, Michael Paquier wrote:
>
> On Thu, Apr 17, 2014 at 11:41 AM, Fabrízio de Royes Mello wrote:
> > Hi all,
> >
> > Is there some reason the verbose output of pg_dump doesn't show the schema
> > name?
> >
> > An output example of using "pg_dump -Fd -j8 -v"
> Specifying a target directory with "-f" is better here...

Yeah... I'm just showing the relevant options used... ;-)

> > This database has a lot of different schemas with the same structure, and if I
> > need to view the status of a dump I don't know which schema the table is
> > dumped from.
> Yes this may be helpful. The attached quick'n dirty patch implements it.

Very nice... thanks!!!

I added the schema name to the following messages too:

pg_restore: processing data for table "public"."bar"
pg_restore: processing data for table "public"."foo"
pg_restore: processing data for table "s1"."bar"
pg_restore: processing data for table "s1"."foo"
pg_restore: processing data for table "s2"."bar"
pg_restore: processing data for table "s2"."foo"
pg_restore: processing data for table "s3"."bar"
pg_restore: processing data for table "s3"."foo"

And:

pg_dump: finding the columns and types of table "s1"."foo"
pg_dump: finding the columns and types of table "s1"."bar"
pg_dump: finding the columns and types of table "s2"."foo"
pg_dump: finding the columns and types of table "s2"."bar"
pg_dump: finding the columns and types of table "s3"."foo"
pg_dump: finding the columns and types of table "s3"."bar"
pg_dump: finding the columns and types of table "public"."foo"
pg_dump: finding the columns and types of table "public"."bar"

And:

pg_dump: processing data for table "public"."bar"
pg_dump: dumping contents of table public.bar
pg_dump: processing data for table "public"."foo"
pg_dump: dumping contents of table public.foo
pg_dump: processing data for table "s1"."bar"
pg_dump: dumping contents of table s1.bar
pg_dump: processing data for table "s1"."foo"
pg_dump: dumping contents of table s1.foo
pg_dump: processing data for
table "s2"."bar"
pg_dump: dumping contents of table s2.bar
pg_dump: processing data for table "s2"."foo"
pg_dump: dumping contents of table s2.foo
pg_dump: processing data for table "s3"."bar"
pg_dump: dumping contents of table s3.bar
pg_dump: processing data for table "s3"."foo"
pg_dump: dumping contents of table s3.foo

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 9464540..7f73e8d 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -706,8 +706,8 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te,
 			_becomeOwner(AH, te);
 			_selectOutputSchema(AH, te->namespace);
 
-			ahlog(AH, 1, "processing data for table \"%s\"\n",
-				  te->tag);
+			ahlog(AH, 1, "processing data for table \"%s\".\"%s\"\n",
+				  AH->currSchema, te->tag);
 
 			/*
 			 * In parallel restore, if we created the table earlier in
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index a6c0428..f763a88 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1400,7 +1400,17 @@ dumpTableData_copy(Archive *fout, void *dcontext)
 	const char *column_list;
 
 	if (g_verbose)
-		write_msg(NULL, "dumping contents of table %s\n", classname);
+	{
+		/* Print namespace information if available */
+		if (tbinfo->dobj.namespace != NULL &&
+			tbinfo->dobj.namespace->dobj.name != NULL)
+			write_msg(NULL, "dumping contents of table %s.%s\n",
+					  tbinfo->dobj.namespace->dobj.name,
+					  classname);
+		else
+			write_msg(NULL, "dumping contents of table %s\n",
+					  classname);
+	}
 
 	/*
 	 * Make sure we are in proper schema.  We will qualify the table name
@@ -6291,8 +6301,17 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
 	 * the output of an indexscan on pg_attribute_relid_attnum_index.
 	 */
 	if (g_verbose)
-		write_msg(NULL, "finding the columns and types of table \"%s\"\n",
-				  tbinfo->dobj.name);
+	{
+		/* Print namespace information if available */
+		if (tbinfo->dobj.namespace != NULL &&
+			tbinfo->dobj.namespace->dobj.name != NULL)
+			write_msg(NULL, "finding the columns and types of table \"%s\".\"%s\"\n",
+					  tbinfo->dobj.namespace->dobj.name,
+					  tbinfo->dobj.name);
+		else
+			write_msg(NULL, "finding the columns and types of table \"%s\"\n",
+					  tbinfo->dobj.name);
+	}
 
 	resetPQExpBuffer(q);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buildfarm "master-next" branch?
On 04/17/2014 09:17 AM, Robert Haas wrote:

In terms of improving the buildfarm infrastructure, the thing I would most like to have is more frequent runs. It would be great if pushing a commit to the master repository triggered an immediate build on every buildfarm animal so that you could see all of the failures in a short period of time. But that would require more resources for the buildfarm machines, which are provided on a strictly volunteer basis, so it's hard to see how to arrange that.

Some buildfarm owners run at pretty high frequency - I know there are cron jobs on some running every 15 minutes. My own Linux and FBSD machines run every hour. Windows builds take longer - depending on other use of resources they can run a couple of hours per branch. Also, my two Windows machines doing buildfarm work are running a total of 5 animals, so the runs are staggered - on Windows 8 the two animals each run every 3 hours. Note that each run potentially builds all the branches, if there has been some backported change, and the Windows animals are set up so that if animal A on the same machine is running when animal B's run time comes around, animal B skips its scheduled run. So sometimes you do have to wait a bit.

If someone were to provide me with a bunch of nice fast Windows VMs I would set them up with one animal apiece with frequent runs and we might get a lot better coverage. But I am tapped out as far as the resources I can provide go.

But the ability to easily spin up temporary branches for testing would also be great. Unfortunately, I suspect that only a minority of the buildfarm owners would choose to participate, which would make it less useful, but if we could solve that problem I'd be all in favor of it. I'm not volunteering to do the work, though.

The buildfarm's original purpose was to give early warning of platform-specific problems of code we had *already* decided on. Now projects morph, so we might decide to do something like this.
But we'd need to think long and hard about it. Postgres has not historically used short-lived branches. I don't much like Craig's idea of a long-lived testing branch that we're going to do commits and reverts on. If we're going to do something like this it would be much better to make some provision for short-lived topic branches. e.g. say we allowed branches with names like testme-platformname-featurename, (platformname here could be a magic "all", or a comma-separated list of names such as linux, freebsd, windows). Wnen testing is done, we could merge the branch if the testing worked out OK, or drop it if the testing proved to be a failure. There would be some work to make the buildfarm client suitable for this. And we'd probably need a "testing dashboard" so as to keep the main dashboard page free of test branch results. Of course, all this would be done in my copious spare time *cough*. I'm not sure this would be the best use of it. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Thu, Apr 17, 2014 at 9:40 AM, Greg Stark wrote: > On Wed, Apr 16, 2014 at 12:44 AM, Robert Haas wrote: >> This isn't a fundamental property of the usage-count idea; it's an >> artifact of the fact that usage count decreases are tied to eviction >> pressure rather than access pressure. For example, suppose we made a >> rule that if the total usage counts of all buffers exceed 3 * >> NBuffers, then every time you bump the usage count of a buffer from N >> to N+1, you're required to advance the clock sweep far enough to >> decrease the reference count of a buffer by one. > > This sounds like the right way to reason about it. > > From what I remember in school the idea with the clock sweep is to set > the usage flags to the maximum whenever the buffer is used and > decrement (actually iirc typically shift right) it when the clock > sweep goes by. Ie, simulate a LRU where when the buffer is accessed it > jumps to the head of the list and when the clock comes by it moves > gradually down the list. > > What you're pointing out is that the clock might not come by very > often resulting everything being at the head of the list. In that case > I'm not clear it really matters what gets evicted though. And the cpu > effort of running the clock n times sounds bad but doing the work > earlier doesn't really change the amount of work being done, it just > amortizes it over more calls. > > But if you want to do that it seems to me the way to do it is every > time a buffer is pinned set to the maximum and then run the clock > max_value - previous_value. So the total usage counts of all buffers > remains constant. If that results in contention one way to reduce it > is to do this probabilistically. Run the clock 1% of the time but run > it 100x as much as you would normally. > > But I think you've misidentified the problem and what those other > algorithms are trying to solve. The problem is not that Postgres will > pick a bad buffer to evict. 
pick a bad buffer to evict. If all the buffers have been used since the
> last time the clock came around then they're all "hot" anyways and it
> doesn't really matter which one we evict. The problem is that we
> expend an inordinate amount of work finding the few non-hot buffers.
> When you have a really large amount of memory and 99.9% of it is hot
> but 0.1% is whatever random non-hot page was needed last then there's
> an obvious buffer to evict when you need a new one. But we spend a lot
> of work decrementing every hot buffer's usage count 4 times only to
> have them immediately incremented again just to find the 1 buffer
> where the usage count was 4 or 3. The goal of these algorithms that
> divide the buffers into groups is to avoid having to do so much work
> to find the colder buffers. Once the hot buffers migrate to the hot
> pool we only need to run the clock there when we find we have new hot
> pages that we want to promote. All the thrashing in the cold pool can
> be more efficient because there's many fewer pages to consider.

Well, I think Peter has proved that PostgreSQL *will* pick a bad buffer to evict. The proof is that when he changed the choice of buffer to evict, he got a significant performance improvement. I also believe this to be the case on first principles and my own experiments.

Suppose you have a workload that fits inside shared_buffers. All of the usage counts will converge to 5. Then, somebody accesses a table that is not cached, so something's got to be evicted. Because all the usage counts are the same, the eviction at this point is completely indiscriminate. We're just as likely to kick out a btree root page or a visibility map page as we are to kick out a random heap page, even though the former have probably been accessed several orders of magnitude more often. That's clearly bad. On systems that are not too heavily loaded it doesn't matter too much because we just fault the page right back in from the OS pagecache.
But I've done pgbench runs where such decisions lead to long stalls, because the page has to be brought back in from disk, and there's a long I/O queue; or maybe just because the kernel thinks PostgreSQL is issuing too many I/O requests and makes some of them wait to cool things down. Of course, the overhead of repeated clock sweeps to push down the usage counts isn't a great thing either. I'm not saying that isn't a problem. But I think bad decisions about what to evict are also a problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] assertion failure 9.3.4
On 04/16/2014 10:28 PM, Tom Lane wrote: Andrew Dunstan writes: On 04/16/2014 07:19 PM, Tom Lane wrote: Yeah, it would be real nice to see a self-contained test case for this. Well, that might be hard to put together, but I did try running without pg_stat_statements and auto_explain loaded and the error did not occur. Not sure where that gets us in terms of deciding on a culprit. Could we at least see the exact parameter settings for pg_stat_statements and auto_explain? (And any other GUCs with nondefault values?) Here are all the settings from the run that failed: listen_addresses = '*' port = 5432 fsync = on synchronous_commit = off checkpoint_segments = 128 checkpoint_completion_target = 0.9 shared_buffers = 512MB max_connections = 300 work_mem = 128MB maintenance_work_mem = 32MB effective_cache_size = 16GB effective_io_concurrency = 2 logging_collector = on log_destination = 'stderr' log_filename = 'postgresql-%a.log' log_rotation_size = 0 log_truncate_on_rotation = on log_line_prefix = '%t [%p] %l: ' log_connections = on log_disconnections = on log_statement = 'all' track_activity_query_size = 10240 shared_preload_libraries = 'auto_explain,pg_stat_statements' As you can see, auto_explain's log_min_duration hasn't been set, so it shouldn't be doing anything very much, I should think. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BGWorkers, shared memory pointers, and postmaster restart
On Wed, Apr 16, 2014 at 8:46 PM, Craig Ringer wrote: > On 04/17/2014 12:16 AM, Robert Haas wrote: >> On Wed, Apr 16, 2014 at 7:11 AM, Craig Ringer wrote: > >>> - A flag like BGW_UNREGISTER_ON_RESTART; >> >> I would be OK with this, maybe modulo the name. >> >>> - To always unregister dynamic bgws on postmaster shm clear + restart; >> >> I don't particularly care for this. Let's suppose the background >> worker is a long-running daemon, like a PG equivalent of cron. In >> static-background worker land, the admin has to restart the cluster to >> get this going. In dynamic-background worker land, he can load it on >> the fly. But once he gets it up and running, he wants it to stay up >> and running, surviving crashes and everything. That's a big part of >> the value of having a background worker interface in the first place. > > It'd be the job of the extension that provides the bgworker to make sure > that it gets relaunched on postmaster restart. > > I tend to agree though. The problem I'm describing only causes issues > for extensions that launch dynamic bgworkers during extension startup. > > Extensions that launch bgworkers in response to SQL commands and don't > rely on passing state to the bgworker via shared memory shouldn't have > to deal with restarting them on postmaster restart. > >> >>> - A way to list bgws, inspect their BackgroundWorker structs and obtain >>> their handles; or >> >> This is certainly a good idea. >> >>> - A way to idempotently register a bgw only if it doesn't already exist >> >> This is probably a good idea, too. > > I think we need _one_ solution for 9.4, and need it soon. > > The simplest is probably to flush all dynamic bgworkers. But I think > you're probably right - that solves the problem discussed here, but is > likely to make life harder for other use cases. 
> > A last-minute API addition for bgworker listing/inspection might not be
> a great idea - it's too late for it to see much testing and analysis and
> it might introduce bigger API problems than it solves.
>
> Duplicate-free registration might be OK, but there are some questions
> around how we'd handle differing parameters, what should be the
> determinant for uniqueness, whether we should go for idempotency or
> return/raise an error to indicate it already exists, etc. So similar
> issue with doing it at the last minute.
>
> To me, that says "let's add a flag to allow a dynamic bgworker to be
> unregistered on postmaster restart". Seems simple and low risk.
>
> I'll follow up with a proposed patch, then we can spend some quality
> shed time on the flag name ;-)

I think I can live with that. However, I tend to think that the best solution here is really "don't try to pass pointers via the BackgroundWorker structure, because it doesn't [ expletive ] work." We've had several instances of that already.

When I added support for dynamic background workers, I had to add bgw_library_name and bgw_function_name members to that structure because bgw_main won't work for extension code unless the library is loaded from shared_preload_libraries AND either we're not running under EXEC_BACKEND (i.e. Windows) or the system is kind enough to load the shared library at the same address in both processes, in which case it will accidentally fail to fail. We would have had to give bgw_sighup and bgw_sigterm the same treatment, but since they weren't really necessary in the first place we just ripped them out instead.

worker_spi also had a bad case of this disease. It went to elaborate lengths to pass a pointer via bgw_main_arg, but the pointer was to *backend-private memory*, so it was completely broken on EXEC_BACKEND builds.
I'm not sure whether it actually failed there, or managed to work just because the background worker backend also ran the _PG_init hook and managed to accidentally place the same data structure at the same address. I fixed that as part of introducing the dynamic background worker facility; now it passes an index instead of a pointer. What you're complaining about here is basically another instance of the same problem. It's not as bad because the main shared memory segment never moves or has any memory freed except after a crash-and-restart cycle, so your suggested plug seems likely to be adequate. But it also requires that you can allocate enough space in shared memory to pass around whatever state you need to carry around, and space in the main shared memory segment is at a premium; storage there also can't be freed. Noah suggested to me a while back that we might do better to change bgw_main_arg to something like char[64] rather than Datum, which would allow passing a reasonable-size payload without having to futz with shared memory. Then, instead of passing a pointer, you can pass the name of a shared memory region to look up and an index into the data structure stored there, or something like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Co
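The index-instead-of-pointer pattern described above can be sketched as follows. This is a minimal illustration of the general idea only, not PostgreSQL code: `worker_configs` and `start_worker` are hypothetical names standing in for a table at a well-known location (e.g. in the main shared memory segment, or rebuilt by `_PG_init` in each process) and for a worker entry point receiving `bgw_main_arg`.

```python
# Illustrative sketch only: why passing a small index survives process
# re-execution while passing a raw pointer does not.  The names here
# (worker_configs, start_worker) are hypothetical, not PostgreSQL APIs.

# A table that both the registering process and the worker process can
# locate independently -- the analogue of a structure in the main shared
# memory segment, or of worker_spi's array indexed by bgw_main_arg.
worker_configs = [
    {"name": "worker_a", "naptime": 10},
    {"name": "worker_b", "naptime": 60},
]

def start_worker(main_arg):
    """The worker entry point receives a small scalar (an index), not a
    pointer, so it still works when the new process has a different
    address space (EXEC_BACKEND / Windows)."""
    config = worker_configs[main_arg]
    return "worker %s started, naptime %d" % (config["name"],
                                              config["naptime"])
```

A pointer into the registering backend's private memory would be meaningless in the worker's address space; an index into a structure both sides can find is not.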
Re: [HACKERS] four minor proposals for 9.5
Pavel Stehule writes: > We can introduce new feature without hard dependency on CSV format Look, the long and the short of it is that there is not consensus that this measurement is worth creating a new CSV log column for. And from that, there is also not consensus that it's worth putting into log_line_prefix. Therefore, this proposal is dead. Please stop prolonging the argument. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] four minor proposals for 9.5
2014-04-17 7:12 GMT+02:00 Amit Kapila : > On Mon, Apr 14, 2014 at 6:27 PM, Robert Haas > wrote: > > I agree. I don't think the idea of pushing this into the > > log_line_prefix stuff as a one-off is a very good one. Sure, we could > > wedge it in there, but we've got an existing precedent that everything > > that you can get with log_line_prefix also shows up in the CSV output > > file. And it's easy to imagine LOTS more counters that somebody might > > want to have. Time spent planning, time spent executing, time spent > > waiting for disk I/O, time spent returning results to client, and I'm > > sure people will think of many others. I think this will balloon out > > of control if we don't have a more systematic design for this sort of > > thing. > > Can't we think of some infrastructure similar to what is done for > log_duration and log_min_duration_statement? > Currently it prints like below: > LOG: duration: 343.000 ms statement: create table t1(c1 int); > > Let us say a user wants to track the lock wait time a statement has > spent; then enable some config parameter (either log_lock_duration > or some other convenient way) > > LOG: lock duration: 'x' ms statement: create table t1(c1 int); > isn't that analogous to log_line_prefix? We can introduce this feature without a hard dependency on the CSV format. I think the requirements are clear: simply parseable (a vector of numbers is ideal) and simply activated/deactivated (maybe a list of flags in a GUC). Regards Pavel > > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com >
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 12:44 AM, Robert Haas wrote: > This isn't a fundamental property of the usage-count idea; it's an > artifact of the fact that usage count decreases are tied to eviction > pressure rather than access pressure. For example, suppose we made a > rule that if the total usage counts of all buffers exceed 3 * > NBuffers, then every time you bump the usage count of a buffer from N > to N+1, you're required to advance the clock sweep far enough to > decrease the reference count of a buffer by one. This sounds like the right way to reason about it. From what I remember in school, the idea with the clock sweep is to set the usage flags to the maximum whenever the buffer is used and decrement (actually, iirc, typically shift right) them when the clock sweep goes by. I.e., simulate an LRU where the buffer jumps to the head of the list when it is accessed and moves gradually down the list when the clock comes by. What you're pointing out is that the clock might not come by very often, resulting in everything being at the head of the list. In that case I'm not clear it really matters what gets evicted, though. And the CPU effort of running the clock n times sounds bad, but doing the work earlier doesn't really change the amount of work being done; it just amortizes it over more calls. But if you want to do that, it seems to me the way to do it is: every time a buffer is pinned, set it to the maximum and then run the clock max_value - previous_value times. That way the total usage counts of all buffers remain constant. If that results in contention, one way to reduce it is to do this probabilistically: run the clock 1% of the time but run it 100x as much as you would normally. But I think you've misidentified the problem and what those other algorithms are trying to solve. The problem is not that Postgres will pick a bad buffer to evict.
If all the buffers have been used since the last time the clock came around, then they're all "hot" anyway and it doesn't really matter which one we evict. The problem is that we expend an inordinate amount of work finding the few non-hot buffers. When you have a really large amount of memory and 99.9% of it is hot but 0.1% is whatever random non-hot page was needed last, then there's an obvious buffer to evict when you need a new one. But we spend a lot of work decrementing every hot buffer's usage count 4 times only to have them immediately incremented again, just to find the 1 buffer where the usage count was 4 or 3. The goal of these algorithms that divide the buffers into groups is to avoid having to do so much work to find the colder buffers. Once the hot buffers migrate to the hot pool, we only need to run the clock there when we find we have new hot pages that we want to promote. All the thrashing in the cold pool can be more efficient because there are many fewer pages to consider. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
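Robert's rule quoted at the top of this exchange can be modeled in a few lines. The sketch below ties usage-count decreases to access pressure: once the sum of all usage counts exceeds 3 * NBuffers, every bump pays for itself by advancing the clock until some other count is decremented. This is a toy model of the proposal, not PostgreSQL's actual bufmgr code, and the constants are arbitrary.

```python
# Toy model of the proposed rule: when total usage counts exceed
# 3 * NBuffers, each usage-count bump must advance the clock sweep far
# enough to decrement one buffer's count.  Not PostgreSQL code.

NBUFFERS = 8
MAX_USAGE = 5

usage = [0] * NBUFFERS
hand = 0

def sweep_decrement():
    """Advance the clock hand until one usage count is decremented."""
    global hand
    while True:
        i = hand
        hand = (hand + 1) % NBUFFERS
        if usage[i] > 0:
            usage[i] -= 1
            return

def pin(buf):
    """Bump a buffer's usage count, paying for the bump with a sweep
    step once total usage pressure is high enough."""
    if usage[buf] < MAX_USAGE:
        usage[buf] += 1
    if sum(usage) > 3 * NBUFFERS:
        sweep_decrement()

# Sustained pinning: total pressure stays bounded by 3 * NBUFFERS,
# so the sweep never has to decrement everything 4 times just to
# find a victim.
for i in range(100):
    pin(i % NBUFFERS)
```

The invariant that the total usage count stays at or below 3 * NBuffers is exactly what prevents the "everything at the head of the list" state described above.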
Re: [HACKERS] Minor performance improvement in transition to external sort
On Wed, Apr 16, 2014 at 7:38 PM, Bruce Momjian wrote: > On Thu, Apr 10, 2014 at 06:03:15PM +0100, Simon Riggs wrote: >> On 6 February 2014 18:21, Jeff Janes wrote: >> > On Tue, Feb 4, 2014 at 2:22 PM, Jeremy Harris wrote: >> >> >> >> The attached patch replaces the existing siftup method for heapify with >> >> a siftdown method. Tested with random integers it does 18% fewer >> >> compares and takes 10% less time for the heapify, over the work_mem >> >> range 1024 to 1048576. >> > >> > >> > Thanks for working on this. >> >> +1 >> >> Your patch isn't linked properly from the CF manager though. >> >> If you like patches like this then there's a long(er) list of >> optimizations already proposed previously around sorting. It would be >> good to have someone work through them for external sorts. I believe >> Noah is working on parallel internal sort (as an aside). >> >> There's also an optimization possible for merge joins where we use the >> output of the first sort as an additional filter on the second sort. >> That can help when we're going to join two disjoint tables. > > Where should this be recorded? TODO? Commitfest manager? IIUC, the original patch was withdrawn; any remaining action items should probably go to TODO. I'm not sure which specific idea you're referring to, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
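The siftup-vs-siftdown distinction the original patch was about can be sketched briefly. Building a heap bottom-up with sift-down (Floyd's method) is O(n), versus O(n log n) comparisons for repeated sift-up insertion, which is where the reported reduction in compares comes from. This is an illustration of the algorithm only, not tuplesort.c's implementation.

```python
# Bottom-up heapify via sift-down (Floyd's method): sift every internal
# node down, starting from the last one.  Sketch of the algorithm the
# patch discussion refers to; not PostgreSQL's tuplesort.c code.

def sift_down(heap, start, end):
    """Push heap[start] down until the min-heap property holds below it."""
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1
        # pick the smaller of the two children
        if child + 1 <= end and heap[child + 1] < heap[child]:
            child += 1
        if heap[child] < heap[root]:
            heap[root], heap[child] = heap[child], heap[root]
            root = child
        else:
            return

def heapify(items):
    """Build a min-heap in place in O(n) by sifting down each internal
    node, instead of sift-up insertion of each element (O(n log n))."""
    n = len(items)
    for start in range(n // 2 - 1, -1, -1):
        sift_down(items, start, n - 1)
    return items
```

The asymptotic win comes from most nodes being near the bottom of the heap, where sift-down has almost no work to do.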
Re: [HACKERS] Need Multixact Freezing Docs
On Wed, Apr 16, 2014 at 4:39 PM, Josh Berkus wrote: >> Hmm, are you sure it's INT_MAX and not 4244967297? Heikki reported >> that: http://www.postgresql.org/message-id/52401aea.9000...@vmware.com >> The absolute value is not important; I think that's mostly harmless. I >> don't think applying age() to a multixact value is meaningful, though; >> that's only good for Xids. > > Yeah, I'm sure: >
> josh=# select relname, age(relminmxid) from pg_class;
>        relname       |    age
> ---------------------+------------
>  pg_statistic        | 2147483647
>  pg_type             | 2147483647
>  random              | 2147483647
>  dblink_pkey_results | 2147483647
>  pg_toast_17395      | 2147483647
> ...
> So if age() doesn't mean anything, then how are users to know when they > need to freeze? Or, in other words, this is another example of xid-freezing infrastructure that needed to be copied for mxid-freezing and wasn't. We need an analogue of age() for mxids. Perhaps just mxid_age()? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Buildfarm "master-next" branch? (was: Dynamic Shared Memory stuff)
On Wed, Apr 16, 2014 at 8:24 PM, Craig Ringer wrote: > On 04/17/2014 12:08 AM, Robert Haas wrote: >> On Tue, Apr 15, 2014 at 10:46 PM, Amit Kapila >> wrote: >>> On Wed, Apr 16, 2014 at 3:01 AM, Robert Haas wrote: On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila wrote: > On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas > wrote: >> For the create case, I'm wondering if we should put the block that >> tests for !hmap *before* the _dosmaperr() and check for EEXIST. What >> is your opinion? > > Either way is okay, but I think the way you are suggesting is better as it > will make code consistent with other place (PGSharedMemoryCreate()). OK, can you prepare a patch? >>> >>> Please find attached patch to address this issue. >>> One minor point to note is that now we have to call GetLastError() twice, >>> once inside error path and once to check EEXIST, but I think that is okay >>> as existing code in PGSharedMemoryCreate() does it that way. >> >> OK. I committed this blindly, but I don't have a Windows dev >> environment, so please keep an eye on the Windows buildfarm members >> and provide follow-on patches if any of them get unhappy about this. > > Given that we're doing this a fair bit, is it reasonable to define a > "master-next" branch in git and have the buildfarm (or at least the > Windows members) build that? > > Permit master-next to be rebased and reset. > > That way it's possible to fire stuff off and see what happens on the > buildfarm without introducing broken commits unnecessarily. > > Thoughts? In this particular case, I have a lot of confidence that Amit tested this on his own machine before sending in the patch; and moreover, he wrote that code in the first place. So it's no worse than it would have been if that change had been in the originally committed version, which I didn't personally test on Windows, either, but which has nevertheless mostly passed buildfarm testing. 
Arguably, if I'm going to be hacking on platform-dependent things very often, I should get my own Windows build environment set up so that I can test it myself, but it hasn't quite been worth it to me thus far, and Amit has proven to be pretty reliable in terms of getting things right. In terms of improving the buildfarm infrastructure, the thing I would most like to have is more frequent runs. It would be great if pushing a commit to the master repository triggered an immediate build on every buildfarm animal so that you could see all of the failures in a short period of time. But that would require more resources for the buildfarm machines, which are provided on a strictly volunteer basis, so it's hard to see how to arrange that. But the ability to easily spin up temporary branches for testing would also be great. Unfortunately, I suspect that only a minority of the buildfarm owners would choose to participate, which would make it less useful, but if we could solve that problem I'd be all in favor of it. I'm not volunteering to do the work, though. Honestly, I don't think we have a huge problem here today. Yeah, the buildfarm turns pretty colors on a fairly regular basis, but those issues are also generally fixed very quickly. With the unfortunate exception of the seemingly never-ending stream of multixact-related bugs, a user who took a snapshot of our master branch at a randomly selected point during the 9.4 development cycle would likely have gotten code reliable enough to be run in production. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How can we make beta testing better?
On Wed, Apr 16, 2014 at 12:53 AM, Rod Taylor wrote: > 4) Plays queries from the CSV logs starting from $TIME mimicking actual > timing and transaction boundaries This ^^ But I recall a number of previous attempts, including plugins for general load-testing systems; what happened to them? Honestly, though, if you really want to load test properly, what you really want to do is deploy a copy of your entire application and feed it requests simulating user traffic. That results in a more accurate representation and gives you data that's easier to act on. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: iff -> if
2014-04-17 Michael Paquier : > Is there no equivalent in German? For example in French there is "ssi". gdw (genau dann, wenn) Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.
On 04/17/2014 01:35 AM, Tom Lane wrote: I'll go change it. Thanks for fixing this. The new name "Execution time" is much clearer. -- Andreas Karlsson -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On 2014-04-16 19:18:02 -0400, Bruce Momjian wrote: > On Thu, Feb 6, 2014 at 09:40:32AM +0100, Andres Freund wrote: > > On 2014-02-05 12:36:42 -0500, Robert Haas wrote: > > > >> It may well be that your proposal is spot on. But I'd like to see some > > > >> data-structure-by-data-structure measurements, rather than assuming > > > >> that > > > >> alignment must be a good thing. > > > > > > > > I am fine with just aligning BufferDescriptors properly. That has > > > > clearly shown massive improvements. > > > > > > I thought your previous idea of increasing BUFFERALIGN to 64 bytes had > > > a lot to recommend it. > > > > Good. > > > > I wonder if we shouldn't move that bit of logic: > > if (size >= BUFSIZ) > > newStart = BUFFERALIGN(newStart); > > out of ShmemAlloc() and instead have a ShmemAllocAligned() and > > ShmemInitStructAligned() that does it. So we can sensibly can control it > > per struct. > > > > > But that doesn't mean it doesn't need testing. > > > > I feel the need here, to say that I never said it doesn't need testing > > and never thought it didn't... > > Where are we on this? It needs somebody with time to evaluate possible performance regressions - I personally won't have time to look into this in detail before pgcon. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
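The BUFFERALIGN proposal discussed above is just the usual power-of-two round-up arithmetic. As a sketch, the rounding that the TYPEALIGN-style macros perform (here written in Python, not C, and not copied from c.h) looks like this, with the buffer alignment raised to the 64-byte cache-line size under discussion:

```python
# The arithmetic behind TYPEALIGN/BUFFERALIGN-style macros: round a
# length or offset up to the next multiple of a power-of-two alignment.
# Sketch only; the constant and names mirror the discussion, not c.h.

def typealign(alignval, length):
    # equivalent to the C expression ((length + alignval - 1) & ~(alignval - 1))
    return (length + alignval - 1) & ~(alignval - 1)

CACHE_LINE = 64  # the proposed 64-byte buffer alignment

def buffer_align(offset):
    """Round a shared-memory offset up to a cache-line boundary so that
    structures like buffer descriptors start cache-line aligned."""
    return typealign(CACHE_LINE, offset)
```

With descriptors starting on cache-line boundaries, two adjacent descriptors no longer share a line, which is the contention effect the thread's measurements were about.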
Re: [HACKERS] slow startup due to LWLockAssign() spinlock
On 2014-04-16 19:33:52 -0400, Bruce Momjian wrote: > On Tue, Feb 4, 2014 at 12:58:49AM +0100, Andres Freund wrote: > > On 2014-02-03 11:22:45 -0500, Tom Lane wrote: > > > Andres Freund writes: > > > > On larger, multi-socket, machines, startup takes a fair bit of time. As > > > > I was profiling anyway I looked into it and noticed that just about all > > > > of it is spent in LWLockAssign() called by InitBufferPool(). Starting > > > > with shared_buffers=48GB on the server Nate Boley provided, takes about > > > > 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. > > > > Simply modifying LWLockAssign() to not take the spinlock when > > > > !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making > > > > LWLockAssign() prettier it seems enough of a speedup to be worthwile > > > > nonetheless. > > > > > > Hm. This patch only works if the postmaster itself never assigns any > > > LWLocks except during startup. That's *probably* all right, but it > > > seems a bit scary. Is there any cheap way to make the logic actually > > > be what your comment claims, namely "Interlocking is not necessary during > > > postmaster startup"? I guess we could invent a ShmemInitInProgress global > > > flag ... > > > > So, here's a flag implementing things with that flag. I kept your name, > > as it's more in line with ipci.c's naming, but it looks kinda odd > > besides proc_exit_inprogress. > > Uh, where are we on this? I guess it's waiting for the next CF :(. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
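The flag-based approach Tom suggested and Andres implemented can be sketched abstractly: take the allocator's interlock only once other processes can exist. The sketch below uses hypothetical names (`shmem_init_in_progress` standing in for the proposed ShmemInitInProgress flag) and a Python lock as a stand-in for the ShmemLock spinlock; it is an illustration of the idea, not PostgreSQL's code.

```python
# Sketch of the startup optimization discussed here: skip the allocator
# interlock while the postmaster is still the only process.  The names
# are hypothetical stand-ins, not PostgreSQL's actual API.
import threading

shmem_lock = threading.Lock()        # stand-in for the ShmemLock spinlock
shmem_init_in_progress = True        # cleared before children can exist
next_free = 0                        # bump-allocation pointer

def shmem_alloc(size):
    """Bump-allocate from the (simulated) shared segment, taking the
    lock only when concurrent allocators are possible."""
    global next_free
    if shmem_init_in_progress:
        # Single-process startup: interlocking is unnecessary, which is
        # where the reported 12s -> 2s startup improvement came from.
        offset = next_free
        next_free += size
        return offset
    with shmem_lock:
        offset = next_free
        next_free += size
        return offset
```

Tom's concern upthread is visible in the sketch: correctness depends on the flag being cleared before any other process can call the allocator.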
[HACKERS] New windows compiler warning from 585bca39
I've attached a tiny patch that fixes a new compiler warning in the Windows build... Perhaps the #ifndef could be placed in a nicer spot in the patch, but the attached should at least describe where the problem lies... (ClCompile target) -> src\interfaces\libpq\fe-connect.c(3365): warning C4101: 'msgid' : unreferenced local variable [D:\Postgres\b\libpq.vcxproj] 1 Warning(s) fe-connect_win32_warning_fix.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about optimising (Postgres_)FDW
(2014/04/16 22:16), Hannu Krosing wrote: On 04/16/2014 01:35 PM, Etsuro Fujita wrote: Maybe I'm missing something, but I think that you can do what I think you'd like to do by the following procedure: No, what I'd like PostgreSQL to do is to 1. select the id+set from the local table 2. select the rows from the remote table with WHERE ID IN () 3. then join the original set to the selected set, with any suitable join strategy The things I do not want are A. selecting all rows from the remote table (this is what your examples below do) or B. selecting rows from the remote table by single selects using "ID = $" (this is something that I managed to do by some tweaking of costs) as A will always be slow if there are millions of rows in the remote table and B is slow(ish) when the idset is over a few hundred ids I hope this is a better explanation than the one I provided before. Ah, I understand what you'd like to do. Thank you for the explanation. P.S. I am not sure if this is a limitation of postgres_fdw or postgres itself If I understand correctly, neither the current postgres_fdw planning function nor the current postgres planner itself supports such a plan. For that I think we would probably need to implement a distributed query processing technique such as semijoin or bloomjoin in those modules. Thanks, P.S. or, that as Tom mentioned, by disabling the use_remote_estimate function: I misunderstood the meaning of what Tom pointed out. Sorry for that. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Include planning time in EXPLAIN ANALYZE output.
I found it a bit confusing when the planning time is greater than the total time, so +1 for "Execution time". On Thu, Apr 17, 2014 at 3:35 AM, Tom Lane wrote: > Bruce Momjian writes: >> Where are we on this? I still see: > >> test=> EXPLAIN ANALYZE SELECT 1; >>QUERY PLAN >> >> >>Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 >> rows=1 loops=1) >>Planning time: 0.009 ms >> -->Total runtime: 0.009 ms >> (3 rows) > > There seemed to be a clear majority of votes in favor of changing it to > say "Execution time". Robert was arguing for no change, but I don't think > that's tenable in view of the fact that the addition of the "Planning > time" line is already a change, and one that makes the old wording > confusing. > > I'll go change it. > > regards, tom lane > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers