Re: [HACKERS] Minor improvements in alter_table.sgml
(2014/04/14 23:53), Robert Haas wrote: On Fri, Apr 11, 2014 at 5:00 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Attached is an updated version of the patch. I applied the first two hunks of this, which seem like clear oversights; and also the bit fixing the constraint_name language. I think the other changes deserve to be considered separately, and in particular I'm still not sure it's a good idea to document both OF type_name and type_name. OK, Thanks! Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Autonomous Transaction (WIP)
On 14 April 2014 20:10, Simon Riggs wrote: Autonomous Transaction Storage: As for the main transaction, structure PGXACT is used to store main transactions, which are created in shared memory of size: (number of processes) * sizeof(struct PGXACT) Similarly, a new structure will be defined to store autonomous transactions: struct PGAutonomousXACT Oh... I had already added this patch to the 2014-June CommitFest, thinking that everyone is busy with work to wrap up 9.4. I already proposed exactly this design two years ago and it was rejected at the PgCon hackers meeting. I have a better design worked out now and will likely be working on it for 9.5. Can we work together to take this feature to its final goal? Maybe you can go through my complete patch and see which parts of the patch and related design can be reused along with your new design. Also, if possible, you can share your design (even a rough one is OK), and I will see if I can contribute to it in some way. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] Archive recovery won't be completed on some situation.
Hello, thank you for the discussion. At Tue, 1 Apr 2014 11:41:20 -0400, Robert Haas wrote: I don't find that very radical at all. The backup_label file is *supposed* to be removed on the master if it crashes during the backup; and it should never be removed from the backup itself. At least that's how I understand it. Unfortunately, people too often ... The code indeed seems to assume that, and I couldn't think of any measure to avoid that dead end once the recovery sequence reads a backup label accidentally left behind. I considered removing the backup label during immediate shutdown on previous versions, as 9.4 does. CancelBackup does only a stat-unlink-rename sequence, so I think this doesn't obstruct the immediate shutdown sequence. And this doesn't change any visible behavior or interface except in this case. What do you think about this? Isn't this also applicable to older versions?

postmaster.c@9.3.3:2339
pmdie(SIGNAL_ARGS)
{
	...
	switch (postgres_signal_arg)
	{
		...
		case SIGQUIT:
			...
			SignalUnconnectedWorkers(SIGQUIT);
+
+			/*
+			 * Terminate exclusive backup mode. This is done in
+			 * PostmasterStateMachine() for other shutdown modes.
+			 */
+			if (ReachedNormalRunning)
+				CancelBackup();
			ExitPostmaster(0);
			break;

Aside from this, I'll post the new option for pg_resetxlog for the next CF. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] UNION ALL on partitioned tables won't use indices.
Thank you for committing. At Fri, 28 Mar 2014 11:50:56 -0400, Tom Lane t...@sss.pgh.pa.us wrote in 21426.1396021...@sss.pgh.pa.us:

tgl> Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes:
tgl> > Hello. Attached is the 2nd version of 'pushdown in UNION ALL on
tgl> > partitioned tables' patch type 1 - fix in equiv-member version.
tgl>
tgl> Committed, thanks.
tgl>
tgl> regards, tom lane

regards, -- Kyotaro Horiguchi NTT Open Source Software Center
[HACKERS] The question about the type numeric
Hi hackers, I am learning about numeric . The comment of NumericShort format is: * In the NumericShort format, the remaining 14 bits of the header word * (n_short.n_header) are allocated as follows: 1 for sign (positive or * negative), 6 for dynamic scale, and 7 for weight. In practice, most * commonly-encountered values can be represented this way. So the Max of the NumericShort format should be up to 508 digits before the decimal point. So the sign of the number 12345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567 should be 0x807F. The number is 257 digits before the decimal point. But the sign is 0. So is there anything wrong? 2014-04-15 wangshuo HighGo Software Co.,Ltd. Address: A203 Block D QILU Soft Park, High-Tech Zone, Lixia district, Jinan Shandong, China(Head Office) Tel:+86-0531-55701530 Fax:+86-0531-55701544 Website:www.highgo.com Mobile:18766416137
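For what it's worth, here is a sketch of the short-format header layout as I read the mask values in src/backend/utils/adt/numeric.c (treat the exact constants as my assumption, not gospel). The key point is that the weight is a *signed* 7-bit field counted in base-10000 digits, so the short format tops out at (63 + 1) * 4 = 256 decimal digits before the point, not 508. A 257-digit value therefore cannot use the short format at all; it is stored in the long format, whose positive sign (NUMERIC_POS) is 0x0000, which would explain the 0 you observe.

```c
#include <stdint.h>

/*
 * Sketch of the NumericShort header word, using mask values as found in
 * src/backend/utils/adt/numeric.c (9.x).  Illustrative code, not the
 * actual PostgreSQL implementation.
 */
#define NUMERIC_SHORT                  0x8000  /* short-format flag bits */
#define NUMERIC_SHORT_SIGN_MASK        0x2000  /* 1 bit:  value sign */
#define NUMERIC_SHORT_DSCALE_MASK      0x1F80  /* 6 bits: display scale */
#define NUMERIC_SHORT_DSCALE_SHIFT     7
#define NUMERIC_SHORT_WEIGHT_SIGN_MASK 0x0040  /* weight sign bit */
#define NUMERIC_SHORT_WEIGHT_MASK      0x003F  /* 6 bits: weight magnitude */

/* weight is a signed 7-bit field counting base-10000 digit groups */
static int
short_weight(uint16_t header)
{
    int w = header & NUMERIC_SHORT_WEIGHT_MASK;

    if (header & NUMERIC_SHORT_WEIGHT_SIGN_MASK)
        w |= ~NUMERIC_SHORT_WEIGHT_MASK;    /* sign-extend */
    return w;
}

/* max decimal digits before the point representable in short format */
static int
short_max_int_digits(void)
{
    return (NUMERIC_SHORT_WEIGHT_MASK + 1) * 4;     /* (63 + 1) * 4 = 256 */
}
```

So with these masks, max weight is 63 (64 base-10000 digit groups), i.e. 256 digits before the decimal point; 0x807F would imply a weight outside the short format's range.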
Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1
On 04/15/2014 03:39 AM, Tom Lane wrote: I still wish we could get rid of this problem by fixing the Windows build recipes so that the PGDLLEXPORT marking wasn't needed. We proved to ourselves recently that getting rid of PGDLLIMPORT on global variables wouldn't work, but I'm not sure that the function end of it was really investigated. My understanding is that we *can* drop PGDLLEXPORT on functions without actively breaking anything. But we probably shouldn't. If we omit PGDLLEXPORT, the linker of the DLL/executable that imports the extern function will generate a thunk from the .LIB file for the target DLL during linkage; this thunk within the DLL/EXE with the undefined extern then jumps to the real address within the defining DLL/EXE. Reference: http://msdn.microsoft.com/en-us/library/zw3za17w.aspx So in other words, it makes calls across DLL boundaries less efficient by adding a layer of indirection. (No idea how this works in the presence of link-time base address randomization either.) I actually think we should *add* a LIBPQEXPORT that handles this for libpq, much like PGDLLEXPORT does for postgres(.exe). And in the process, rename PGDLLEXPORT to POSTGRESEXPORT or PGSERVEREXPORT or something. PGDLLEXPORT is probably less important overall - it'll mainly impact extensions (like hstore, intarray, etc) that call into the server. I wonder if this thunking still really matters with modern CPU architectures' smart branch prediction, TLB caches, etc. I haven't found much info on the real-world impact. It would probably be reasonable to add PGDLLEXPORT within postgres.exe only on functions we've intentionally exposed for use by extensions, where those functions are likely to get called a lot and don't have bigger costs like disk I/O, network I/O, or expensive memory allocations that make call-time overheads irrelevant.
-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
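For readers following along, the marking under discussion is roughly this; the real definition lives in the PostgreSQL Windows port headers, so treat this as a simplified sketch rather than a copy:

```c
/*
 * Sketch of a PGDLLEXPORT-style macro.  On Windows builds it expands to
 * __declspec(dllexport), which places the symbol in the DLL's export
 * table so callers can bind to it directly; without the marking, calls
 * resolve through an import-library thunk as described above.  On other
 * platforms it expands to nothing.
 */
#if defined(_WIN32) || defined(__CYGWIN__)
#define PGDLLEXPORT __declspec(dllexport)
#else
#define PGDLLEXPORT
#endif

/* a hypothetical extension function exported the usual way */
PGDLLEXPORT int
add_one(int x)
{
    return x + 1;
}
```

On non-Windows platforms the macro vanishes and the function is an ordinary extern, which is why dropping it doesn't break anything there.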
Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1
Craig Ringer cr...@2ndquadrant.com writes: On 04/15/2014 03:39 AM, Tom Lane wrote: I still wish we could get rid of this problem by fixing the Windows build recipes so that the PGDLLEXPORT marking wasn't needed. We proved to ourselves recently that getting rid of PGDLLIMPORT on global variables wouldn't work, but I'm not sure that the function end of it was really investigated. My understanding is that we *can* drop PGDLLEXPORT on functions without actively breaking anything. But we probably shouldn't. If we omit PGDLLEXPORT, the linker of the DLL/executable that imports the extern function will generate a thunk from the .LIB file for the target DLL during linkage; this thunk within the DLL/EXE with the undefined extern then jumps to the real address within the defining DLL/EXE. TBH, if the only argument for this is a small efficiency difference, then to my mind it barely requires discussion. I don't give one hoot about micro-optimization for the Windows platform; I'm satisfied if it works at all there. And I seriously doubt that a couple more cycles to call any function implemented in a loadable module would matter anyway. I actually think we should *add* a LIBPQEXPORT that handles this for libpq, much like PGDLLEXPORT does for postgres(.exe). And in the process, rename PGDLLEXPORT to POSTGRESEXPORT or PGSERVEREXPORT or something. My reaction to that is not bloody likely. I remarked on this upthread already, but there is absolutely no way that I want to clutter our source code with platform-specific markings like that. Perhaps somebody could try a Windows build with PGDLLEXPORT defined to empty, and verify that it works, and if so do a pgbench comparison against a build done the existing way? regards, tom lane
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: [ assorted comments about custom-scan patch, but particularly ] * The prune hook makes me feel very uneasy. It seems weirdly specific implementation detail, made stranger by the otherwise lack of data maintenance API calls. Calling that for every dirty page sounds like an issue and my patch rejection indicator is flashing red around that. Yeah. After a fast review of the custom-scan and cache-scan patches, it seems to me that my original fears are largely confirmed: the custom scan patch is not going to be sufficient to allow development of any truly new plan type. Yeah, you can plug in some new execution node types, but actually doing anything interesting is going to require patching other parts of the system. Are we going to say to all comers, sure, we'll put a hook call anywhere you like, just ask? I can't see this as being the way to go. Without prejudice to the rest of what you said, this argument doesn't hold much water with me. I mean, anything that our extensibility mechanism doesn't support today will require new hooks, but does that mean we're never going to add any more hooks? I sure hope not. When hooks are proposed here, we evaluate on them on their merits and attempt to judge the likelihood that a hook in a particular place will be useful, but generally we're not averse to adding them, and as long as the paths aren't too performance-critical, I don't think we should be averse to adding them. We have a great system today for letting people add new data types and things of that sort, but anything that penetrates more deeply into the heart of the system pretty much can't be done; this is why various companies, such as our respective employers, have developed and maintained forks of the PostgreSQL code base instead of just hooking in to the existing code. 
We probably can't solve that problem completely, but that doesn't mean we should throw in the towel. And in particular, I think it's pretty normal that a new facility like custom scans might create additional demand for new hooks. If something was completely impossible before, and the new facility makes it almost-possible, then why shouldn't someone ask for a hook there? A prune hook probably has no business in the custom scan patch proper, but whether it's a good idea or a bad one should be decided on the merits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Robert Haas robertmh...@gmail.com writes: On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah. After a fast review of the custom-scan and cache-scan patches, it seems to me that my original fears are largely confirmed: the custom scan patch is not going to be sufficient to allow development of any truly new plan type. Yeah, you can plug in some new execution node types, but actually doing anything interesting is going to require patching other parts of the system. Without prejudice to the rest of what you said, this argument doesn't hold much water with me. I mean, anything that our extensibility mechanism doesn't support today will require new hooks, but does that mean we're never going to add any more hooks? I sure hope not. No, that's not what I said. ISTM that the argument for the custom-scan API is that it allows interesting new things to be done *without further modifying the core code*. But the example application (cache_scan) fails to demonstrate that, and indeed seems to be a counterexample. Whether we'd accept cache_scan on its own merits is a separate question. The problem for me is that custom-scan isn't showing that it can support what was claimed without doing serious damage to modularity and maintainability of the core code. What this may mean is that we need more attention to refactoring of the core code. But just removing static from any function that looks like it might be handy isn't my idea of well-considered refactoring. More the opposite in fact: if those things turn into APIs that we have to support, it's going to kill any ability to do such refactoring. A concrete example here is setrefs.c, whose responsibilities tend to change from release to release. I think if we committed custom-scan as is, we'd have great difficulty changing setrefs.c's transformations ever again, at least if we hoped to not break users of the custom-scan API. 
I'm not sure what the solution is --- but turning setrefs into a white box instead of a black box isn't it. regards, tom lane
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
* Tom Lane (t...@sss.pgh.pa.us) wrote: A concrete example here is setrefs.c, whose responsibilities tend to change from release to release. I think if we committed custom-scan as is, we'd have great difficulty changing setrefs.c's transformations ever again, at least if we hoped to not break users of the custom-scan API. I'm not sure what the solution is --- but turning setrefs into a white box instead of a black box isn't it. Yeah, this was my (general) complaint as well and the answer that I kept getting back is well, it's ok, you can still break it between major releases and the custom scan users will just have to deal with it. I'm a bit on the fence about that, itself, but the other half of that coin is that we could end up with parts of the *core* code that think it's ok to go pulling in these functions, once they're exposed, and that could end up making things quite ugly and difficult to maintain going forward. Thanks, Stephen
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah. After a fast review of the custom-scan and cache-scan patches, it seems to me that my original fears are largely confirmed: the custom scan patch is not going to be sufficient to allow development of any truly new plan type. Yeah, you can plug in some new execution node types, but actually doing anything interesting is going to require patching other parts of the system. Without prejudice to the rest of what you said, this argument doesn't hold much water with me. I mean, anything that our extensibility mechanism doesn't support today will require new hooks, but does that mean we're never going to add any more hooks? I sure hope not. No, that's not what I said. ISTM that the argument for the custom-scan API is that it allows interesting new things to be done *without further modifying the core code*. But the example application (cache_scan) fails to demonstrate that, and indeed seems to be a counterexample. Whether we'd accept cache_scan on its own merits is a separate question. The problem for me is that custom-scan isn't showing that it can support what was claimed without doing serious damage to modularity and maintainability of the core code. I think there's two separate things in there, one of which I agree with and one of which I disagree with. I agree that we must avoid damaging the modularity and maintainability of the core code; I don't agree that custom-scan needs to be able to do interesting things with zero additional changes to the core code. If we come up with three interesting applications for custom scan that require 5 new hooks between them, I'll consider that a major success - assuming those hooks don't unduly limit future changes we may wish to make in the core code. 
I think your concern about exposing APIs that may not be terribly stable is well-founded, but I don't think that means we shouldn't expose *anything*. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Hi, On 2014-04-15 11:07:11 -0400, Robert Haas wrote: On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: [ discussion ] What I think this discussion shows is that this patch isn't ready for 9.4. The first iteration of the patch came in on 2013-11-06. Imo that's pretty damn late for a relatively complex patch. And obviously we don't have agreement on the course forward. I don't think we need to stop discussing, but I think it's pretty clear that this isn't 9.4 material. And that it's far from Ready for Committer. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Andres Freund and...@2ndquadrant.com writes: What I think this discussion shows is that this patch isn't ready for 9.4. The first iteration of the patch came in on 2013-11-06. Imo that's pretty damn late for a relatively complex patch. And obviously we don't have agreement on the course forward. I don't think we need to stop discussing, but I think it's pretty clear that this isn't 9.4 material. And that it's far from Ready for Committer. Yeah. I'm still not exactly convinced that custom-scan will ever allow independent development of new plan types (which, with all due respect to Robert, is what it was being sold as last year in Ottawa). But I'm not opposed in principle to committing it, if we can find a way to have a cleaner API for things like setrefs.c. It seems like late-stage planner processing in general is an issue for this patch (createplan.c and subselect.c are also looking messy). EXPLAIN isn't too great either. I'm not sure exactly what to do about those cases, but I wonder whether things would get better if we had the equivalent of expression_tree_walker/mutator capability for plan nodes. The state of affairs in setrefs and subselect, at least, is a bit reminiscent of the bad old days when we had lots of different bespoke code for traversing expression trees. regards, tom lane
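To illustrate the kind of facility Tom is suggesting, here is a minimal sketch of a plan_tree_walker in the style of expression_tree_walker: one function knows the tree structure, and callers supply a callback. The Plan struct here is deliberately simplified (real plan nodes carry per-node-type fields, target lists, and so on), so everything below is hypothetical rather than core code:

```c
#include <stdbool.h>
#include <stddef.h>

/* toy stand-in for a plan node; real ones are far richer */
typedef struct Plan
{
    int         plan_id;        /* placeholder for node-specific content */
    struct Plan *lefttree;
    struct Plan *righttree;
} Plan;

typedef bool (*plan_walker_fn) (Plan *plan, void *context);

/*
 * Visit every node in the plan tree, expression_tree_walker style:
 * return true as soon as the callback returns true (abort the walk),
 * false if the whole tree was visited.
 */
static bool
plan_tree_walker(Plan *plan, plan_walker_fn walker, void *context)
{
    if (plan == NULL)
        return false;
    if (walker(plan, context))
        return true;
    if (plan_tree_walker(plan->lefttree, walker, context))
        return true;
    return plan_tree_walker(plan->righttree, walker, context);
}

/* example callback: count nodes in the tree */
static bool
count_nodes(Plan *plan, void *context)
{
    (void) plan;
    ++*(int *) context;
    return false;               /* keep walking */
}
```

The appeal is the one Tom names: setrefs.c, subselect.c, and friends could traverse unknown (custom) node types through one shared walker instead of bespoke per-file recursion.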
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Mon, Apr 14, 2014 at 7:45 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Apr 14, 2014 at 5:30 PM, Bruce Momjian br...@momjian.us wrote: I am glad you are looking at this. You are right that it requires a huge amount of testing, but clearly our code needs improvement in this area. Thanks. Does anyone recall the original justification for the recommendation that shared_buffers never exceed 8GiB? I'd like to revisit the test case, if such a thing exists. There are many reports of improvement from lowering shared_buffers. The problem is that it tends to show up on complex production workloads and that there is no clear evidence pointing to problems with the clock sweep; it could be higher up in the partition locks or something else entirely (like the O/S). pgbench is also not the greatest tool for sniffing out these cases: it's too random, and for large databases optimization is generally an exercise in de-randomizing i/o patterns. We really, really need a broader testing suite that covers more usage patterns. I was suspicious for a while that spinlock contention inside the clock sweep was causing stalls and posted a couple of different patches to try and reduce the chance of that. I basically gave up when I couldn't demonstrate that case in simulated testing. I still think there is no good reason for the clock to pedantically adjust usage count on contended buffers...better to throw a single TTAS and bail to the next buffer if either 'T' signals a lock. merlin
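A minimal sketch of the single-TTAS-and-bail idea Merlin describes, with hypothetical names; PostgreSQL's actual buffer headers and spinlock primitives look quite different:

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Test-and-test-and-set with bail-out: before touching a buffer's usage
 * count, do a plain load of its spinlock word (first 'T'); only if it
 * looks free, attempt the atomic exchange (second 'T').  If either step
 * shows the lock held, skip the buffer rather than spinning, and let the
 * clock sweep move on to the next one.
 */
typedef struct BufferDescSketch
{
    atomic_int  spinlock;       /* 0 = free, 1 = held */
    int         usage_count;
} BufferDescSketch;

/* returns true if we decremented the usage count, false if we bailed */
static bool
sweep_try_buffer(BufferDescSketch *buf)
{
    /* first test: plain read, no bus-locked operation on contention */
    if (atomic_load_explicit(&buf->spinlock, memory_order_relaxed) != 0)
        return false;

    /* second test: the actual test-and-set; bail if someone beat us */
    if (atomic_exchange(&buf->spinlock, 1) != 0)
        return false;

    if (buf->usage_count > 0)
        buf->usage_count--;

    atomic_store(&buf->spinlock, 0);    /* release */
    return true;
}
```

The design point is that a contended buffer is, by definition, in use, so skipping its usage-count adjustment costs little while avoiding a spin under contention.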
Re: [HACKERS] psql \d+ and oid display
On Thu, Apr 10, 2014 at 08:05:11PM -0400, Bruce Momjian wrote: On Thu, Apr 10, 2014 at 07:58:55PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: It also has changed the OID status to only display if it exists. One question that came up with Robert is whether OID status should appear for \d as well, now that it only shows up when present. Yeah, I was wondering about that too. If part of the argument here is to make these two displays act more alike, it seems inconsistent that one is emitted by \d while the other only comes out with \d+. Of course, there are two ways to fix that: maybe the replica info also only belongs in \d+? OK, I changed my patch to only show replica info for \d+. If we decide to change them to both display for \d, I will update it again. OK, hearing only quiet, I have applied the patch. I like that we now document the replication identity default, which should go a long way to making this clearer. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
On Mon, Apr 14, 2014 at 2:16 AM, Christian Ullrich ch...@chrullrich.net wrote: I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was set, the postmaster etc. would ignore the events. Why not just pass a command-line switch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Tue, Apr 15, 2014 at 9:30 AM, Merlin Moncure mmonc...@gmail.com wrote: There are many reports of improvement from lowering shared_buffers. The problem is that it tends to show up on complex production workloads and that there is no clear evidence pointing to problems with the clock sweep; it could be higher up in the partition locks or something else entirely (like the O/S). pgbench is also not the greatest tool for sniffing out these cases: it's too random and for large database optimization is generally an exercise in de-randomizing i/o patterns. We really, really need a broader testing suite that covers more usage patterns. I find it quite dissatisfying that we know so little about this. I'm finding that my patch helps much less when shared_buffers is sized large enough to fit the index entirely (although there are still some localized stalls on master, where there are none with patched). shared_buffers is still far too small to fit the entire heap. With shared_buffers=24GB (which still leaves just under 8GB of memory for the OS to use as cache, since this system has 32GB of main memory), the numbers are much less impressive relative to master with the same configuration. Both sets of numbers are still better than what you've already seen with shared_buffers=8GB, since of course the no more than 8GB recommendation is not an absolute, and as you say its efficacy seemingly cannot be demonstrated with pgbench. My guess is that the patch doesn't help because once there is more than enough room to cache the entire index (slightly over twice as many buffers as would be required to do so), even on master it becomes virtually impossible to evict those relatively popular index pages, since they still have an early advantage. It doesn't matter that master's clock sweep has what I've called an excessively short-term perspective, because there is always enough pressure relative to the number of leaf pages being pinned to prefer to evict heap pages. 
There are still a lot of buffers that can fit some moderate proportion of all heap pages even after buffering the entire index (something like ~13GB). You might say that with this new shared_buffers setting, clock sweep doesn't need to have a good memory, because it can immediately observe the usefulness of B-Tree leaf pages. There is no need to limit myself to speculation here, of course. I'll check it out using pg_buffercache. -- Peter Geoghegan
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
* From: Robert Haas On Mon, Apr 14, 2014 at 2:16 AM, Christian Ullrich ch...@chrullrich.net wrote: I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was set, the postmaster etc. would ignore the events. Why not just pass a command-line switch? Because, as I wrote in the message you are quoting, I did not think that having a command-line option for the sole purpose of telling the postmaster who its parent is was a suitable solution. I had already given up on that idea based on Amit's advice, and I will create a patch based on a command-line option. While I have you here, though, any suggestions on what the name of that option should be? I think --background is about right. Also, how should I treat the option on non-Windows platforms? Should it just not be there (= error), or be ignored if present? -- Christian
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
* From: Bruce Momjian On Mon, Apr 14, 2014 at 09:34:14AM +0530, Amit Kapila wrote: The problem can be solved this way, but the only question here is whether it is acceptable for users to have a new console window for the server. Can others also please share their opinion if this fix (start server in new console) seems acceptable, or shall we try by passing some information from pg_ctl and then ignore CTRL+C / CTRL+BREAK for it? I wanted to point out that I think this patch is only appropriate for head, not backpatching. Also, on Unix we have to handle signals that Yes, of course. come from the kill command. Can you send CTRL+C from other applications on Windows? Yes again, using GenerateConsoleCtrlEvent() you can send these events to any (console-attached) process you have the required permissions for, but that is not an issue for the same reason it isn't one on Unix. All the target process sees is the event, it cannot determine (nor does it care) where the event came from. -- Christian
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
* From: Amit Kapila On Mon, Apr 14, 2014 at 11:46 AM, Christian Ullrich ch...@chrullrich.net wrote: * From: Amit Kapila Do you mean to say use some existing environment variable? Introducing an environment variable to solve this issue, or in fact using some existing environment variable, doesn't seem to be the best way to pass such information. I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was set, the postmaster etc. would ignore the events. Do you plan to reset it, and if yes, when? I think there is a chance that after setting this environment variable, some other instance of the server (postmaster) can read it and miss the signal which it should have otherwise processed. We have decided not to go this way, but just for completeness: Environment inheritance works the same way on Windows as on Unix. When a process is started with a modified environment (one of the plentiful arguments of CreateProcess() et al.), only that process and its descendants see the modification. I had not planned to set a system-level or user-level variable. -- Christian
Re: [HACKERS] Something flaky in the relfilenode mapping infrastructure
On 2014-03-28 21:36:11 +0100, Andres Freund wrote: Hi, On 2014-03-27 08:02:35 -0400, Tom Lane wrote: Buildfarm member prairiedog thinks there's something unreliable about commit f01d1ae3a104019d6d68aeff85c4816a275130b3:

*** /Users/buildfarm/bf-data/HEAD/pgsql.13462/src/test/regress/expected/alter_table.out	Thu Mar 27 04:12:40 2014
--- /Users/buildfarm/bf-data/HEAD/pgsql.13462/src/test/regress/results/alter_table.out	Thu Mar 27 04:52:02 2014
***************
*** 2333,2339 ****
  ) mapped;
   incorrectly_mapped | have_mappings
  --------------------+---------------
!                   0 | t
  (1 row)

That's rather odd. It has survived for a couple of months on the other buildfarm animals now... Could one of you apply the attached patch adding more details to eventual failures? So I had made a note to recheck on this. http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prairiedog&br=HEAD indicates there haven't been any further failures... So, for now I assume this was caused by some problem fixed elsewhere. Greetings, Andres Freund
Re: [HACKERS] Something flaky in the relfilenode mapping infrastructure
Andres Freund and...@2ndquadrant.com writes: On 2014-03-27 08:02:35 -0400, Tom Lane wrote: Buildfarm member prairiedog thinks there's something unreliable about commit f01d1ae3a104019d6d68aeff85c4816a275130b3: So I had made a note to recheck on this. http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prairiedog&br=HEAD indicates there haven't been any further failures... So, for now I assume this was caused by some problem fixed elsewhere. Hard to say. In any case, I agree we can't make any progress unless we see it again. regards, tom lane
[HACKERS] test script, was Re: [COMMITTERS] pgsql: psql: conditionally display oids and replication identity
On Tue, Apr 15, 2014 at 02:46:34PM -0400, Bruce Momjian wrote: On Tue, Apr 15, 2014 at 02:32:53PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: psql: conditionally display oids and replication identity Buildfarm isn't terribly pleased with this --- looks like you missed contrib/test_decoding/ Fixed. I added a personal script option that allows me to test contrib, but forgot to run it. Is that script of general utility for committers? If so, it might be good to include it in the distribution. I'd be happy to go through and perl-ify it, document it, etc. Or maybe it could be a new make target... Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] test script, was Re: [COMMITTERS] pgsql: psql: conditionally display oids and replication identity
On 2014-04-15 12:32:36 -0700, David Fetter wrote: On Tue, Apr 15, 2014 at 02:46:34PM -0400, Bruce Momjian wrote: On Tue, Apr 15, 2014 at 02:32:53PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: psql: conditionally display oids and replication identity Buildfarm isn't terribly pleased with this --- looks like you missed contrib/test_decoding/ Fixed. I added a personal script option that allows me to test contrib, but forgot to run it. Is that script of general utility for committers? If so, it might be good to include it in the distribution. I'd be happy to go through and perl-ify it, document it, etc. Or maybe it could be a new make target... make check-world Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] test script, was Re: [COMMITTERS] pgsql: psql: conditionally display oids and replication identity
David Fetter da...@fetter.org writes: On Tue, Apr 15, 2014 at 02:46:34PM -0400, Bruce Momjian wrote: Fixed. I added a personal script option that allows me to test contrib, but forgot to run it. Is that script of general utility for committers? If so, it might be good to include it in the distribution. I'd be happy to go through and perl-ify it, document it, etc. Or maybe it could be a new make target... I'm pretty sure make check-world would've covered this already. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Excessive WAL generation and related performance issue
On 04/14/2014 04:34 PM, Joe Conway wrote: On 04/14/2014 04:25 PM, Andres Freund wrote: On 2014-04-14 16:22:48 -0700, Joe Conway wrote: That'll help performance, but let's say I generally keep WAL files for PITR and don't turn that off before starting -- shouldn't I be very surprised to need over 3TB of archive storage when loading a 50GB table with a couple of indexes? The point is that more frequent checkpoints will increase the WAL volume *significantly* because more full page writes will have to be generated. OK, I'll see how much it can be brought down through checkpoint tuning and report back. One more question before I get to that. I had applied the following patch to XLogInsert 8<-- diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 2f71590..e39cd37 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -737,10 +737,12 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata) uint32 len, write_len; unsigned i; + unsigned inorm; bool updrqst; bool doPageWrites; bool isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH); uint8 info_orig = info; + uint32 xl_tot_len; /* cross-check on whether we should be here or not */ if (!XLogInsertAllowed()) @@ -924,8 +926,23 @@ begin:; * header. */ INIT_CRC32(rdata_crc); + i = 0; + inorm = 0; for (rdt = rdata; rdt != NULL; rdt = rdt->next) + { COMP_CRC32(rdata_crc, rdt->data, rdt->len); + + if (rdt_lastnormal == rdt) + { + inorm = i; + i = 0; + } + else + i++; + } + xl_tot_len = SizeOfXLogRecord + write_len; + if ((inorm + i) > 4 || xl_tot_len > 2000) + elog(LOG, "XLogInsert;tot_nml_blks;%d;tot_bkp_blks;%d;tot_Xlog_Len;%d", inorm, i, xl_tot_len); START_CRIT_SECTION(); 8<-- The idea was to record number of normal and backup blocks, and total size of the record. 
I have quite a few entries in the log from the test run which are like: 8<-- 2014-04-11 08:42:06.904 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16168 2014-04-11 09:03:12.790 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16172 2014-04-11 10:16:57.949 PDT;LOG: XLogInsert;tot_nml_blks;3;tot_bkp_blks;5;tot_Xlog_Len;16150 8<-- and 8<-- 2014-04-11 11:17:08.313 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332 2014-04-11 11:17:08.338 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;16020 2014-04-11 11:17:08.389 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332 8<-- In other words, based on my inserted logic, it appears that there are 5 and 6 backup blocks on a fairly regular basis. However in xlog.h it says: 8<-- * If we backed up any disk blocks with the XLOG record, we use flag * bits in xl_info to signal it. We support backup of up to 4 disk * blocks per XLOG record. 8<-- So is my logic to record number of backup blocks wrong, or is the comment wrong, or am I otherwise misunderstanding something? 
Thanks, Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Excessive WAL generation and related performance issue
Joe Conway m...@joeconway.com writes: In other words, based on my inserted logic, it appears that there are 5 and 6 backup blocks on a fairly regular basis. However in xlog.h it says: 8<-- * If we backed up any disk blocks with the XLOG record, we use flag * bits in xl_info to signal it. We support backup of up to 4 disk * blocks per XLOG record. 8<-- So is my logic to record number of backup blocks wrong, or is the comment wrong, or am I otherwise misunderstanding something? The comment is correct, so you did something wrong. From memory, there's a goto-label retry loop in that function; maybe you need to zero your counters after the retry label? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Excessive WAL generation and related performance issue
On 04/15/2014 11:53 PM, Joe Conway wrote: One more question before I get to that. I had applied the following patch to XLogInsert 8<-- diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 2f71590..e39cd37 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -737,10 +737,12 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata) uint32 len, write_len; unsigned i; + unsigned inorm; bool updrqst; bool doPageWrites; bool isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH); uint8 info_orig = info; + uint32 xl_tot_len; /* cross-check on whether we should be here or not */ if (!XLogInsertAllowed()) @@ -924,8 +926,23 @@ begin:; * header. */ INIT_CRC32(rdata_crc); + i = 0; + inorm = 0; for (rdt = rdata; rdt != NULL; rdt = rdt->next) + { COMP_CRC32(rdata_crc, rdt->data, rdt->len); + + if (rdt_lastnormal == rdt) + { + inorm = i; + i = 0; + } + else + i++; + } + xl_tot_len = SizeOfXLogRecord + write_len; + if ((inorm + i) > 4 || xl_tot_len > 2000) + elog(LOG, "XLogInsert;tot_nml_blks;%d;tot_bkp_blks;%d;tot_Xlog_Len;%d", inorm, i, xl_tot_len); START_CRIT_SECTION(); 8<-- The idea was to record number of normal and backup blocks, and total size of the record. I have quite a few entries in the log from the test run which are like: 8<-- 2014-04-11 08:42:06.904 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16168 2014-04-11 09:03:12.790 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16172 2014-04-11 10:16:57.949 PDT;LOG: XLogInsert;tot_nml_blks;3;tot_bkp_blks;5;tot_Xlog_Len;16150 8<-- and 8<-- 2014-04-11 11:17:08.313 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332 2014-04-11 11:17:08.338 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;16020 2014-04-11 11:17:08.389 PDT;LOG: XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332 8<-- In other words, based on my inserted logic, it appears that there are 5 and 6 backup blocks on a fairly regular basis. 
However in xlog.h it says: 8<-- * If we backed up any disk blocks with the XLOG record, we use flag * bits in xl_info to signal it. We support backup of up to 4 disk * blocks per XLOG record. 8<-- So is my logic to record number of backup blocks wrong, or is the comment wrong, or am I otherwise misunderstanding something? You're counting XLogRecData structs, not backup blocks. Each backup block typically consists of three XLogRecData structs, one to record a BkpBlock struct, one to record the data before the unused hole in the middle of the page, and one for after. Sometimes just two, if there is no hole to skip. See the loop just before the CRC calculation, that's where the XLogRecData entries for backup blocks are created. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Excessive WAL generation and related performance issue
On 04/15/2014 02:15 PM, Heikki Linnakangas wrote: You're counting XLogRecData structs, not backup blocks. Each backup block typically consists of three XLogRecData structs, one to record a BkpBlock struct, one to record the data before the unused hole in the middle of the page, and one for after. Sometimes just two, if there is no hole to skip. See the loop just before the CRC calculation, that's where the XLogRecData entries for backup blocks are created. - Heikki Ah, thanks for the explanation! Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Need Multixact Freezing Docs
Hackers, We need documentation on how users should intelligently set the multixact freeze settings. I'm happy to write the actual text, but I definitely don't have any idea how to set these myself. Under what circumstances should they be different from freeze_max_age? How? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Apr 12, 2014 at 1:32 AM, Amit Kapila amit.kapil...@gmail.com wrote: I have checked that other place in code also check handle to decide if API has failed. Refer function PGSharedMemoryIsInUse(). So I think fix to call GetLastError() after checking handle is okay. Attached patch fixes this issue. After patch, the server shows below log which is exactly what is expected from test_shm_mq In PostgreSQL code, hmap == NULL, rather than !hmap, is the preferred way to test for a NULL pointer. I notice that the !hmap style is used throughout this code, so I guess cleaning that up is a matter for a separate commit. I think in that case we might want to cleanup some other similar usage (PGSharedMemoryCreate) of !hmap. Ah. Well, in that case maybe we should just leave it alone, since it's been like that forever and nobody's cared until now. For the create case, I'm wondering if we should put the block that tests for !hmap *before* the _dosmaperr() and check for EEXIST. What is your opinion? Either way is okay, but I think the way you are suggesting is better as it will make code consistent with other place (PGSharedMemoryCreate()). OK, can you prepare a patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] How can we make beta testing better?
Hackers, I think 9.3 has given us evidence that our users aren't giving new versions of PostgreSQL substantial beta testing, or if they are, they aren't sharing the results with us. How can we make beta testing better and more effective? How can we get more users to actually throw serious workloads at new versions and share the results? I've tried a couple of things over the last two years and they haven't worked all that well. Since we're about to go into another beta testing period, we need something new. Ideas? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Question about optimising (Postgres_)FDW
Hi I am playing around with postgres_fdw and found that the following code ... -- CREATE EXTENSION postgres_fdw; CREATE SERVER loop foreign data wrapper postgres_fdw OPTIONS (port '5432', dbname 'testdb'); CREATE USER MAPPING FOR PUBLIC SERVER loop; create table onemillion ( id serial primary key, inserted timestamp default clock_timestamp(), data text ); insert into onemillion(data) select random() from generate_series(1,100); CREATE FOREIGN TABLE onemillion_pgfdw ( id int, inserted timestamp, data text ) SERVER loop OPTIONS (table_name 'onemillion', use_remote_estimate 'true'); testdb=# explain analyse select * from onemillion_pgfdw where id in (select id from onemillion where data > '0.9' limit 100); QUERY PLAN --------------------------------------------------------------- Nested Loop (cost=122.49..10871.06 rows=50 width=44) (actual time=4.269..93.444 rows=100 loops=1) -> HashAggregate (cost=22.06..23.06 rows=100 width=4) (actual time=1.110..1.263 rows=100 loops=1) -> Limit (cost=0.00..20.81 rows=100 width=4) (actual time=0.038..1.026 rows=100 loops=1) -> Seq Scan on onemillion (cost=0.00..20834.00 rows=100115 width=4) (actual time=0.036..0.984 rows=100 loops=1) Filter: (data > '0.9'::text) Rows Removed by Filter: 805 -> Foreign Scan on onemillion_pgfdw (cost=100.43..108.47 rows=1 width=29) (actual time=0.772..0.773 rows=1 loops=100) Total runtime: 93.820 ms (8 rows) Time: 97.283 ms -- ... actually performs 100 distinct SELECT * FROM onemillion WHERE id = $1 calls on the remote side. Is there a way to force it to prefer a plan where the results of (select id from onemillion where data > '0.9' limit 100) are passed to the FDW as a single IN (i.e. = ANY(...)) query and are retrieved all at once? If not, how hard would it be to add this feature? -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Get more from indices.
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: [ pathkey_and_uniqueindx_v10_20130411.patch ] I thought some more about this patch, and realized that it's more or less morally equivalent to allowing references to ungrouped variables when the query has a GROUP BY clause listing all the columns of the primary key. In that case the parser is effectively pretending that the GROUP BY list contains additional implicit entries that are functionally dependent on the entries that are actually there. In this patch, what we want to do is recognize that trailing entries in an ORDER BY list are semantically no-ops and can be ignored because they are functionally dependent on earlier entries. Now, the reason that the parser restricts the functional dependency deduction to a primary key is that it wants to be able to identify a constraint OID that the query is dependent on to be semantically valid. In this case, we don't need such an OID, so just finding any old unique index on not-null columns is good enough. (If someone drops the index, the optimization might become incorrect, but that would force replanning anyway.) However, this way of thinking about it shows that the patch is missing possible optimizations. If we have ORDER BY a, b, c and (a,b) is the primary key, then including c in the ORDER BY list is semantically redundant, *whether or not we use an indexscan on the pkey index at all*. More: if we have ORDER BY a, b, c and the primary key is (b,a), we can still discard c from the sort requirement, even though the pkey index as such isn't helpful for producing the required order. So hacking up the pathkeys attributed to the indexscan is the wrong thing. Rather, what we should be looking to do is decide that c is a useless pathkey and remove it from the query_pathkeys, much as we'd do if we found c = constant in WHERE. That would allow optimization of other query plans besides scan-the-pkey-index plans. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: iff - if
On 15 April 2014 23:19, Andreas 'ads' Scherbaum adsm...@wars-nicht.de wrote: Hi, stumbled over a number of iff in the source where if is meant - not sure what the real story behind this is, but attached is a patch to fix the about 80 occurrences. This only appears in comments, not in any code path. Yeah, apparently those are intentional, and mean if and only if (i.e. =) -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Need Multixact Freezing Docs
On 04/15/2014 02:25 PM, Josh Berkus wrote: Hackers, We need documentation on how users should intelligently set the multixact freeze settings. I'm happy to write the actual text, but I definitely don't have any idea how to set these myself. Under what circumstances should they be different from freeze_max_age? How? Also: how do I check the multixact age of a table? There doesn't seem to be any data for this ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Mon, Apr 14, 2014 at 8:11 PM, Peter Geoghegan p...@heroku.com wrote: PostgreSQL implements a clock sweep algorithm, which gets us something approaching an LRU for the buffer manager in trade-off for less contention on core structures. Buffers have a usage_count/popularity that currently saturates at 5 (BM_MAX_USAGE_COUNT). The classic CLOCK algorithm only has one bit for what approximates our usage_count (so it's either 0 or 1). I think that at its core CLOCK is an algorithm that has some very desirable properties that I am sure must be preserved. Actually, I think it's more accurate to say we use a variant of clock pro, a refinement of the original CLOCK. PostgreSQL replacement algorithm is more similar to Generalized CLOCK or GCLOCK, as described in [1]. CLOCK-Pro [2] is a different algorithm that approximates LIRS[3]. LIRS is what MySQL implements[4] and CLOCK-Pro is implemented by NetBSD [5] and there has been some work on trying it on Linux [6]. Both LIRS and CLOCK-Pro work by keeping double the cache size metadata entries and detect pages that have been recently referenced. Basically they provide an adaptive tradeoff between LRU and LFU. In the past, various hackers have noted problems they've observed with this scheme. A common pathology is to see frantic searching for a victim buffer only to find all buffer usage_count values at 5. It may take multiple revolutions of the clock hand before a victim buffer is found, as usage_count is decremented for each and every buffer. Also, BufFreelistLock contention is considered a serious bottleneck [1], which is related. There's a paper on a non blocking GCLOCK algorithm, that does lock free clock sweep and buffer pinning[7]. If we decide to stay with GCLOCK it may be interesting, although I still believe that some variant of buffer nailing[8] is a better idea, my experience shows that most of the locking overhead is cache line bouncing ignoring the extreme cases where our naive spinlock implementation blows up. 
Let's leave aside inner/root pages though, because they're so dramatically useful when in a primary index on a tpc-b table that they'll always be cached by any non-terrible algorithm. It beggars belief that the still relatively dense (and consequently *popular*) B+Tree leaf pages get so little credit for being of such long-term utility (in the view of our clock sweep algorithm as implemented). The algorithm has what could be loosely described as an excessively short-term perspective. There is clearly a better balance to be had here. I don't think the answer is that we have the B-Tree code give its pages some special treatment that makes them harder to evict, although I will admit that I briefly entertained the idea. There has been some research that indicates that for TPC-A workloads giving index pages higher weights increases hit rates [1]. I think the hardest hurdle for any changes in this area will be showing that we don't have any nasty regressions. I think the best way to do that would be to study separately the performance overhead of the replacement algorithm and optimality of the replacement choices. If we capture a bunch of buffer reference traces by instrumenting PinBuffer, we can pretty accurately simulate the behavior of different algorithm and tuning choices with different shared buffer sizes. Obviously full scale tests are still needed due to interactions with OS, controller and disk caches and other miscellaneous influences. But even so, simulation would get us much better coverage of various workloads and at least some confidence that it's a good change overall. It will be very hard and time consuming to gather equivalent evidence with full scale tests. 
[1] http://www.csd.uoc.gr/~hy460/pdf/p35-nicola.pdf [2] http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-05-3.pdf [3] http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/jiang02_LIRS.pdf [4] http://lists.mysql.com/commits/28601 [5] http://fxr.watson.org/fxr/source/uvm/uvm_pdpolicy_clockpro.c?v=NETBSD [6] http://lwn.net/Articles/147879/ [7] http://derby-nb.googlecode.com/svn-history/r41/trunk/derby-nb/ICDE10_conf_full_409.pdf [8] http://www.postgresql.org/message-id/ca+tgmozypeyhwauejvyy9a5andoulcf33wtnprfr9sycw30...@mail.gmail.com Regards, Ants Aasma -- Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about optimising (Postgres_)FDW
Hannu Krosing ha...@2ndquadrant.com writes: Is there a way to force it to prefer a plan where the results of (select id from onemillion where data > '0.9' limit 100) are passed to the FDW as a single IN (i.e. = ANY(...)) query and are retrieved all at once? You could write the query like that: select * from onemillion_pgfdw where id = any (array(select id from onemillion where data > '0.9' limit 100)); Or at least you should be able to, except when I try it I get explain analyze select * from onemillion_pgfdw where id = any (array(select id from onemillion where data > '0.9' limit 100)); ERROR: operator does not exist: integer = integer[] HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts. CONTEXT: Remote SQL command: EXPLAIN SELECT id, inserted, data FROM public.onemillion WHERE ((id = ANY ((SELECT null::integer[] so there's something the remote-estimate code is getting wrong here. (It seems to work without remote_estimate, though.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Tue, Apr 15, 2014 at 3:59 PM, Ants Aasma a...@cybertec.at wrote: PostgreSQL replacement algorithm is more similar to Generalized CLOCK or GCLOCK, as described in [1]. CLOCK-Pro [2] is a different algorithm that approximates LIRS[3]. LIRS is what MySQL implements[4] and CLOCK-Pro is implemented by NetBSD [5] and there has been some work on trying it on Linux [6]. Both LIRS and CLOCK-Pro work by keeping double the cache size metadata entries and detect pages that have been recently referenced. Basically they provide an adaptive tradeoff between LRU and LFU. That's good to know. There's a paper on a non blocking GCLOCK algorithm, that does lock free clock sweep and buffer pinning[7]. If we decide to stay with GCLOCK it may be interesting, although I still believe that some variant of buffer nailing[8] is a better idea, my experience shows that most of the locking overhead is cache line bouncing ignoring the extreme cases where our naive spinlock implementation blows up. You might be right about that, but let's handle one problem at a time. Who knows what the bottleneck will end up being if and when we address the naivety around frequency? I want to better characterize that problem first. There has been some research that indicates that for TPC-A workloads giving index pages higher weights increases hitrates[1]. Frankly, there doesn't need to be any research on this, because it's just common sense that probabilistically, leaf pages are much more useful than heap pages in servicing index scan queries if we assume a uniform distribution. If we don't assume that, then they're still more useful on average. I think the hardest hurdle for any changes in this area will be showing that we don't have any nasty regressions. I think the best way to do that would be to study separately the performance overhead of the replacement algorithm and optimality of the replacement choices. 
If we capture a bunch of buffer reference traces by instrumenting PinBuffer, we can pretty accurately simulate the behavior of different algorithm and tuning choices with different shared buffer sizes. Obviously full scale tests are still needed due to interactions with OS, controller and disk caches and other miscellaneous influences. But even so, simulation would get us much better coverage of various workloads and at least some confidence that it's a good change overall. It will be very hard and time consuming to gather equivalent evidence with full scale tests. I think I agree with all of that. The fact that we as a community don't appear to have too much to say about what workloads to prioritize somewhat frustrates this. The other problem is that sizing shared_buffers appropriately involves a surprising amount of deference to rules of thumb that in practice no one is quite prepared to rigorously defend - who is to say what apportionment of memory to Postgres is appropriate here? I too was hopeful that we could evaluate this work purely in terms of observed improvements to hit rate (at least initially), but now I doubt even that. It would be great to be able to say here are the parameters of this discussion, and have everyone immediately agree with that, but in this instance that's legitimately not possible. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: iff - if
On 04/15/2014 06:26 PM, Thom Brown wrote: On 15 April 2014 23:19, Andreas 'ads' Scherbaum adsm...@wars-nicht.de wrote: Hi, stumbled over a number of iff in the source where if is meant - not sure what the real story behind this is, but attached is a patch to fix the about 80 occurrences. This only appears in comments, not in any code path. Yeah, apparently those are intentional, and mean if and only if (i.e. =) This is a reasonably common idiom, or used to be. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: iff - if
On 04/15/2014 05:36 PM, Andrew Dunstan wrote: On 04/15/2014 06:26 PM, Thom Brown wrote: On 15 April 2014 23:19, Andreas 'ads' Scherbaum adsm...@wars-nicht.de wrote: Hi, stumbled over a number of iff in the source where if is meant - not sure what the real story behind this is, but attached is a patch to fix the about 80 occurrences. This only appears in comments, not in any code path. Yeah, apparently those are intentional, and mean if and only if (i.e. =) This is a reasonably common idiom, or used to be. If it has fallen into disuse the news has failed to reach me: http://en.wikipedia.org/wiki/If_and_only_if http://www.mathwords.com/i/if_and_only_if.htm http://mathworld.wolfram.com/Iff.html ... Cheers, Steve -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: iff - if
Steve Crawford scrawf...@pinpointresearch.com writes: On 04/15/2014 05:36 PM, Andrew Dunstan wrote: On 04/15/2014 06:26 PM, Thom Brown wrote: Yeah, apparently those are intentional, and mean if and only if This is a reasonably common idiom, or used to be. If it has fallen into disuse the news has failed to reach me: http://en.wikipedia.org/wiki/If_and_only_if http://www.mathwords.com/i/if_and_only_if.htm http://mathworld.wolfram.com/Iff.html The last discussion about this suggested that non-English speakers might be familiar with other abbreviations for the concept. But anyway, yes, those are not typos. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: iff - if
On Tue, Apr 15, 2014 at 6:19 PM, Andreas 'ads' Scherbaum adsm...@wars-nicht.de wrote: Hi, stumbled over a number of iff in the source where if is meant - not sure what the real story behind this is, but attached is a patch to fix the about 80 occurrences. IFF is a common idiom in mathematics (and logic, which means it will be of interest in computer science, which shares boundaries with both) which stands for If And Only If. Unless it has very recently fallen out of fashion in mathematics, it's almost certainly intentional to use iff in the comments, and not merely a typo. -- When confronted by a difficult problem, solve it by reducing it to the question, How would the Lone Ranger handle this?
Re: [HACKERS] Patch: iff - if
Thom Brown wrote: On 15 April 2014 23:19, Andreas 'ads' Scherbaum <adsmail@> wrote: Hi, stumbled over a number of "iff" in the source where "if" is meant - not sure what the real story behind this is, but attached is a patch to fix about 80 occurrences. This only appears in comments, not in any code path. Yeah, apparently those are intentional, and mean "if and only if" (i.e. <=>) Just looking at the first few items, someone's good intention is being ruined by bad execution...especially: Add...references to the section...a block is in [iff] chapters aren't autolabelled... Many of these are not mathematical propositions but flow-control logic for which IF is indeed the correct term-of-art; though re-reading the example above, that is probably one of the more logical ones... Now: ...new tuple was inserted, and its HEAP_ONLY_TUPLE flag is set [if/iff] a HOT update was done needs to be evaluated on its merits; namely, does HEAP_ONLY_TUPLE ever get set if a HOT update was not done? If not, then IFF is proper and necessary to convey that fact (though even this one is marginal and both versions are likely unambiguous in reality). In short, "returns BOOLEAN" can reasonably be said to properly use IF in almost all cases - especially for those functions that are check-oriented (and thus obviously if the opposite condition is present the opposite result would be returned). IFF is not incorrect here, but for me it is misplaced and too much information. Given that it is unlikely the mis/over-use of IFF in the comments is a meaningful problem, I would vote for letting the imprecision go and avoiding the code churn. If the surrounding code was being altered anyway then having the corresponding comment updated seems acceptable; but a blanket find/replace doesn't seem helpful. David J.
[HACKERS] The question about the type numeric
Hi hackers, I am learning about numeric . The comment of NumericShort format is: * In the NumericShort format, the remaining 14 bits of the header word * (n_short.n_header) are allocated as follows: 1 for sign (positive or * negative), 6 for dynamic scale, and 7 for weight. In practice, most * commonly-encountered values can be represented this way. So the Max of the NumericShort format should be up to 508 digits before the decimal point. So the sign of the number 12345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567 should be 0x807F. The number is 257 digits before the decimal point. But the sign is 0. So is there anything wrong? Thank you! 2014-04-15 wangshuo HighGo Software Co.,Ltd. Address: A203 Block D QILU Soft Park, High-Tech Zone, Lixia district, Jinan Shandong, China(Head Office) Tel:+86-0531-55701530 Fax:+86-0531-55701544 Website:www.highgo.com Mobile:18766416137
Re: [HACKERS] The question about the type numeric
sure.postgres wrote Hi hackers, I am learning about numeric. The comment of NumericShort format is: * In the NumericShort format, the remaining 14 bits of the header word * (n_short.n_header) are allocated as follows: 1 for sign (positive or * negative), 6 for dynamic scale, and 7 for weight. In practice, most * commonly-encountered values can be represented this way. So the Max of the NumericShort format should be up to 508 digits before the decimal point. So the sign of the number 12345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567 should be 0x807F. The number is 257 digits before the decimal point. But the sign is 0. So is there anything wrong? [1000 00][0][00111 0][001] I appreciate that you got no responses on the original e-mail, but if you are going to re-raise the question at least have the courtesy to respond to your original thread and not go and start a new one. And maybe try rephrasing the question, since most likely your original question was not worded in such a way as to garner a response. I may have this totally wrong, but I don't see why the sign of your number should be anything but zero, since that is, I presume, the value of the specific bit for a positive number - which yours is. So, in short, nothing seems to be wrong. If you think something is wrong you should probably state what that is explicitly and ask someone to explain what is happening. I would have said all this when I saw the first e-mail, but I wasn't (and still am not) totally clear on what you are asking and was hoping someone more familiar could make better sense of it. David J.
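The bit layout David spells out can be checked mechanically. Below is a short Python sketch that decodes a NumericShort header word per the comment quoted above (2-bit format field, 1 sign bit, 6 dynamic-scale bits, 7 two's-complement weight bits); the mask values follow my reading of numeric.c and should be treated as illustrative rather than authoritative. Note that for a positive value the sign bit is simply 0, and that 0x807F would actually decode to a weight of -1 rather than a large positive weight.

```python
# Sketch: decode the 16-bit NumericShort header word described in the quoted
# comment. Mask values are my reading of numeric.c (illustrative only).

NUMERIC_SIGN_MASK = 0xC000               # top 2 bits select the header format
NUMERIC_SHORT = 0x8000                   # '10' => short format
NUMERIC_SHORT_SIGN_MASK = 0x2000         # 1 bit: sign of the value (0 = positive)
NUMERIC_SHORT_DSCALE_MASK = 0x1F80       # 6 bits: dynamic scale
NUMERIC_SHORT_DSCALE_SHIFT = 7
NUMERIC_SHORT_WEIGHT_MASK = 0x007F       # 7 bits: weight, two's complement

def decode_short_header(header):
    """Return (negative, dscale, weight) for a short-format header word."""
    assert (header & NUMERIC_SIGN_MASK) == NUMERIC_SHORT, "not short format"
    negative = bool(header & NUMERIC_SHORT_SIGN_MASK)
    dscale = (header & NUMERIC_SHORT_DSCALE_MASK) >> NUMERIC_SHORT_DSCALE_SHIFT
    weight = header & NUMERIC_SHORT_WEIGHT_MASK
    if weight >= 0x40:                   # sign-extend the 7-bit weight
        weight -= 0x80
    return negative, dscale, weight
```

For example, `decode_short_header(0x8185)` yields a positive value with dynamic scale 3 and weight 5.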
Re: [HACKERS] Dynamic Shared Memory stuff
On Wed, Apr 16, 2014 at 3:01 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas robertmh...@gmail.com wrote: For the create case, I'm wondering if we should put the block that tests for !hmap *before* the _dosmaperr() and check for EEXIST. What is your opinion? Either way is okay, but I think the way you are suggesting is better as it will make the code consistent with the other place (PGSharedMemoryCreate()). OK, can you prepare a patch? Please find attached a patch to address this issue. One minor point to note is that now we have to call GetLastError() twice, once inside the error path and once to check EEXIST, but I think that is okay, as the existing code in PGSharedMemoryCreate() does it that way. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_dsm_invalid_errcode_issue-v2.patch Description: Binary data
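The ordering question here - test for the "already exists" case before falling into the generic error-conversion path - is a common pattern. The Python sketch below is a portable analog of that idea (the real patch concerns CreateFileMapping/GetLastError on Windows; the function and file names here are invented for illustration):

```python
import errno
import os
import tempfile

def create_segment(path):
    """Create a 'segment' file exclusively. The collision case (EEXIST) is
    checked first, before any generic error handling - mirroring the idea of
    testing for ERROR_ALREADY_EXISTS before the generic _dosmaperr() path."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError as e:
        if e.errno == errno.EEXIST:
            return None        # collision: caller retries with another name
        raise                  # any other failure is a hard error
    os.close(fd)
    return path
```

The point of the ordering is that the collision is an expected, recoverable outcome, while everything else should be reported as a genuine failure.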
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
On Tue, Apr 15, 2014 at 11:53 PM, Christian Ullrich ch...@chrullrich.net wrote: * From: Robert Haas Why not just pass a command-line switch? Because, as I wrote in the message you are quoting, I did not think that having a command-line option for the sole purpose of telling the postmaster who its parent is was a suitable solution. I had already given up on that idea based on Amit's advice, and I will create a patch based on a command-line option. While I have you here, though, any suggestions on what the name of that option should be? I think --background is about right. --background as the switch name seems okay. Also, how should I treat the option on non-Windows platforms? Should it just not be there (= error), or be ignored if present? I think having it ignored on non-Windows is the better way to proceed. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs
On 4/14/14, 10:30 PM, Andrew Dunstan wrote: On 04/14/2014 10:17 PM, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: Add TAP tests for client programs I assume the buildfarm would need to be taught about this? Yes. It probably won't be a huge change, but it will need a bit of code. It might be more future-proof if the build farm just called make check-world and used some other way to identify the individual tests in that output. Otherwise, we'll need a new build farm release every time a test suite is added. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs
Peter Eisentraut pete...@gmx.net writes: On 4/14/14, 10:30 PM, Andrew Dunstan wrote: Yes. It probably won't be a huge change, but it will need a bit of code. It might be more future-proof if the build farm just called make check-world and used some other way to identify the individual tests in that output. Otherwise, we'll need a new build farm release every time a test suite is added. That argument would be more convincing if make check-world worked on Windows ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New option in pg_basebackup to exclude pg_log files during base backup
On 4/9/14, 10:57 AM, Magnus Hagander wrote: So it'd be an array, and by default you'd have something like: basebackup_skip_path = $log_directory ? Maybe use it to skip backup labels by default as well. basebackup_skip_path = $log_directory, $backup_label_files I hadn't considered any details, but yes, something along that line. And then you could also include arbitrary filenames or directories should you want. What are the use cases for excluding anything else? pg_basebackup ought to have some intelligence about what files are appropriate to include or exclude, depending on what the user is trying to do. It shouldn't become a general file copying tool.
Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs
On 4/15/14, 11:11 PM, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: On 4/14/14, 10:30 PM, Andrew Dunstan wrote: Yes. It probably won't be a huge change, but it will need a bit of code. It might be more future-proof if the build farm just called make check-world and used some other way to identify the individual tests in that output. Otherwise, we'll need a new build farm release every time a test suite is added. That argument would be more convincing if make check-world worked on Windows ... What about it doesn't work on Windows? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1
On 4/14/14, 3:28 PM, Peter Eisentraut wrote: On 4/4/14, 10:07 AM, Andres Freund wrote: If somebody previously tried to do the correct thing and attached PGDLLEXPORT to their own *function* prototype, it would cause problems now. What is the difference (on affected platforms) between Datum funcname(PG_FUNCTION_ARGS); and writing (effectively) PGDLLEXPORT Datum funcname(PG_FUNCTION_ARGS); Datum funcname(PG_FUNCTION_ARGS); or for that matter Datum funcname(PG_FUNCTION_ARGS); PGDLLEXPORT Datum funcname(PG_FUNCTION_ARGS); If there isn't a difference, then my patch is fine. Otherwise, it might be good to document the issues for extension authors. Let me point out again that my patch doesn't actually do anything about PGDLLEXPORT or the like. It just adds automatic prototypes into PG_FUNCTION_INFO_V1, to reduce compiler warnings in extensions and reduce some boilerplate in general. If it turns out that this might help someone optimize the Windows build, then great. But I gather it won't, so, so what. Or maybe it can be made to work, in which case extension authors will get compiler errors about places they need to clean up. So it could still help that way, but who knows, that's not the point. If there are still concerns in this area, we could commit just the part that adds the prototype to PG_FUNCTION_INFO_V1 and let that sit for a while, and then remove the (now redundant) explicit prototypes in a later release, so if there is a need to revert it, it won't be so big.
[HACKERS] [doc] EXPLAIN CREATE MATERIALIZED VIEW AS?
Hi, Attached adds CREATE MATERIALIZED VIEW AS to the list of statements that can be EXPLAINed. -- Amit explain-create-materialized-view-as.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1
Peter Eisentraut pete...@gmx.net writes: Let me point out again that my patch doesn't actually do anything about PGDLLEXPORT or the like. It just adds automatic prototypes into PG_FUNCTION_INFO_V1, to reduce compiler warnings in extensions and reduce some boilerplate in general. Hmm ... for some reason I had gotten it in my head that you were adding PGDLLEXPORT to the autogenerated extern declarations, but at least the version of the patch in 1389762012.24046.2.ca...@vanquo.pezone.net doesn't do that, so the point is moot. I still object to the aspect of the patch that moves the externs for _PG_init/_PG_fini into fmgr.h: that is conceptually wrong and will create more confusion than the trivial code savings is worth. But I won't complain if you commit the PG_FUNCTION_INFO_V1 changes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
Andres Freund and...@2ndquadrant.com writes: What I think this discussion shows is that this patch isn't ready for 9.4. The first iteration of the patch came in 2013-11-06. Imo that's pretty damn late for a relatively complex patch. And obviously we don't have agreement on the course forward. I don't think we need to stop discussing, but I think it's pretty clear that this isn't 9.4 material. And that it's far from Ready for Committer. Yep, today is the expected feature freeze date towards v9.4. It is a little bit late to include v9.4 features, unfortunately. Yeah. I'm still not exactly convinced that custom-scan will ever allow independent development of new plan types (which, with all due respect to Robert, is what it was being sold as last year in Ottawa). But I'm not opposed in principle to committing it, if we can find a way to have a cleaner API for things like setrefs.c. It seems like late-stage planner processing in general is an issue for this patch (createplan.c and subselect.c are also looking messy). EXPLAIN isn't too great either. I'm not sure exactly what to do about those cases, but I wonder whether things would get better if we had the equivalent of expression_tree_walker/mutator capability for plan nodes. The state of affairs in setrefs and subselect, at least, is a bit reminiscent of the bad old days when we had lots of different bespoke code for traversing expression trees. Hmm. If we have something like expression_tree_walker/mutator for plan nodes, we can pass a walker/mutator function's pointer instead of exposing static functions that take on the recursive jobs. If a custom-plan provider (that has sub-plans) got a callback with a walker/mutator pointer, all it has to do for its sub-plans is call this new plan-tree walking support routine with the supplied walker/mutator. It seems to me a simpler design than what I did. 
Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
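To make the walker/mutator idea concrete, here is a rough Python sketch of the convention being discussed, modeled on expression_tree_walker's contract (all names here are hypothetical, not an actual proposed API): the node-type-specific traversal knowledge lives in one support routine, and a visitor recurses by calling back into it.

```python
# Hypothetical sketch of a plan_tree_walker in the style of
# expression_tree_walker: the walker callback handles the nodes it cares
# about and recurses by calling plan_tree_walker again.

class Plan:
    def __init__(self, name, lefttree=None, righttree=None, subplans=()):
        self.name = name
        self.lefttree = lefttree
        self.righttree = righttree
        self.subplans = list(subplans)  # e.g. a custom node's private children

def plan_tree_walker(node, walker, context):
    """Visit each immediate child of node; stop and return True as soon as
    any walker call returns True (the early-abort convention)."""
    for child in (node.lefttree, node.righttree, *node.subplans):
        if child is not None and walker(child, context):
            return True
    return False

def count_nodes(node, counts):
    """A visitor: tally every node in the tree, never aborting the walk."""
    counts[0] += 1
    return plan_tree_walker(node, count_nodes, counts)
```

With this shape, a custom-scan provider's private sub-plans are traversed by the same support routine as built-in nodes, so setrefs-style passes need no bespoke knowledge of the custom node's internals.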
Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah. After a fast review of the custom-scan and cache-scan patches, it seems to me that my original fears are largely confirmed: the custom scan patch is not going to be sufficient to allow development of any truly new plan type. Yeah, you can plug in some new execution node types, but actually doing anything interesting is going to require patching other parts of the system. Without prejudice to the rest of what you said, this argument doesn't hold much water with me. I mean, anything that our extensibility mechanism doesn't support today will require new hooks, but does that mean we're never going to add any more hooks? I sure hope not. No, that's not what I said. ISTM that the argument for the custom-scan API is that it allows interesting new things to be done *without further modifying the core code*. But the example application (cache_scan) fails to demonstrate that, and indeed seems to be a counterexample. Whether we'd accept cache_scan on its own merits is a separate question. The problem for me is that custom-scan isn't showing that it can support what was claimed without doing serious damage to modularity and maintainability of the core code. I think there's two separate things in there, one of which I agree with and one of which I disagree with. I agree that we must avoid damaging the modularity and maintainability of the core code; I don't agree that custom-scan needs to be able to do interesting things with zero additional changes to the core code. If we come up with three interesting applications for custom scan that require 5 new hooks between them, I'll consider that a major success - assuming those hooks don't unduly limit future changes we may wish to make in the core code. 
I think your concern about exposing APIs that may not be terribly stable is well-founded, but I don't think that means we shouldn't expose *anything*. I agree 100%. We usually change hook definitions release-by-release, and it is the author's responsibility to follow the newer interface if he continues to maintain his extension on the newer release. Probably, it is a gray area, neither black nor white. If we could design a perfect interface, it might be good, but it would allow no further evolution. Of course, that does not justify a poorly designed interface, but the important thing is to find out the best way at this moment. It may take core refactoring, not just exposing static functions; what I tried to implement is the only way I could find to implement it. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei kai...@ak.jp.nec.com
Re: [HACKERS] Question about optimising (Postgres_)FDW
On 04/16/2014 01:25 AM, Tom Lane wrote: Hannu Krosing ha...@2ndquadrant.com writes: Is there a way to force it to prefer a plan where the results of (select id from onemillion where data > '0.9' limit 100) are passed to FDW as a single IN ( = ANY(...)) query and are retrieved all at once ? You could write the query like that: select * from onemillion_pgfdw where id = any (array(select id from onemillion where data > '0.9' limit 100)); My actual use-case was about a join between a local and a remote table and without rewriting the query (they come from ORM) I was hoping to be able to nudge postgresql towards a better plan via some tuning of table/fdw options or GUCs. for example, would postgresql use the WHERE id IN (...) query on remote side for a query like select r.data, l.data from onemillion_pgfdw r join onemillion l on r.id = l.id and l.data > '0.999'; if it recognizes that the local side returns only 1000 rows ? or would it still use 1000 individual WHERE id = $1 queries. Is getting the foreign data via IN and then turning the data into a hash for joining one of the plans it considers at all ? Best Hannu Or at least you should be able to, except when I try it I get explain analyze select * from onemillion_pgfdw where id = any (array(select id from onemillion where data > '0.9' limit 100)); ERROR: operator does not exist: integer = integer[] HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts. CONTEXT: Remote SQL command: EXPLAIN SELECT id, inserted, data FROM public.onemillion WHERE ((id = ANY ((SELECT null::integer[] so there's something the remote-estimate code is getting wrong here. (It seems to work without remote_estimate, though.) regards, tom lane -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 5:00 AM, Peter Geoghegan p...@heroku.com wrote: On Tue, Apr 15, 2014 at 3:59 PM, Ants Aasma a...@cybertec.at wrote: There's a paper on a non-blocking GCLOCK algorithm that does lock-free clock sweep and buffer pinning[7]. If we decide to stay with GCLOCK it may be interesting, although I still believe that some variant of buffer nailing[8] is a better idea; my experience shows that most of the locking overhead is cache line bouncing, ignoring the extreme cases where our naive spinlock implementation blows up. You might be right about that, but lets handle one problem at a time. Who knows what the bottleneck will end up being if and when we address the naivety around frequency? I want to better characterize that problem first. Just to summarize the previous discussion, the improvements that we decided to make in this area based on feedback are as follows:
1. Bgwriter needs to be improved so that it can help in reducing usage counts and finding the next victim buffer (run the clock sweep and add buffers to the free list).
2. SetLatch for bgwriter (wake up bgwriter) when elements in the freelist are few.
3. Split the work done under the global lock (BufFreelistLock) in StrategyGetBuffer (a spinlock for the freelist, and an lwlock for the clock sweep). Here we can try to make it lock-free based on atomic ops as well.
4. Bgwriter needs to be more aggressive; the logic by which it calculates how many buffers it needs to process needs to be improved.
5. Contention around buffer mapping locks.
6. Cacheline bouncing around the buffer header spinlocks - is there anything we can do to reduce this?
7. Choose an optimistically-used buffer in StrategyGetBuffer().
8. Don't bump the usage count every time a buffer is pinned.
I have already addressed some of these improvements in a patch[1] and for the others, I plan to work on them for 9.5. 
I think here you want to address the improvements related to usage counts and see if that can get us a win in some commonly used scenarios, without affecting any other commonly used scenario. I feel this is a good idea to pursue, to see if we can get good benefits from it. In fact, a few days back I ran some tests manually to see the problems around BufFreeListLock (currently I don't have the script ready), and more recently Jason Petersen has done some benchmarking in this area which you can refer to here[2]. I wonder if we can work together to improve things in this area. [1] http://www.postgresql.org/message-id/006e01ce926c$c7768680$56639380$@kap...@huawei.com [2] https://googledrive.com/host/0Bx33JCTmOADOeTIwaE9KX21yWEk/Concurrency%20Limits%20with%20Large%20Working%20Sets With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Question about optimising (Postgres_)FDW
On 04/16/2014 06:12 AM, Hannu Krosing wrote: On 04/16/2014 01:25 AM, Tom Lane wrote: Hannu Krosing ha...@2ndquadrant.com writes: Is there a way to force it to prefer a plan where the results of (select id from onemillion where data > '0.9' limit 100) are passed to FDW as a single IN ( = ANY(...)) query and are retrieved all at once ? You could write the query like that: select * from onemillion_pgfdw where id = any (array(select id from onemillion where data > '0.9' limit 100)); My actual use-case was about a join between a local and a remote table and without rewriting the query (they come from ORM) I was hoping to be able to nudge postgresql towards a better plan via some tuning of table/fdw options or GUCs. for example, would postgresql use the WHERE id IN (...) query on remote side for a query like select r.data, l.data from onemillion_pgfdw r join onemillion l on r.id = l.id and l.data > '0.999'; if it recognizes that the local side returns only 1000 rows ? or would it still use 1000 individual WHERE id = $1 queries. Is getting the foreign data via IN and then turning the data into a hash for joining one of the plans it considers at all ? It seems that we could need an extra tuning parameter for choosing the ID IN (...) + HASH plan over individual SELECT ..
WHERE ID = $1 queries: something between `fdw_startup_cost` and `fdw_tuple_cost` to signify that an IN query returning 1000 rows runs faster than 1000 `=` queries. As I understand it, currently they both would be estimated as fdw_startup_cost + 1000 * fdw_tuple_cost. The new parameter could be fdw_call_cost or fdw_query_cost and would estimate how much each individual call to the fdw costs, thus favouring calls which return more data in one call. Cheers Hannu Best Hannu Or at least you should be able to, except when I try it I get explain analyze select * from onemillion_pgfdw where id = any (array(select id from onemillion where data > '0.9' limit 100)); ERROR: operator does not exist: integer = integer[] HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts. CONTEXT: Remote SQL command: EXPLAIN SELECT id, inserted, data FROM public.onemillion WHERE ((id = ANY ((SELECT null::integer[] so there's something the remote-estimate code is getting wrong here. (It seems to work without remote_estimate, though.) regards, tom lane -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
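Hannu's proposed knob reduces to simple arithmetic. The Python sketch below shows why the two plans are indistinguishable under the current model and how a per-call cost separates them (`fdw_startup_cost` and `fdw_tuple_cost` are the existing postgres_fdw options; the per-call parameter is the hypothetical one being proposed here, not an existing GUC):

```python
def repeated_query_cost(nrows, startup, tuple_cost, call_cost=0.0):
    # nrows separate "WHERE id = $1" round trips, one row returned each time
    return startup + nrows * (call_cost + tuple_cost)

def batched_query_cost(nrows, startup, tuple_cost, call_cost=0.0):
    # one "WHERE id = ANY(...)" call returning all nrows at once
    return startup + call_cost + nrows * tuple_cost
```

With `call_cost = 0` (today's model) both expressions collapse to `startup + nrows * tuple_cost`, so the planner has no reason to prefer the batched form; any positive per-call cost makes the single IN query cheaper for large nrows.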
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Mon, Apr 14, 2014 at 1:11 PM, Peter Geoghegan p...@heroku.com wrote: In the past, various hackers have noted problems they've observed with this scheme. A common pathology is to see frantic searching for a victim buffer only to find all buffer usage_count values at 5. It may take multiple revolutions of the clock hand before a victim buffer is found, as usage_count is decremented for each and every buffer. Also, BufFreelistLock contention is considered a serious bottleneck [1], which is related. I think that the basic problem here is that usage counts increase when buffers are referenced, but they decrease when buffers are evicted, and those two things are not in any necessary way connected to each other. In particular, if no eviction is happening, reference counts will converge to the maximum value. I've read a few papers about algorithms that attempt to segregate the list of buffers into hot and cold lists, and an important property of such algorithms is that they mustn't be allowed to make everything hot. It's easy to be too simplistic, here: an algorithm that requires that no more than half the list be hot will fall over badly on a workload where the working set exceeds the available cache and the really hot portion of the working set is 60% of the available cache. So you need a more sophisticated algorithm than that. But that core property that not all buffers can be hot must somehow be preserved, and our algorithm doesn't. This isn't a fundamental property of the usage-count idea; it's an artifact of the fact that usage count decreases are tied to eviction pressure rather than access pressure. For example, suppose we made a rule that if the total usage counts of all buffers exceed 3 * NBuffers, then every time you bump the usage count of a buffer from N to N+1, you're required to advance the clock sweep far enough to decrease the reference count of a buffer by one. 
When you want to reclaim a buffer, you advance a separate clock sweep until you find a buffer with a zero usage count; if you circle the whole ring without finding one, then you reclaim the buffer you saw with the lowest usage count. There are obvious scalability problems here (everyone fighting over the right to advance the clock sweep) but ignore that for the sake of the thought experiment: now you have an algorithm where not all buffers can be hot. If some buffers are hotter than others, then whenever their usage count is decreased it will immediately get pushed back up again, but some other buffer then has to absorb the decrease. Only the buffers that are really hot can maintain high usage counts, because *somebody* has to have a low usage count. Even ignoring scalability concerns, this might not be (and probably isn't) exactly what we want to implement, but I think it illustrates an important control principle all the same: buffer cooling needs to be driven by the same underlying phenomenon - probably buffer access - as buffer heating. If they're driven by unrelated phenomena, then the rates may be wildly incomparable, and you'll end up with everything hot or everything cold. If that happens, you lose, because with everything the same, there's no principled way to decide which things are actually best to evict. If we come up with some good solution for shared buffers, we should also consider applying it to SLRU eviction. I believe that the current situation around CLOG eviction is none too pretty. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
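The thought experiment above is easy to model. The toy Python sketch below (ignoring, as noted, the scalability problems of a shared clock hand; class and method names are invented for illustration) couples usage-count increases to decreases once the total exceeds 3 * NBuffers, so that not all buffers can stay hot:

```python
class ClockModel:
    """Toy model of the proposal: once sum(usage) exceeds 3 * nbuffers,
    every usage-count bump pays for itself by advancing a clock hand far
    enough to decrement some other buffer's count."""

    def __init__(self, nbuffers, max_usage=5):
        self.usage = [0] * nbuffers
        self.max_usage = max_usage
        self.hand = 0

    def _decrement_someone(self):
        # advance the hand until a nonzero usage count is found and lowered;
        # only called when the total is positive, so this terminates
        while True:
            i = self.hand
            self.hand = (self.hand + 1) % len(self.usage)
            if self.usage[i] > 0:
                self.usage[i] -= 1
                return

    def access(self, buf):
        if self.usage[buf] < self.max_usage:
            self.usage[buf] += 1
            if sum(self.usage) > 3 * len(self.usage):
                self._decrement_someone()
```

In this model a genuinely hot buffer regains its count on its next access, while cold buffers absorb the decreases, which is exactly the "somebody has to have a low usage count" property described above.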
Re: [HACKERS] How can we make beta testing better?
On Tue, Apr 15, 2014 at 5:47 PM, Josh Berkus j...@agliodbs.com wrote: Hackers, I think 9.3 has given us evidence that our users aren't giving new versions of PostgreSQL substantial beta testing, or if they are, they aren't sharing the results with us. How can we make beta testing better and more effective? How can we get more users to actually throw serious workloads at new versions and share the results? I've tried a couple of things over the last two years and they haven't worked all that well. Since we're about to go into another beta testing period, we need something new. Ideas? I think it boils down to making it really easy to create a workload generator. Most companies have simple single-threaded regression tests for functionality, but very few companies have good parallel workload generators which reflect activities in their production environment. A documented beta test process/toolset which does the following would help:
1) Enable full query logging
2) Create a replica of a production DB, recording $TIME when it stops
3) Allow the user to make changes (upgrade to 9.4, change hardware, change kernel settings, ...)
4) Play queries from the CSV logs starting from $TIME, mimicking actual timing and transaction boundaries
If Pg can make it easy to duplicate activities currently going on in production inside another environment, I would be pleased to fire a couple billion queries through it over the next few weeks. #4 should include reporting useful to the project, such as a sampling of queries which performed significantly worse and a few relative performance stats for overall execution time.
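Step 4 - replaying logged queries with their original pacing - is the fiddly part of the process sketched above. A minimal Python illustration, assuming a simplified two-column log (`t`, `query`) rather than the real csvlog format, and taking the execution callback as a parameter:

```python
import csv
import io
import time

def replay(log_csv, execute, speed=1.0):
    """Replay queries from a simplified CSV query log, sleeping between
    statements so the replay mimics the original timing (speed > 1 replays
    faster). `execute` is whatever runs a statement on the test server."""
    rows = sorted(csv.DictReader(io.StringIO(log_csv)),
                  key=lambda r: float(r["t"]))
    if not rows:
        return []
    wall_start = time.monotonic()
    t0 = float(rows[0]["t"])
    results = []
    for row in rows:
        # sleep until this statement's original offset has elapsed
        due = (float(row["t"]) - t0) / speed
        lag = due - (time.monotonic() - wall_start)
        if lag > 0:
            time.sleep(lag)
        results.append(execute(row["query"]))
    return results
```

A real tool would also have to group statements by session/transaction to reproduce concurrency, which a single-threaded loop like this deliberately glosses over.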
Re: [HACKERS] [doc] EXPLAIN CREATE MATERIALIZED VIEW AS?
On Wed, Apr 16, 2014 at 12:35 PM, Amit Langote amitlangot...@gmail.com wrote: Hi, Attached adds CREATE MATERIALIZED VIEW AS to the list of statements that can be EXPLAINed. Now that you mention that, REFRESH MATERIALIZED VIEW can be EXPLAIN'ed as well, except that it returns the following and does not error out: =# explain refresh materialized view aam; QUERY PLAN --- Utility statements have no plan structure (1 row) -- Michael