Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1
On 04/15/2014 10:17 PM, Tom Lane wrote: I actually think we should *add* a LIBPQEXPORT that handles this for libpq, much like PGDLLEXPORT does for postgres(.exe). And in the process, rename PGDLLEXPORT to POSTGRESEXPORT or PGSERVEREXPORT or something. My reaction to that is not bloody likely. I remarked on this upthread already, but there is absolutely no way that I want to clutter our source code with platform-specific markings like that. Perhaps somebody could try a Windows build with PGDLLEXPORT defined to empty, and verify that it works, and if so do a pgbench comparison against a build done the existing way? Good idea. Personally, I don't care about Windows enough. I want it to work, but performance optimisation is beyond what I'm bothered with. Another useful test would be to modify libpq as described above, so its headers set __declspec(dllexport) on its exports during compilation and its headers set __declspec(dllimport) when included while compiling other binaries that will link to libpq. Then use *that* in pgbench too and see if it makes any meaningful difference. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] BGWorkers, shared memory pointers, and postmaster restart
Hi all I've been using the dynamic BGWorker support for some recent work, and I think I've found an issue with how postmaster restarts are handled. TL;DR: I don't think there's a safe way to use a BGWorker (static or dynamic) with bgw_restart_time != BGW_NEVER_RESTART and a bgw_main_arg Datum that points into shared memory, and think we might need a API change to fix that. If the postmaster restarts due to a crash, it calls shared memory startup hooks and sets the found pointer to false, so they're re-inited. This makes sense as shm is presumed to be corrupt. Dynamic BGWorkers are killed as a part of restart. However, they're not unregistered, and are relaunched if there's a bgw_restart_time set. They're called with the same bgw_main_arg Datum as with the original launch. If this Datum is a pointer into the shared memory that just got zeroed and re-inited, exciting things happen. In my case the worker restart generally happens after shm is re-inited, and since it gets reinitialized with the same contents the dynamic bgworker registered during postmaster startup restarts normally. I only noticed the problem because my shared memory init hook launches bgworkers once shm is set up, and I was getting two copies of a bgworker after postmaster restart. This also affects static bgworkers that use a pointer into shared memory to pass data on EXEC_BACKEND platforms (Windows). For dynamic bgworkers, it seems like it'd be OK to just require extensions to re-register them after a postmaster restart, or add a BGW_UNREGISTER_ON_POSTMASTER_RESTART flag to let that be controlled so bgworkers that didn't receive pointers into shm didn't have to deal with it. So any pointer into shared memory that's being re-initialized would be discarded when the bgworker was unregistered during postmater restart. Dynamic bgworkers are new in 9.4, so we still have the freedom to change behaviour for them. But that won't fix static bgworkers that have a pointer into shm as an argument. In non-EXEC_BACKEND platforms we can just pass a pointer to palloc'd postmaster memory, but that won't work if we have to exec() after fork(). I don't think it's reasonable to say well, don't do that - if you can only send a single pass-by-value Datum to a bgworker's main function, their utility is greatly reduced. I could always set up an exit hook / SIGQUIT handler that tries to unregister dynamic bgworkers using their BGWorkerHandle s, from the worker that initially registered them. If the worker that registered them is the one that crashed and triggered a postmaster restart this won't do any good, so it's a half-solution at best. I can't stash the BGWorkerHandle s for the launched workers in shm and unregister them during postmaster restart either. For one thing, shm is presumed corrupt. For another, BGWorkerHandle is an incomplete struct with no exposed size, so I can't copy it into shm anyway. This seems like a bit of a pickle. Am I missing something obvious? The only way around it that I can currently see is to have a single static bgworker with no pointer argument launch and manage all the other workers required for the extension. It would launch them all with bgw_restart_time = BGW_NEVER_RESTART and set its self as the bgw_notify_pid . If they die, it unregisters them then registers a new one. Effectively, it's doing the work the bgworker code does already, except that it makes sure the bgworkers don't get relaunched on postmaster restart. Since it doesn't depend on shm being valid, this should work, but it's messy and seems like there should be a better way. Do we need to change how we define BGWorkers to require that a BGWorker with a Datum that points to shared memory be automatically unregistered by the postmaster on restart? This would have to apply to static BGWorkers too, so we'd want to think about adding a flag for it and maybe even backpatching the flag into 9.3; it'd only affect extensions that actually used the combination described above. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Tue, Apr 15, 2014 at 9:44 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Apr 14, 2014 at 1:11 PM, Peter Geoghegan p...@heroku.com wrote: In the past, various hackers have noted problems they've observed with this scheme. A common pathology is to see frantic searching for a victim buffer only to find all buffer usage_count values at 5. It may take multiple revolutions of the clock hand before a victim buffer is found, as usage_count is decremented for each and every buffer. Also, BufFreelistLock contention is considered a serious bottleneck [1], which is related. I think that the basic problem here is that usage counts increase when buffers are referenced, but they decrease when buffers are evicted, and those two things are not in any necessary way connected to each other. In particular, if no eviction is happening, reference counts will converge to the maximum value. I've read a few papers about algorithms that attempt to segregate the list of buffers into hot and cold lists, and an important property of such algorithms is that they mustn't be allowed to make everything hot. It's possible that I've misunderstood what you mean here, but do you really think it's likely that everything will be hot, in the event of using something like what I've sketched here? I think it's an important measure against this general problem that buffers really earn the right to be considered hot, so to speak. With my prototype, in order for a buffer to become as hard to evict as possible, at a minimum it must be *continually* pinned for at least 30 seconds. That's actually a pretty tall order. Although, as I said, I wouldn't be surprised if it was worth making it possible for buffers to be even more difficult to evict than that. It should be extremely difficult to evict a root B-Tree page, and to a lesser extent inner pages even under a lot of cache pressure, for example. There are lots of workloads in which that can happen, and I have a hard time believing that it's worth it to evict given the extraordinary difference in their utility as compared to a lot of other things. I can imagine a huge barrier against evicting what is actually a relatively tiny number of pages being worth it. I don't want to dismiss what you're saying about heating and cooling being unrelated, but I don't find the conclusion that not everything can be hot obvious. Maybe heat should be relative rather than absolute, and maybe that's actually what you meant. There is surely some workload where buffer access actually is perfectly uniform, and what do you do there? What temperature are those buffers? It occurs to me that within the prototype patch, even though usage_count is incremented in a vastly slower fashion (in a wall time sense), clock sweep doesn't take advantage of that. I should probably investigate having clock sweep become more aggressive in decrementing in response to realizing that it won't get some buffer's usage_count down to zero on the next revolution either. There are certainly problems with that, but they might be fixable. Within the patch, in order for it to be possible for the usage_count to be incremented in the interim, an average of 1.5 seconds must pass, so if clock sweep were to anticipate another no-set-to-zero revolution, it seems pretty likely that it would be exactly right, or if not then close enough, since it can only really fail to correct for some buffers getting incremented once more in the interim. Conceptually, it would be like multiple logical revolutions were merged into one actual one, sufficient to have the next revolution find a victim buffer. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: variant of regclass
Well, I noticed that, too, but I didn't think it was my job to tell the patch author what functions he should have wanted. A follow-on patch to add to_regprocedure and to_regoperator wouldn't be much work, if you want that. And here is a patch for that. Looks good to me except duplicate oids. Included is a patch to fix that. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 0809a6d..db8691a 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -15294,10 +15294,18 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); /indexterm indexterm +primaryto_regprocedure/primary + /indexterm + + indexterm primaryto_regoper/primary /indexterm indexterm +primaryto_regoperator/primary + /indexterm + + indexterm primaryto_regtype/primary /indexterm @@ -15482,11 +15490,21 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); entryget the oid of the named function/entry /row row + entryliteralfunctionto_regprocedure(parameterfunc_name/parameter)/function/literal/entry + entrytyperegprocedure/type/entry + entryget the oid of the named function/entry + /row + row entryliteralfunctionto_regoper(parameteroperator_name/parameter)/function/literal/entry entrytyperegoper/type/entry entryget the oid of the named operator/entry /row row + entryliteralfunctionto_regoperator(parameteroperator_name/parameter)/function/literal/entry + entrytyperegoperator/type/entry + entryget the oid of the named operator/entry + /row + row entryliteralfunctionto_regtype(parametertype_name/parameter)/function/literal/entry entrytyperegtype/type/entry entryget the oid of the named type/entry @@ -15658,10 +15676,12 @@ SELECT collation for ('foo' COLLATE de_DE); para The functionto_regclass/function, functionto_regproc/function, - functionto_regoper/function and functionto_regtype/function - translate relation, function, operator, and type names to objects of - type typeregclass/, typeregproc/, typeregoper/ and - typeregtype/, respectively. These functions differ from a cast from + functionto_regprocedure/function, functionto_regoper/function, + functionto_regoperator/function, and functionto_regtype/function + functions translate relation, function, operator, and type names to objects + of type typeregclass/, typeregproc/, typeregprocedure/type, + typeregoper/, typeregoperator/type, and typeregtype/, + respectively. These functions differ from a cast from text in that they don't accept a numeric OID, and that they return null rather than throwing an error if the name is not found (or, for functionto_regproc/function and functionto_regoper/function, if diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c index ed2bdbf..6210f45 100644 --- a/src/backend/utils/adt/regproc.c +++ b/src/backend/utils/adt/regproc.c @@ -323,6 +323,38 @@ regprocedurein(PG_FUNCTION_ARGS) } /* + * to_regprocedure - converts proname(args) to proc OID + * + * If the name is not found, we return NULL. + */ +Datum +to_regprocedure(PG_FUNCTION_ARGS) +{ + char *pro_name = PG_GETARG_CSTRING(0); + List *names; + int nargs; + Oid argtypes[FUNC_MAX_ARGS]; + FuncCandidateList clist; + + /* + * Parse the name and arguments, look up potential matches in the current + * namespace search list, and scan to see which one exactly matches the + * given argument types. (There will not be more than one match.) + */ + parseNameAndArgTypes(pro_name, false, names, nargs, argtypes); + + clist = FuncnameGetCandidates(names, nargs, NIL, false, false, true); + + for (; clist; clist = clist-next) + { + if (memcmp(clist-args, argtypes, nargs * sizeof(Oid)) == 0) + PG_RETURN_OID(clist-oid); + } + + PG_RETURN_NULL(); +} + +/* * format_procedure - converts proc OID to pro_name(args) * * This exports the useful functionality of regprocedureout for use @@ -722,6 +754,45 @@ regoperatorin(PG_FUNCTION_ARGS) } /* + * to_regoperator - converts oprname(args) to operator OID + * + * If the name is not found, we return NULL. + */ +Datum +to_regoperator(PG_FUNCTION_ARGS) +{ + char *opr_name_or_oid = PG_GETARG_CSTRING(0); + Oid result; + List *names; + int nargs; + Oid argtypes[FUNC_MAX_ARGS]; + + /* + * Parse the name and arguments, look up potential matches in the current + * namespace search list, and scan to see which one exactly matches the + * given argument types. (There will not be more than one match.) + */ + parseNameAndArgTypes(opr_name_or_oid, true, names, nargs, argtypes); + if (nargs == 1) + ereport(ERROR, +(errcode(ERRCODE_UNDEFINED_PARAMETER), + errmsg(missing argument), + errhint(Use NONE to denote the missing
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
Hi, It's good to see focus on this - some improvements around s_b are sorely needed. On 2014-04-14 10:11:53 -0700, Peter Geoghegan wrote: 1) Throttles incrementation of usage_count temporally. It becomes impossible to increment usage_count for any given buffer more frequently than every 3 seconds, while decrementing usage_count is totally unaffected. I think this is unfortunately completely out of question. For one a gettimeofday() for every uffer pin will become a significant performance problem. Even the computation of the xact/stm start/stop timestamps shows up pretty heavily in profiles today - and they are far less frequent than buffer pins. And that's on x86 linux, where gettimeofday() is implemented as something more lightweight than a full syscall. The other significant problem I see with this is that its not adaptive to the actual throughput of buffers in s_b. In many cases there's hundreds of clock cycles through shared buffers in 3 seconds. By only increasing the usagecount that often you've destroyed the little semblance to a working LRU there is right now. It also wouldn't work well for situations with a fast changing workload s_b. If you have frequent queries that take a second or so and access some data repeatedly (index nodes or whatnot) only increasing the usagecount once will mean they'll continually fall back to disk access. 2) Has usage_count saturate at 10 (i.e. BM_MAX_USAGE_COUNT = 10), not 5 as before. ... . This step on its own would be assumed extremely counter-productive by those in the know, but I believe that other measures ameliorate the downsides. I could be wrong about how true that is in other cases, but then the case helped here isn't what you'd call a narrow benchmark. I don't see which mechanisms you have suggested that counter this? I think having more granular usagecount is a good idea, but I don't think it can realistically be implemented with the current method of choosing victim buffers. The amount of cacheline misses around that is already a major scalability limit; we surely can't make this even worse. I think it'd be possible to get back to this if we had a better bgwriter implementation. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The question about the type numeric
But the sign is 0. So is there anything wrong? have look in src/backend/utils/adt/numeric.c @ 154 155 for POS NEG defination given as 154 #define NUMERIC_POS 0x 155 #define NUMERIC_NEG 0x4000 Regards, Amul Sul -- View this message in context: http://postgresql.1045698.n5.nabble.com/The-question-about-the-type-numeric-tp5800180p5800219.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ECPG FETCH readahead
2014-04-16 10:54 keltezéssel, Boszormenyi Zoltan írta: 2014-01-18 18:08 keltezéssel, Boszormenyi Zoltan írta: Hi, Alvaro Herrera wrote: Boszormenyi Zoltan escribió: Rebased patches after the regression test and other details were fixed in the infrastructure part. This thread started in 2010, and various pieces have been applied already and some others have changed in nature. Would you please post a new patchset, containing rebased patches that still need application, in a new email thread to be linked in the commitfest entry? I hope Thunderbird did the right thing and didn't include the reference message ID when I told it to edit as new. So supposedly this is a new thread but with all the cc: addresses kept. I have rebased all remaining patches and kept the numbering. All the patches from 18 to 25 that were previously in the ECPG infrastructure CF link are included here. There is no functional change. Because of the recent bugfixes in the ECPG area, the patchset didn't apply cleanly anymore. Rebased with no functional change. The patches themselves are a little bigger though. It's because the patches are generated with 10 lines of context, this was set in my $HOME/.gitconfig: [diff] context = 10 Best regards, Zoltán Böszörményi -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 12:53 AM, Andres Freund and...@2ndquadrant.com wrote: I think this is unfortunately completely out of question. For one a gettimeofday() for every uffer pin will become a significant performance problem. Even the computation of the xact/stm start/stop timestamps shows up pretty heavily in profiles today - and they are far less frequent than buffer pins. And that's on x86 linux, where gettimeofday() is implemented as something more lightweight than a full syscall. Come on, Andres. Of course exactly what I've done here is completely out of the question as a patch that we can go and commit right now. I've numerous caveats about bloating the buffer descriptors, and about it being a proof of concept. I'm pretty sure we can come up with a scheme to significantly cut down on the number of gettimeofday() calls if it comes down to it. In any case, I'm interested in advancing our understanding of the problem right now. Let's leave the minutiae to one side for the time being. The other significant problem I see with this is that its not adaptive to the actual throughput of buffers in s_b. In many cases there's hundreds of clock cycles through shared buffers in 3 seconds. By only increasing the usagecount that often you've destroyed the little semblance to a working LRU there is right now. If a usage_count can get to BM_MAX_USAGE_COUNT from its initial allocation within an instant, that's bad. It's that simple. Consider all the ways in which that can happen almost by accident. You could probably reasonably argue that the trade-off or lack of adaption (between an LRU and an LFU) that this particular sketch of mine represents is inappropriate or sub-optimal, but I don't understand why you're criticizing the patch for doing what I expressly set out to do. I wrote I think a very real problem that may be that approximating an LRU is bad because an actual LRU is bad. It also wouldn't work well for situations with a fast changing workload s_b. If you have frequent queries that take a second or so and access some data repeatedly (index nodes or whatnot) only increasing the usagecount once will mean they'll continually fall back to disk access. No, it shouldn't, because there is a notion of buffers getting a fair chance to prove themselves. Now, it might well be the case that there are workloads where what I've done to make that happen in this prototype doesn't work out too well - I've already said so. But should a buffer get a usage count of 5 just because the user inserted 5 tuples within a single DML command, for example? If so, why? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: iff - if
On 04/16/2014 12:19 AM, Andreas 'ads' Scherbaum wrote: stumbled over a number of iff in the source where if is meant - not sure what the real story behind this is, but attached is a patch to fix the about 80 occurrences. Looks like I missed something in my math lessons ... -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 2014-04-16 01:58:23 -0700, Peter Geoghegan wrote: On Wed, Apr 16, 2014 at 12:53 AM, Andres Freund and...@2ndquadrant.com wrote: I think this is unfortunately completely out of question. For one a gettimeofday() for every uffer pin will become a significant performance problem. Even the computation of the xact/stm start/stop timestamps shows up pretty heavily in profiles today - and they are far less frequent than buffer pins. And that's on x86 linux, where gettimeofday() is implemented as something more lightweight than a full syscall. Come on, Andres. Of course exactly what I've done here is completely out of the question as a patch that we can go and commit right now. I've numerous caveats about bloating the buffer descriptors, and about it being a proof of concept. I'm pretty sure we can come up with a scheme to significantly cut down on the number of gettimeofday() calls if it comes down to it. In any case, I'm interested in advancing our understanding of the problem right now. Let's leave the minutiae to one side for the time being. *I* don't think any scheme that involves measuring the time around buffer pins is going to be acceptable. It's better than I say that now rather than when you've invested significant time into the approach, no? The other significant problem I see with this is that its not adaptive to the actual throughput of buffers in s_b. In many cases there's hundreds of clock cycles through shared buffers in 3 seconds. By only increasing the usagecount that often you've destroyed the little semblance to a working LRU there is right now. If a usage_count can get to BM_MAX_USAGE_COUNT from its initial allocation within an instant, that's bad. It's that simple. Consider all the ways in which that can happen almost by accident. Yes, I agree that that's a problem. It immediately going down to zero is a problem as well though. And that's what will happen in many scenarios, because you have time limits on increasing the usagecount, but not when decreasing. It also wouldn't work well for situations with a fast changing workload s_b. If you have frequent queries that take a second or so and access some data repeatedly (index nodes or whatnot) only increasing the usagecount once will mean they'll continually fall back to disk access. No, it shouldn't, because there is a notion of buffers getting a fair chance to prove themselves. If you have a workload with (BM_MAX_USAGE_COUNT + 1) clock cycles/second, how does *any* buffer has a chance to prove itself? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Dynamic Background Workers and clean exit
Hello, I've been recently doing some work with dynamic bgworkers and noticed that I have no way of saying I am done now and want to exit cleanly because bgworkers get restarted automatically on exit code 0 no matter what is the restart interval set to. I understand the rationale for this behavior when using static bgworkers which are meant to run forever, but dynamic ones are spawned dynamically by code so they should also be able to terminate cleanly as they have presumably finite work to do. Also I think the clean shutdown of dynamic bgworker should not be logged (or at least not with the same log level as it is now) since it's business as usual. Since the dyanamic bgworkers are new in 9.4 the behavior can still be changed. Thoughts? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 2:18 AM, Andres Freund and...@2ndquadrant.com wrote: *I* don't think any scheme that involves measuring the time around buffer pins is going to be acceptable. It's better than I say that now rather than when you've invested significant time into the approach, no? Well, I do think that it will be possible to make something like that work. LRU-K/LRU-2 involves remembering the last two access times (not the last one). Researchers considered preeminent authorities on caching algorithms thought that was a good idea in 1993. There are plenty of other examples of similar schemes too. No, it shouldn't, because there is a notion of buffers getting a fair chance to prove themselves. If you have a workload with (BM_MAX_USAGE_COUNT + 1) clock cycles/second, how does *any* buffer has a chance to prove itself? There could be lots of ways. I thought about representing that more directly. I don't think that it's useful to have a large number of revolutions in search of a victim under any circumstances. Fundamentally, you're asking what if any scheme here leans too heavily towards frequency?. That could certainly be a problem, as I've said, and we could think about adaptation over heuristics, as I've said, but it is very obviously a big problem that clock sweep doesn't really care about frequency one bit right now. Why should I be the one with all the answers? Aren't you interested in the significance of the patch, and the test case? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
Hi, On 2014-04-16 02:57:54 -0700, Peter Geoghegan wrote: Why should I be the one with all the answers? Who said you need to be? The only thing I am saying is that I don't agree with some of your suggestions? I only responded to the thread now because downthread (in CAM3SWZQa2OAVUrfPL-df=we1smozkbr392sw_novukzepxh...@mail.gmail.com) you further argued using the timestamp - which I think is a flawed concept. So I thought it'd be fair to argue against it now, rather than later. Aren't you interested in the significance of the patch, and the test case? Not particularly in the specifics to be honest. The tradeoffs of the techniques you used in there seem prohibitive to me. It's easy to make individual cases faster by sacrificing others. Sometimes it's useful to prototype solutions while narrowing the scope for evaluation to get faster feedback, but as I don't see the solutions to be applicable in the general case... I think it's very important to improve upon the current state. It's imo one of postgres' biggest issues. But it's also far from trivial, otherwise it'd be done already. I *personally* don't think it's very likely that we can improve significantly upon the current state as long as every process regularly participates in the clock sweep. ISTM that prevents many more elaborate techniques to be used (cache misses/bus traffic, locking). But that's just gut feeling. I also think there are bigger issues than the actual LRU/whatever behaviour, namely the scalability issues around shared buffers making both small and big s_b settings major bottlenecks. But that's just where I have seen more issues personally. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BGWorkers, shared memory pointers, and postmaster restart
On 04/16/2014 02:37 PM, Craig Ringer wrote: Hi all I've been using the dynamic BGWorker support for some recent work, and I think I've found an issue with how postmaster restarts are handled. TL;DR: I don't think there's a safe way to use a BGWorker (static or dynamic) with bgw_restart_time != BGW_NEVER_RESTART and a bgw_main_arg Datum that points into shared memory, and think we might need a API change to fix that. Andres sensibly points out that part of this is easily solved by passing the bgworker an index into an array in a named shmem block. A simple pass-by-value Datum that can be turned into a pointer to a shmem struct. This still doesn't solve the other half of the issue, which is how to handle dynamic bgworkers after a postmaster restart. They're effectively lost/leaked: there's no way to retain a bgworker handle across restart, and no way to list bgworkers, nor is there any idempotent way to say Start a worker to do x only if it doesn't already exist (unique names, magic cookie hashes, whatever). With the current API the only solution to the second half that I see is to have bgworkers run in non-restart mode and manage them yourself. Not ideal. Instead we need one of: - A flag like BGW_UNREGISTER_ON_RESTART; - To always unregister dynamic bgws on postmaster shm clear + restart; - A way to list bgws, inspect their BackgroundWorker structs and obtain their handles; or - A way to idempotently register a bgw only if it doesn't already exist -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BGWorkers, shared memory pointers, and postmaster restart
On 2014-04-16 19:11:37 +0800, Craig Ringer wrote: On 04/16/2014 02:37 PM, Craig Ringer wrote: Hi all I've been using the dynamic BGWorker support for some recent work, and I think I've found an issue with how postmaster restarts are handled. TL;DR: I don't think there's a safe way to use a BGWorker (static or dynamic) with bgw_restart_time != BGW_NEVER_RESTART and a bgw_main_arg Datum that points into shared memory, and think we might need a API change to fix that. Andres sensibly points out that part of this is easily solved by passing the bgworker an index into an array in a named shmem block. A simple pass-by-value Datum that can be turned into a pointer to a shmem struct. This still doesn't solve the other half of the issue, which is how to handle dynamic bgworkers after a postmaster restart. They're effectively lost/leaked: there's no way to retain a bgworker handle across restart, and no way to list bgworkers, nor is there any idempotent way to say Start a worker to do x only if it doesn't already exist (unique names, magic cookie hashes, whatever). With the current API the only solution to the second half that I see is to have bgworkers run in non-restart mode and manage them yourself. Not ideal. Instead we need one of: - A flag like BGW_UNREGISTER_ON_RESTART; - To always unregister dynamic bgws on postmaster shm clear + restart; - A way to list bgws, inspect their BackgroundWorker structs and obtain their handles; or - A way to idempotently register a bgw only if it doesn't already exist I think we should go for always unregistering dynamic bgws. There's really little justification for keeping them around after a crash cycle. While not the nicest place architecturally, it seems easy enough to do in BackgroundWorkerShmemInit() which happens to be called conveniently in a crash/restart cycle... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about optimising (Postgres_)FDW
(2014/04/16 6:55), Hannu Krosing wrote: -- CREATE EXTENSION postgres_fdw; CREATE SERVER loop foreign data wrapper postgres_fdw OPTIONS (port '5432', dbname 'testdb'); CREATE USER MAPPING FOR PUBLIC SERVER loop; create table onemillion ( id serial primary key, inserted timestamp default clock_timestamp(), data text ); insert into onemillion(data) select random() from generate_series(1,100); CREATE FOREIGN TABLE onemillion_pgfdw ( id int, inserted timestamp, data text ) SERVER loop OPTIONS (table_name 'onemillion', use_remote_estimate 'true'); testdb=# explain analyse select * from onemillion_pgfdw where id in (select id from onemillion where data '0.9' limit 100); QUERY PLAN - Nested Loop (cost=122.49..10871.06 rows=50 width=44) (actual time=4.269..93.444 rows=100 loops=1) - HashAggregate (cost=22.06..23.06 rows=100 width=4) (actual time=1.110..1.263 rows=100 loops=1) - Limit (cost=0.00..20.81 rows=100 width=4) (actual time=0.038..1.026 rows=100 loops=1) - Seq Scan on onemillion (cost=0.00..20834.00 rows=100115 width=4) (actual time=0.036..0.984 rows=100 loops=1) Filter: (data '0.9'::text) Rows Removed by Filter: 805 - Foreign Scan on onemillion_pgfdw (cost=100.43..108.47 rows=1 width=29) (actual time=0.772..0.773 rows=1 loops=100) Total runtime: 93.820 ms (8 rows) Time: 97.283 ms -- ... actually performs 100 distinct SELECT * FROM onemillion WHERE id = $1 calls on remote side. Maybe I'm missing something, but I think that you can do what I think you'd like to do by the following procedure: postgres=# ALTER SERVER loop OPTIONS (ADD fdw_startup_cost '1000'); ALTER SERVER postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in (SELECT id FROM onemillion WHERE data '0.9' LIMIT 100); QUERY PLAN --- Hash Semi Join (cost=1023.10..41983.21 rows=100 width=30) Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data Hash Cond: (onemillion_pgsql.id = onemillion.id) - Foreign Scan on public.onemillion_pgsql (cost=1000.00..39334.00 rows=100 width=29) Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data Remote SQL: SELECT id, inserted, data FROM public.onemillion - Hash (cost=21.85..21.85 rows=100 width=4) Output: onemillion.id - Limit (cost=0.00..20.85 rows=100 width=4) Output: onemillion.id - Seq Scan on public.onemillion (cost=0.00..20834.00 rows=99918 width=4) Output: onemillion.id Filter: (onemillion.data '0.9'::text) Planning time: 0.690 ms (14 rows) or, that as Tom mentioned, by disabling the use_remote_estimate function: postgres=# ALTER FOREIGN TABLE onemillion_pgsql OPTIONS (SET use_remote_estimate 'false'); ALTER FOREIGN TABLE postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in (SELECT id FROM onemillion WHERE data '0.9' LIMIT 100); QUERY PLAN -- Hash Semi Join (cost=123.10..41083.21 rows=100 width=30) Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data Hash Cond: (onemillion_pgsql.id = onemillion.id) - Foreign Scan on public.onemillion_pgsql (cost=100.00..38434.00 rows=100 width=30) Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data Remote SQL: SELECT id, inserted, data FROM public.onemillion - Hash (cost=21.85..21.85 rows=100 width=4) Output: onemillion.id - Limit (cost=0.00..20.85 rows=100 width=4) Output: onemillion.id - Seq Scan on public.onemillion (cost=0.00..20834.00 rows=99918 width=4) Output: onemillion.id Filter: (onemillion.data '0.9'::text) Planning time: 0.215 ms (14 rows) Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Request for Patch Feedback: Lag Lead Window Functions Can Ignore Nulls
Thanks for the detailed feedback, I'm sorry it took so long to incorporate it. I've attached the latest version of the patch, fixing in particular: We have this block: I've re-written this so it only does a single pass through the window definitions (my patch originally added a second pass), and only does the clone if required. In gram.y there are some spurious whitespaces at end-of-line. Fixed - I didn't know about diff --check, it's very useful! Also, in parsenodes.h, you had the [MANDATORY] and such tags. I've re-written the comments (without tags) to make it much easier to understand . I agree they were ugly! Exactly what case does the in this case phrase refer to? Clarified in the comments A style issue. You have this: Fixed And a final style comment. Fixed Finally, I'm not really sure about the column you added to the regression tests table. Indeed, it was a bit artificial. I've re-written the tests to use a separate table as you suggest. Thanks - Nick diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 0809a6d..5da852e 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -13185,6 +13185,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; lag(replaceable class=parametervalue/replaceable typeany/ [, replaceable class=parameteroffset/replaceable typeinteger/ [, replaceable class=parameterdefault/replaceable typeany/ ]]) + [ { RESPECT | IGNORE } NULLS ] /function /entry entry @@ -13199,7 +13200,9 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; replaceable class=parameterdefault/replaceable are evaluated with respect to the current row. If omitted, replaceable class=parameteroffset/replaceable defaults to 1 and - replaceable class=parameterdefault/replaceable to null + replaceable class=parameterdefault/replaceable to null. If + literalIGNORE NULLS/ is specified then the function will be evaluated + as if the rows containing nulls didn't exist. /entry /row @@ -13212,6 +13215,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; lead(replaceable class=parametervalue/replaceable typeany/ [, replaceable class=parameteroffset/replaceable typeinteger/ [, replaceable class=parameterdefault/replaceable typeany/ ]]) + [ { RESPECT | IGNORE } NULLS ] /function /entry entry @@ -13226,7 +13230,9 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; replaceable class=parameterdefault/replaceable are evaluated with respect to the current row. If omitted, replaceable class=parameteroffset/replaceable defaults to 1 and - replaceable class=parameterdefault/replaceable to null + replaceable class=parameterdefault/replaceable to null. If + literalIGNORE NULLS/ is specified then the function will be evaluated + as if the rows containing nulls didn't exist. /entry /row @@ -13320,11 +13326,10 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab; note para The SQL standard defines a literalRESPECT NULLS/ or -literalIGNORE NULLS/ option for functionlead/, functionlag/, -functionfirst_value/, functionlast_value/, and -functionnth_value/. This is not implemented in -productnamePostgreSQL/productname: the behavior is always the -same as the standard's default, namely literalRESPECT NULLS/. +literalIGNORE NULLS/ option for functionfirst_value/, +functionlast_value/, and functionnth_value/. This is not +implemented in productnamePostgreSQL/productname: the behavior is +always the same as the standard's default, namely literalRESPECT NULLS/. Likewise, the standard's literalFROM FIRST/ or literalFROM LAST/ option for functionnth_value/ is not implemented: only the default literalFROM FIRST/ behavior is supported. (You can achieve diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c index 2fcc630..5cea825 100644 --- a/src/backend/executor/nodeWindowAgg.c +++ b/src/backend/executor/nodeWindowAgg.c @@ -2431,7 +2431,6 @@ window_gettupleslot(WindowObject winobj, int64 pos, TupleTableSlot *slot) * API exposed to window functions ***/ - /* * WinGetPartitionLocalMemory * Get working memory that lives till end of partition processing @@ -2467,6 +2466,17 @@ WinGetCurrentPosition(WindowObject winobj) } /* + * WinGetFrameOptions + * Returns the frame option flags + */ +int +WinGetFrameOptions(WindowObject winobj) +{ + Assert(WindowObjectIsValid(winobj)); + return winobj-winstate-frameOptions; +} + +/* * WinGetPartitionRowCount * Return total number of rows contained in the current partition. * diff --git a/src/backend/parser/gram.y
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 3:22 AM, Peter Geoghegan p...@heroku.com wrote: It's possible that I've misunderstood what you mean here, but do you really think it's likely that everything will be hot, in the event of using something like what I've sketched here? I think it's an important measure against this general problem that buffers really earn the right to be considered hot, so to speak. With my prototype, in order for a buffer to become as hard to evict as possible, at a minimum it must be *continually* pinned for at least 30 seconds. That's actually a pretty tall order. Although, as I said, I wouldn't be surprised if it was worth making it possible for buffers to be even more difficult to evict than that. It should be extremely difficult to evict a root B-Tree page, and to a lesser extent inner pages even under a lot of cache pressure, for example. There are lots of workloads in which that can happen, and I have a hard time believing that it's worth it to evict given the extraordinary difference in their utility as compared to a lot of other things. I can imagine a huge barrier against evicting what is actually a relatively tiny number of pages being worth it. I'm making a general statement about a property that I think a buffer eviction algorithm ought to have. I actually didn't say anything about the algorithm you've chosen one way or the other. Obviously, you've built in some protections against everything becoming hot, and that's a good thing as far as it goes. But you also have a greatly increased risk of everything becoming cold. All you need is a rate of buffer eviction that circles shared_buffers more often than once every 3 seconds, and everything will gradually cool down until you once again can't distinguish which stuff is hot from which stuff isn't. I don't want to dismiss what you're saying about heating and cooling being unrelated, but I don't find the conclusion that not everything can be hot obvious. Maybe heat should be relative rather than absolute, and maybe that's actually what you meant. There is surely some workload where buffer access actually is perfectly uniform, and what do you do there? What temperature are those buffers? Obviously, some value lower than the maximum and higher than the minimum. If they're all at max temperature and then a new buffer (a btree room or vm page, for example) comes along and is much hotter, there's no room on the scale left to express that. If they're all at min temperature and then a new buffer comes along that is just used once and thrown out, there's no room left on the scale for that buffer to emerge as a good candidate for eviction. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 7:44 AM, Robert Haas robertmh...@gmail.com wrote: I think that the basic problem here is that usage counts increase when buffers are referenced, but they decrease when buffers are evicted, and those two things are not in any necessary way connected to each other. In particular, if no eviction is happening, reference counts will converge to the maximum value. I've read a few papers about algorithms that attempt to segregate the list of buffers into hot and cold lists, and an important property of such algorithms is that they mustn't be allowed to make everything hot. It's easy to be too simplistic, here: an algorithm that requires that no more than half the list be hot will fall over badly on a workload where the working set exceeds the available cache and the really hot portion of the working set is 60% of the available cache. So you need a more sophisticated algorithm than that. But that core property that not all buffers can be hot must somehow be preserved, and our algorithm doesn't. FWIW in CLOCK-Pro segregating buffers between hot and cold is tied to eviction and the clock sweep, the ratio between hot and cold is dynamically adapted based on prior experience. The main downside is that it seems to require an indirection somewhere either in the clock sweep or buffer lookup. Maybe it's possible to avoid that with some clever engineering if we think hard enough. CLOCK-Pro may also have too little memory of hotness, making it too easy to blow the whole cache away with a burst of activity. It may be useful to have a (possibly tunable) notion of fairness where one query/backend can't take over the cache even though it may be an overall win in terms of total number of I/Os performed. Maybe we need to invent Generalized CLOCK-Pro with a larger number of levels, ranging from cold, hot and scalding to infernal. :) Regards, Ants Aasma -- Cybertec Schönig Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BGWorkers, shared memory pointers, and postmaster restart
On 04/16/2014 07:21 PM, Andres Freund wrote: On 2014-04-16 19:11:37 +0800, Craig Ringer wrote: On 04/16/2014 02:37 PM, Craig Ringer wrote: Hi all I've been using the dynamic BGWorker support for some recent work, and I think I've found an issue with how postmaster restarts are handled. TL;DR: I don't think there's a safe way to use a BGWorker (static or dynamic) with bgw_restart_time != BGW_NEVER_RESTART and a bgw_main_arg Datum that points into shared memory, and think we might need a API change to fix that. Andres sensibly points out that part of this is easily solved by passing the bgworker an index into an array in a named shmem block. A simple pass-by-value Datum that can be turned into a pointer to a shmem struct. This still doesn't solve the other half of the issue, which is how to handle dynamic bgworkers after a postmaster restart. They're effectively lost/leaked: there's no way to retain a bgworker handle across restart, and no way to list bgworkers, nor is there any idempotent way to say Start a worker to do x only if it doesn't already exist (unique names, magic cookie hashes, whatever). With the current API the only solution to the second half that I see is to have bgworkers run in non-restart mode and manage them yourself. Not ideal. Instead we need one of: - A flag like BGW_UNREGISTER_ON_RESTART; - To always unregister dynamic bgws on postmaster shm clear + restart; - A way to list bgws, inspect their BackgroundWorker structs and obtain their handles; or - A way to idempotently register a bgw only if it doesn't already exist I think we should go for always unregistering dynamic bgws. There's really little justification for keeping them around after a crash cycle. Seems simplest too, and extensions can reasonably expect to have to relaunch things after postmaster restart. We just have to document *where* they should do that, and that only dynamic bgworkers get cleared on postmaster restart, not static ones. I'd like to propose the following text as an addition to the comment in bgworker.h: The bgw_main_arg passed to a bgworker should be pass-by-value Datum (and not a pointer). A pointer to TopMemoryContext postmaster memory is permitted for static bgworkers, but won't work on EXEC_BACKEND platforms. If any nontrivial data must be passed to a bgworker, a named shared memory segment should be allocated to contain it. For multiple workers, the bgw_main_arg may be an index into an array of fixed-width structs in that shm segment. Note that shared memory is re-initialized if the postmaster restarts after a backend crash. Backends should always use a shared memory startup hook to initialize their shared memory so that it is re-initialized on postmaster restart. Don't think there's any good example code on how to do this in-tree, and it's too long for a sane comment. Wonder if I should add an example to contrib/worker_spi/ . There's no race anywhere btw. bgworkers are killed, then shm is cleared, then shm callbacks are rerun, then bgworkers are launched. So long as shm is reinited appropriately by an extension then static bgworkers that refer to that shm should be OK. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Tue, Apr 15, 2014 at 11:27 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Apr 16, 2014 at 5:00 AM, Peter Geoghegan p...@heroku.com wrote: On Tue, Apr 15, 2014 at 3:59 PM, Ants Aasma a...@cybertec.at wrote: There's a paper on a non blocking GCLOCK algorithm, that does lock free clock sweep and buffer pinning[7]. If we decide to stay with GCLOCK it may be interesting, although I still believe that some variant of buffer nailing[8] is a better idea, my experience shows that most of the locking overhead is cache line bouncing ignoring the extreme cases where our naive spinlock implementation blows up. You might be right about that, but lets handle one problem at a time. Who knows what the bottleneck will end up being if and when we address the naivety around frequency? I want to better characterize that problem first. Just to summarize you about the previous discussion and the improvements that we decided to do in this area based on feedback are as follows: 1. Bgwriter needs to be improved so that it can help in reducing usage count and finding next victim buffer (run the clock sweep and add buffers to the free list). 2. SetLatch for bgwriter (wakeup bgwriter) when elements in freelist are less. 3. Split the workdone globallock (Buffreelist) in StrategyGetBuffer (a spinlock for the freelist, and an lwlock for the clock sweep). Here we can try to make it lock free based on atomic ops as well. 4. Bgwriter needs to be more aggressive, logic based on which it calculates how many buffers it needs to process needs to be improved. 5. Contention around buffer mapping locks. 6. Cacheline bouncing around the buffer header spinlocks, is there anything we can do to reduce this? 7. Choose Optimistically used buffer in StrategyGetBuffer(). 8. Don't bump the usage count every time buffer is pinned. What about: 9. Don't wait on locked buffer in the clock sweep. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On 2014-02-03 11:29:22 -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane t...@sss.pgh.pa.us wrote: So exit(0) - done, permanently exit(1) - done until restart interval exit(other) - crash and there's no way to obtain the restart immediately behavior? That's what I was thinking about, yes. Of course, when the restart interval is 0, done until restart interval is the same as restart immediately, so for anyone who wants to *always* restart immediately there is no problem. Where you will run into trouble is if you sometimes want to wait for the restart interval and other times want to restart immediately. But I'm not sure that's a real use case. If it is, I suggest that we assign it some other, more obscure exit code and reserve 0 and 1 for what I believe will probably be the common cases. Agreed, but after further reflection it seems like if you've declared a restart interval, then done until restart interval is probably the common case. So how about exit(0) - done until restart interval, or permanently if there is none exit(1) - done permanently, even if a restart interval was declared exit(other) - crash I don't offhand see a point in an exit and restart immediately case. Why exit at all, if you could just keep running? If you *can't* just keep running, it probably means you know you've bollixed something, so that the crash case is probably what to do anyway. There's the case where you want to quickly go over all the databases, but only use one bgworker for it. I don't think there's another way to do that. I think we really should bite the bullet and change this before 9.4 comes out. The current static bgworker facility has only been out there for one release, and dynamic bgworkers aren't released yet at all. If we wait with this for 9.5, we'll annoy many more people. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 2014-04-16 07:55:44 -0500, Merlin Moncure wrote: 1. Bgwriter needs to be improved so that it can help in reducing usage count and finding next victim buffer (run the clock sweep and add buffers to the free list). 2. SetLatch for bgwriter (wakeup bgwriter) when elements in freelist are less. 3. Split the workdone globallock (Buffreelist) in StrategyGetBuffer (a spinlock for the freelist, and an lwlock for the clock sweep). Here we can try to make it lock free based on atomic ops as well. 4. Bgwriter needs to be more aggressive, logic based on which it calculates how many buffers it needs to process needs to be improved. 5. Contention around buffer mapping locks. 6. Cacheline bouncing around the buffer header spinlocks, is there anything we can do to reduce this? 7. Choose Optimistically used buffer in StrategyGetBuffer(). 8. Don't bump the usage count every time buffer is pinned. What about: 9. Don't wait on locked buffer in the clock sweep. I don't think we do that? Or are you referring to locked buffer headers? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about optimising (Postgres_)FDW
On 04/16/2014 01:35 PM, Etsuro Fujita wrote: (2014/04/16 6:55), Hannu Krosing wrote: ... Maybe I'm missing something, but I think that you can do what I think you'd like to do by the following procedure: No, what I'd like PostgreSQL to do is to 1. select the id+set from local table 2. select the rows from remote table with WHERE ID IN (set selected in step 1) 3. then join the original set to selected set, with any suitable join strategy The things I do not want are A. selecting all rows from remote table (this is what your examples below do) or B. selecting rows from remote table by single selects using ID = $ (this is something that I managed to do by some tweaking of costs) as A will be always slow if there are millions of rows in remote table and B is slow(ish) when the idset is over a few hundred ids I hope this is a bit better explanation than I provided before . Cheers Hannu P.S. I am not sure if this is a limitation of postgres_fdw or postgres itself P.P.S I tested a little with with Multicorn an postgresql did not request row counts for any IN plans, so it may be that the planner does not consider this kind of plan at all. (testing was on PgSQL 9.3.4) Hannu postgres=# ALTER SERVER loop OPTIONS (ADD fdw_startup_cost '1000'); ALTER SERVER postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in (SELECT id FROM onemillion WHERE data '0.9' LIMIT 100); QUERY PLAN --- Hash Semi Join (cost=1023.10..41983.21 rows=100 width=30) Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data Hash Cond: (onemillion_pgsql.id = onemillion.id) - Foreign Scan on public.onemillion_pgsql (cost=1000.00..39334.00 rows=100 width=29) Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data Remote SQL: SELECT id, inserted, data FROM public.onemillion - Hash (cost=21.85..21.85 rows=100 width=4) Output: onemillion.id - Limit (cost=0.00..20.85 rows=100 width=4) Output: onemillion.id - Seq Scan on public.onemillion (cost=0.00..20834.00 rows=99918 width=4) Output: onemillion.id Filter: (onemillion.data '0.9'::text) Planning time: 0.690 ms (14 rows) or, that as Tom mentioned, by disabling the use_remote_estimate function: postgres=# ALTER FOREIGN TABLE onemillion_pgsql OPTIONS (SET use_remote_estimate 'false'); ALTER FOREIGN TABLE postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in (SELECT id FROM onemillion WHERE data '0.9' LIMIT 100); QUERY PLAN -- Hash Semi Join (cost=123.10..41083.21 rows=100 width=30) Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data Hash Cond: (onemillion_pgsql.id = onemillion.id) - Foreign Scan on public.onemillion_pgsql (cost=100.00..38434.00 rows=100 width=30) Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data Remote SQL: SELECT id, inserted, data FROM public.onemillion - Hash (cost=21.85..21.85 rows=100 width=4) Output: onemillion.id - Limit (cost=0.00..20.85 rows=100 width=4) Output: onemillion.id - Seq Scan on public.onemillion (cost=0.00..20834.00 rows=99918 width=4) Output: onemillion.id Filter: (onemillion.data '0.9'::text) Planning time: 0.215 ms (14 rows) Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Tue, Apr 15, 2014 at 11:44 PM, Robert Haas robertmh...@gmail.com wrote: I think that the basic problem here is that usage counts increase when buffers are referenced, but they decrease when buffers are evicted, and those two things are not in any necessary way connected to each other. In particular, if no eviction is happening, reference counts will converge to the maximum value. I've read a few papers about algorithms that attempt to segregate the list of buffers into hot and cold lists, and an important property of such algorithms is that they mustn't be allowed to make everything hot. It's easy to be too simplistic, here: an algorithm that requires that no more than half the list be hot will fall over badly on a workload where the working set exceeds the available cache and the really hot portion of the working set is 60% of the available cache. So you need a more sophisticated algorithm than that. But that core property that not all buffers can be hot must somehow be preserved, and our algorithm doesn't. A while back you sketched out an idea that did something like that: hotly accessed buffers became 'perma-pinned' such that they no longer participated in the clock sweep for eviction and there was a side-line process that did a two stage eviction (IIRC) from the super hot stack in order to mitigate locking. This idea had a couple of nice properties: 1) very hot buffers no longer get refcounted, reducing spinlock contention (which has been documented in real world workloads) 2) eviction loop shrinks. although you still have to check the 'very hot' flag, thats an unlocked check (again, IIRC) and no further processing is done. The downside of this approach was complexity and difficult to test for edge case complexity. I would like to point out though that while i/o efficiency gains are nice, I think contention issues are the bigger fish to fry. On Mon, Apr 14, 2014 at 12:11 PM, Peter Geoghegan p...@heroku.com wrote: 1) Throttles incrementation of usage_count temporally. It becomes impossible to increment usage_count for any given buffer more frequently than every 3 seconds, while decrementing usage_count is totally unaffected. hm, that's expensive. how about a heuristic based on the number of buffer allocations and the size of the buffer pool? On Wed, Apr 16, 2014 at 8:14 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-04-16 07:55:44 -0500, Merlin Moncure wrote: What about: 9. Don't wait on locked buffer in the clock sweep. I don't think we do that? Or are you referring to locked buffer headers? Right -- exactly. I posted patch for this a while back. It's quite trivial: implement a trylock variant of the buffer header lock macro and further guard the check with a non-locking test (which TAS() already does generally, but the idea is to avoid the cache line lock in likely cases of contention). I believe this to be unambiguously better: even if it's self healing or unlikely, there is no good reason to jump into a spinlock fray or even request a contented cache line while holding a critical lock. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Need Multixact Freezing Docs
Josh Berkus wrote: On 04/15/2014 02:25 PM, Josh Berkus wrote: Hackers, We need documentation on how users should intelligently set the multixact freeze settings. I'm happy to write the actual text, but I definitely don't have any idea how to set these myself. Under what circumstances should they be different from freeze_max_age? How? Measure consumption rate of multixacts, compare to consumption rate of xids, and set the freeze ages so that they are reached more-or-less at the same time, so that freezing for any of them would also freeze the other one. You need to set both table_freeze_ages to values that would be reached later than both min_freeze_ages would be reached, if you get what I mean. The idea is that full scan of a table would fix both things at once, saving a followup full scan shortly after the first one. You can see the current multixact value in pg_controldata output. Keep timestamped values of that somewhere (a table?) so that you can measure consumption rate. I don't think we provide SQL-level access to those values. Also: how do I check the multixact age of a table? There doesn't seem to be any data for this ... pg_class.relminmxid is the oldest multixact value that might be present in a table. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG FIX] Compare returned value by socket() against PGINVALID_SOCKET instead of 0
Bruce Momjian wrote: Yes, I saw that yesterday and fixed it. I also did a dry run of backpatching and only 8.4 had conflicts, so I think we are good there. (This is like the readdir() fix all over again.) Once this is applied I will work on changing the libpq socket type to use portable pgsocket, but I am not planning to backpatch that unless we find a bug. Are we done with this patch? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On 16/04/14 15:10, Andres Freund wrote: I think we really should bite the bullet and change this before 9.4 comes out. The current static bgworker facility has only been out there for one release, and dynamic bgworkers aren't released yet at all. If we wait with this for 9.5, we'll annoy many more people. +1 On 2014-02-03 11:29:22 -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Feb 3, 2014 at 10:20 AM, Tom Lane t...@sss.pgh.pa.us wrote: So exit(0) - done, permanently exit(1) - done until restart interval exit(other) - crash and there's no way to obtain the restart immediately behavior? Also I think if we do it this way, the incompatibility impact is rather small for most existing bgworkers, like Robert I haven't seen anybody actually using the exit code 0 currently - I am sure somebody does, but it seems to be very small minority. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 2014-04-16 08:25:23 -0500, Merlin Moncure wrote: The downside of this approach was complexity and difficult to test for edge case complexity. I would like to point out though that while i/o efficiency gains are nice, I think contention issues are the bigger fish to fry. That's my feeling as well. On Wed, Apr 16, 2014 at 8:14 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-04-16 07:55:44 -0500, Merlin Moncure wrote: What about: 9. Don't wait on locked buffer in the clock sweep. I don't think we do that? Or are you referring to locked buffer headers? Right -- exactly. I posted patch for this a while back. It's quite trivial: implement a trylock variant of the buffer header lock macro and further guard the check with a non-locking test (which TAS() already does generally, but the idea is to avoid the cache line lock in likely cases of contention). I believe this to be unambiguously better: even if it's self healing or unlikely, there is no good reason to jump into a spinlock fray or even request a contented cache line while holding a critical lock. IIRC you had problems proving the benefits of that, right? I think that's because the locking times of buffer headers are short enough that it's really unlikely to read a locked buffer header spinlock. The spinlock acquiration will have made the locker the exclusive owner of the spinlock in the majority of cases, and as soon as that happens the cache miss/transfer will take far longer than the lock takes. I think this is the wrong level to optimize things. Imo there's two possible solutions (that don't exclude each other): * perform the clock sweep in one process so there's a very fast way to get to a free buffer. Possibly in a partitioned way. * Don't take a global exclusive lock while performing the clock sweep. Instead increase StrategyControl-nextVictimBuffer in chunks under an exclusive lock, and then scan the potential victim buffers in those chunks without a global lock held. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG FIX] Compare returned value by socket() against PGINVALID_SOCKET instead of 0
On Wed, Apr 16, 2014 at 10:34:55AM -0300, Alvaro Herrera wrote: Bruce Momjian wrote: Yes, I saw that yesterday and fixed it. I also did a dry run of backpatching and only 8.4 had conflicts, so I think we are good there. (This is like the readdir() fix all over again.) Once this is applied I will work on changing the libpq socket type to use portable pgsocket, but I am not planning to backpatch that unless we find a bug. Are we done with this patch? I am about to patch it now. I was going to do it yesterday but was concerned I wouldn't be online long enough to catch any buildfarm failure. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] test failure on latest source
On 13/04/2014 18:09, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-04-12 16:35:48 -0400, Tom Lane wrote: In principle, that commit shouldn't have affected behavior for pg_hba entries with numeric address fields ... Hm. getaddrinfo.c has this bit: /* Unsupported flags. */ if (flags NI_NAMEREQD) return EAI_AGAIN; Yeah, and that flag is only ever specified when attempting to do reverse lookup on a client address to see if it matches a non-numeric pg_hba entry. regards, tom lane just to recap, as I think no one have yet proposed/implemented a solution. first failure I see on cygwin is from - $ git log |head commit fc752505a99a4e2c781a070d3d42a25289c22e3c Author: Tom Lane t...@sss.pgh.pa.us Date: Wed Apr 2 17:11:24 2014 -0400 Fix assorted issues in client host name lookup. [cut] --- previous one is fine -- commit f33a71a7865a1dd54f04b370e2637f88665f8db8 Author: Tom Lane t...@sss.pgh.pa.us Date: Wed Apr 2 14:30:08 2014 -0400 De-anonymize the union in JsonbValue. Needed for strict C89 compliance. error log cat /pub/devel/postgresql/prova/build_orig/src/test/regress/log/postmaster.log LOG: could not resolve localhost: Non-recoverable failure in name resolution LOG: disabling statistics collector for lack of working socket WARNING: autovacuum not started because of misconfiguration HINT: Enable the track_counts option. LOG: invalid IP address 127.0.0.1: Non-recoverable failure in name resolution CONTEXT: line 86 of configuration file /pub/devel/postgresql/prova/build_orig/src/test/regress/./tmp_check/data/pg_hba.conf FATAL: could not load pg_hba.conf --- and of course localhost and 127.0.0.1 are valid $ ping localhost Pinging GE-MATZERI-EU [127.0.0.1] with 32 bytes of data: Reply from 127.0.0.1: bytes=32 time1ms TTL=128 [cut] $ ping 127.0.0.1 Pinging 127.0.0.1 with 32 bytes of data: Reply from 127.0.0.1: bytes=32 time1ms TTL=128 Reply from 127.0.0.1: bytes=32 time1ms TTL=128 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote: I think this is the wrong level to optimize things. Imo there's two possible solutions (that don't exclude each other): * perform the clock sweep in one process so there's a very fast way to get to a free buffer. Possibly in a partitioned way. * Don't take a global exclusive lock while performing the clock sweep. Instead increase StrategyControl-nextVictimBuffer in chunks under an exclusive lock, and then scan the potential victim buffers in those chunks without a global lock held. I definitely agree with both of these ideas. But isn't it sort of off-topic for this thread? There are two issues here: 1. Improving the rate at which we can evict buffers, which is what you're talking about here. 2. Improving the choice of which buffers we evict, which is what Peter's talking about, or at least what I think he's talking about. Those things are both important, but they're different, and I'm not sure that working on one precludes working on the other. There's certainly the potential for overlap, but not necessarily. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On Wed, Apr 16, 2014 at 9:10 AM, Andres Freund and...@2ndquadrant.com wrote: Agreed, but after further reflection it seems like if you've declared a restart interval, then done until restart interval is probably the common case. So how about exit(0) - done until restart interval, or permanently if there is none exit(1) - done permanently, even if a restart interval was declared exit(other) - crash I don't offhand see a point in an exit and restart immediately case. Why exit at all, if you could just keep running? If you *can't* just keep running, it probably means you know you've bollixed something, so that the crash case is probably what to do anyway. There's the case where you want to quickly go over all the databases, but only use one bgworker for it. I don't think there's another way to do that. I think we really should bite the bullet and change this before 9.4 comes out. The current static bgworker facility has only been out there for one release, and dynamic bgworkers aren't released yet at all. If we wait with this for 9.5, we'll annoy many more people. So, exactly what do you want to change? If you want to keep the restart-immediately behavior, then that argues for NOT changing exit(0). I actually think the right answer here might be to give background workers a way to change their configured restart interval. We've already got a shared memory area that the postmaster reads to know how what to do when starting a dynamic background worker, so it probably wouldn't be that hard. But I'm not volunteering to undertake that on April 16th. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Archive recovery won't be completed on some situation.
On Tue, Apr 15, 2014 at 2:52 AM, Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp wrote: Hello, thank you for the discussion. At Tue, 1 Apr 2014 11:41:20 -0400, Robert Haas wrote I don't find that very radical at all. The backup_label file is *supposed* to be removed on the master if it crashes during the backup; and it should never be removed from the backup itself. At least that's how I understand it. Unfortunately, people too often The code indeed seems to assume that, and I couldn't think of any measure to avoid that dead-end once recovery sequence reads backup label accidentially left behind. I thought up to remove backup label during immediate shutdown on prvious versions, like 9.4 does. CancelBackup does only stat-unlink-rename sequence so I think this doesn't obstruct immediate shutdown sequence. And this doesn't change any seeming behavior or interfaces just except for this case. What do you think about this? Isn't this also applicable for older versions? I don't think we should consider changing long-established behavior in the back-branches. The old behavior may not be ideal, but that doesn't mean it's a bug. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 2014-04-16 10:29:29 -0400, Robert Haas wrote: On Wed, Apr 16, 2014 at 9:35 AM, Andres Freund and...@2ndquadrant.com wrote: I think this is the wrong level to optimize things. Imo there's two possible solutions (that don't exclude each other): * perform the clock sweep in one process so there's a very fast way to get to a free buffer. Possibly in a partitioned way. * Don't take a global exclusive lock while performing the clock sweep. Instead increase StrategyControl-nextVictimBuffer in chunks under an exclusive lock, and then scan the potential victim buffers in those chunks without a global lock held. I definitely agree with both of these ideas. But isn't it sort of off-topic for this thread? Yes, I agree it's somewhat offtopic - I only started on it (I think) because Merlin commented on it. But I also agree with Merlin's that comment at the moment that the scalability issues (concurrency and size of shared buffers). If you can't use a large enough s_b to contain a significant portion of your workload, you're relying on the OS cache anyway. 1. Improving the rate at which we can evict buffers, which is what you're talking about here. 2. Improving the choice of which buffers we evict, which is what Peter's talking about, or at least what I think he's talking about. Those things are both important, but they're different, and I'm not sure that working on one precludes working on the other. There's certainly the potential for overlap, but not necessarily. I don't think that that they neccessarily preclude each other either. But my gut feeling tells me that it'll be hard to have interesting algorithmic improvements on the buffer eviction choice because any additional complexity around that will have prohibitively high scalability impacts due to the coarse locking. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG FIX] Compare returned value by socket() against PGINVALID_SOCKET instead of 0
On Fri, Apr 11, 2014 at 08:28:55AM -0400, Bruce Momjian wrote: On Fri, Apr 11, 2014 at 10:03:08AM +0530, Amit Kapila wrote: On Fri, Apr 11, 2014 at 10:00 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Apr 10, 2014 at 5:21 PM, Bruce Momjian br...@momjian.us wrote: On Thu, Apr 10, 2014 at 11:05:49AM +0530, Amit Kapila wrote: Ah, yes, good point. This is going to require backpatching then. I also think so. I think it's better to use check like below, just for matter of consistency with other place if (sock == INVALID_SOCKET) Agreed. That is how I have coded the patch. Sorry, I didn't checked the latest patch before that comment. I verified that your last patch is fine. Regression test also went fine. I have noticed small thing which I forgot to mention in previous mail. I think below added extra line is not required. int PQsocket(const PGconn *conn) { + Yes, I saw that yesterday and fixed it. I also did a dry run of backpatching and only 8.4 had conflicts, so I think we are good there. (This is like the readdir() fix all over again.) Patch applied back through 9.0. 8.4 didn't have the infrastructure for a proper fix. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
Hi, Stephen flagged a ENOPARSE: On 2014-04-16 16:49:55 +0200, Andres Freund wrote: But I also agree with Merlin's that comment at the moment that the scalability issues (concurrency and size of shared buffers). That should have been: But I also agree with Merlin's comment that at the moment the scalability issues are bigger than the cache eviction choices. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The question about the type numeric
On Tue, Apr 15, 2014 at 5:37 AM, sure.postgres sure.postg...@gmail.com wrote: Hi hackers, I am learning about numeric . The comment of NumericShort format is: * In the NumericShort format, the remaining 14 bits of the header word * (n_short.n_header) are allocated as follows: 1 for sign (positive or * negative), 6 for dynamic scale, and 7 for weight. In practice, most * commonly-encountered values can be represented this way. So the Max of the NumericShort format should be up to 508 digits before the decimal point. So the sign of the number 12345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567 should be 0x807F. The number is 257 digits before the decimal point. But the sign is 0. So is there anything wrong? I'm not sure I understand the question, but if it helps, the sign bit will be set (1) for negative values and clear (0) for positive values. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] test failure on latest source
Marco Atzeri wrote: On 13/04/2014 18:09, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-04-12 16:35:48 -0400, Tom Lane wrote: In principle, that commit shouldn't have affected behavior for pg_hba entries with numeric address fields ... Hm. getaddrinfo.c has this bit: /* Unsupported flags. */ if (flags NI_NAMEREQD) return EAI_AGAIN; Yeah, and that flag is only ever specified when attempting to do reverse lookup on a client address to see if it matches a non-numeric pg_hba entry. I don't know if this is relevant, but perhaps we're defining the constants in a way that conflicts with the values defined by cygwin. A very quick search finds a 2007 patch for Mutt[1] that seems to have NI_NAMEREQD defined as 8 somewhere, while 4 is NI_NOFQDN. But we have this in getaddrinfo.h: #ifndef NI_NAMEREQD #define NI_NAMEREQD 4 #endif So maybe we're doing something wrong. Indeed, my system has in /usr/include/netdb.h # define NI_NAMEREQD8 /* Don't return numeric addresses. */ You'd do well to research this more, I think. [1] http://marc.info/?l=mutt-devm=117752314512877w=2 -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On 2014-04-16 10:37:12 -0400, Robert Haas wrote: On Wed, Apr 16, 2014 at 9:10 AM, Andres Freund and...@2ndquadrant.com wrote: Agreed, but after further reflection it seems like if you've declared a restart interval, then done until restart interval is probably the common case. So how about exit(0) - done until restart interval, or permanently if there is none exit(1) - done permanently, even if a restart interval was declared exit(other) - crash I don't offhand see a point in an exit and restart immediately case. Why exit at all, if you could just keep running? If you *can't* just keep running, it probably means you know you've bollixed something, so that the crash case is probably what to do anyway. There's the case where you want to quickly go over all the databases, but only use one bgworker for it. I don't think there's another way to do that. I think we really should bite the bullet and change this before 9.4 comes out. The current static bgworker facility has only been out there for one release, and dynamic bgworkers aren't released yet at all. If we wait with this for 9.5, we'll annoy many more people. So, exactly what do you want to change? If you want to keep the restart-immediately behavior, then that argues for NOT changing exit(0). I don't neccessarily want to keep that behaviour. It can be emulated easily enough by setting the restart interval to 0. I just wanted to explain that it's not completely unreasonable to want such a short restart interval. I actually think the right answer here might be to give background workers a way to change their configured restart interval. We've already got a shared memory area that the postmaster reads to know how what to do when starting a dynamic background worker, so it probably wouldn't be that hard. But I'm not volunteering to undertake that on April 16th. That seems like a separate feature - and I don't see a need to rush that into 9.4. All I want that if we decide to change the API, we should do it now. What I dislike about the status quo is that the 1) The regular exit(0) behaves all but regular. 2) The only way to regularly exit is logged as a failure. Thus there's no real way to flag a failure that's actually a failure. So I think your proposal above already is an improvement upon the status quo. Maybe we also should change the logging of bgworker exits. I think we probably also need a way to exit that's treated as an error, but doesn't lead to a PANIC restart. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 10:49 AM, Andres Freund and...@2ndquadrant.com wrote: 1. Improving the rate at which we can evict buffers, which is what you're talking about here. 2. Improving the choice of which buffers we evict, which is what Peter's talking about, or at least what I think he's talking about. Those things are both important, but they're different, and I'm not sure that working on one precludes working on the other. There's certainly the potential for overlap, but not necessarily. I don't think that that they neccessarily preclude each other either. But my gut feeling tells me that it'll be hard to have interesting algorithmic improvements on the buffer eviction choice because any additional complexity around that will have prohibitively high scalability impacts due to the coarse locking. Doesn't that amount to giving up? I mean, I'm not optimistic about the particular approach Peter's chosen here being practical for the reasons that you and I already articulated. But I don't think that means there *isn't* a viable approach; and I think Peter's test results demonstrate that the additional complexity of a better algorithm can more than pay for itself. That's a pretty important point to keep in mind. Also, I think the scalability problems around buffer eviction are eminently solvable, and in particular I'm hopeful that Amit is going to succeed in solving them. Suppose we have a background process (whether the background writer or some other) that runs the clock sweep, identifies good candidates for eviction, and pushes them on a set of, say, 16 free-lists protected by spinlocks. (The optimal number of free-lists probably depends on the size of shared_buffers.) Backends try to reclaim by popping buffers off of one of these free-lists and double-checking whether the page is still a good candidate for eviction (i.e. it's still clean and unpinned). If the free list is running low, they kick the background process via a latch to make sure it's awake and working to free up more stuff, and if necessary, advance the clock sweep themselves. This can even be done by multiple processes at once, if we adopt your idea of advancing the clock sweep hand by N buffers at a time and then scanning them afterwards without any global lock. In such a world, it's still not permissible for reclaim calculations to be super-complex, but you hope that most of the activity is happening in the background process, so cycle-shaving becomes less critical. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUG FIX] Compare returned value by socket() against PGINVALID_SOCKET instead of 0
On Fri, Apr 11, 2014 at 08:28:55AM -0400, Bruce Momjian wrote: Once this is applied I will work on changing the libpq socket type to use portable pgsocket, but I am not planning to backpatch that unless we find a bug. Attached is a follow up patch which stores socket values in libpq as pgsocket, rather than int, and maps it to -1 only for the PQsocket() external return value. In the interest of time, I will apply this later today, and only to head as it does not fix a bug. This is the last open item I was working on. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c new file mode 100644 index 51d4de4..90b944a *** a/src/interfaces/libpq/fe-connect.c --- b/src/interfaces/libpq/fe-connect.c *** pqDropConnection(PGconn *conn) *** 398,406 /* Drop any SSL state */ pqsecure_close(conn); /* Close the socket itself */ ! if (conn-sock = 0) closesocket(conn-sock); ! conn-sock = -1; /* Discard any unread/unsent data */ conn-inStart = conn-inCursor = conn-inEnd = 0; conn-outCount = 0; --- 398,406 /* Drop any SSL state */ pqsecure_close(conn); /* Close the socket itself */ ! if (conn-sock != PGINVALID_SOCKET) closesocket(conn-sock); ! conn-sock = PGINVALID_SOCKET; /* Discard any unread/unsent data */ conn-inStart = conn-inCursor = conn-inEnd = 0; conn-outCount = 0; *** keep_going: /* We will come back to *** 1631,1654 addr_cur-ai_addrlen); conn-raddr.salen = addr_cur-ai_addrlen; ! /* Open a socket */ ! { ! /* ! * While we use 'pgsocket' as the socket type in the ! * backend, we use 'int' for libpq socket values. ! * This requires us to map PGINVALID_SOCKET to -1 ! * on Windows. ! * See http://msdn.microsoft.com/en-us/library/windows/desktop/ms740516%28v=vs.85%29.aspx ! */ ! pgsocket sock = socket(addr_cur-ai_family, SOCK_STREAM, 0); ! #ifdef WIN32 ! if (sock == PGINVALID_SOCKET) ! conn-sock = -1; ! else ! #endif ! conn-sock = sock; ! } ! if (conn-sock == -1) { /* * ignore socket() failure if we have more addresses --- 1631,1638 addr_cur-ai_addrlen); conn-raddr.salen = addr_cur-ai_addrlen; ! conn-sock = socket(addr_cur-ai_family, SOCK_STREAM, 0); ! if (conn-sock == PGINVALID_SOCKET) { /* * ignore socket() failure if we have more addresses *** makeEmptyPGconn(void) *** 2717,2723 conn-client_encoding = PG_SQL_ASCII; conn-std_strings = false; /* unless server says differently */ conn-verbosity = PQERRORS_DEFAULT; ! conn-sock = -1; conn-auth_req_received = false; conn-password_needed = false; conn-dot_pgpass_used = false; --- 2701,2707 conn-client_encoding = PG_SQL_ASCII; conn-std_strings = false; /* unless server says differently */ conn-verbosity = PQERRORS_DEFAULT; ! conn-sock = PGINVALID_SOCKET; conn-auth_req_received = false; conn-password_needed = false; conn-dot_pgpass_used = false; *** closePGconn(PGconn *conn) *** 2882,2888 * Note that the protocol doesn't allow us to send Terminate messages * during the startup phase. */ ! if (conn-sock = 0 conn-status == CONNECTION_OK) { /* * Try to send close connection message to backend. Ignore any --- 2866,2872 * Note that the protocol doesn't allow us to send Terminate messages * during the startup phase. */ ! if (conn-sock != PGINVALID_SOCKET conn-status == CONNECTION_OK) { /* * Try to send close connection message to backend. Ignore any *** PQgetCancel(PGconn *conn) *** 3103,3109 if (!conn) return NULL; ! if (conn-sock 0) return NULL; cancel = malloc(sizeof(PGcancel)); --- 3087,3093 if (!conn) return NULL; ! if (conn-sock == PGINVALID_SOCKET) return NULL; cancel = malloc(sizeof(PGcancel)); *** PQrequestCancel(PGconn *conn) *** 3284,3290 if (!conn) return FALSE; ! if (conn-sock 0) { strlcpy(conn-errorMessage.data, PQrequestCancel() -- connection is not open\n, --- 3268,3274 if (!conn) return FALSE; ! if (conn-sock == PGINVALID_SOCKET) { strlcpy(conn-errorMessage.data, PQrequestCancel() -- connection is not open\n, *** PQsocket(const PGconn *conn) *** 5329,5335 { if (!conn) return -1; ! return conn-sock; } int --- 5313,5319 { if (!conn) return -1; ! return (conn-sock != PGINVALID_SOCKET) ? conn-sock : -1; } int diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c new file mode 100644 index 8ccf6d3..50e4035 ***
Re: [HACKERS] test failure on latest source
On 16/04/2014 17:14, Alvaro Herrera wrote: Marco Atzeri wrote: On 13/04/2014 18:09, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-04-12 16:35:48 -0400, Tom Lane wrote: In principle, that commit shouldn't have affected behavior for pg_hba entries with numeric address fields ... Hm. getaddrinfo.c has this bit: /* Unsupported flags. */ if (flags NI_NAMEREQD) return EAI_AGAIN; Yeah, and that flag is only ever specified when attempting to do reverse lookup on a client address to see if it matches a non-numeric pg_hba entry. I don't know if this is relevant, but perhaps we're defining the constants in a way that conflicts with the values defined by cygwin. A very quick search finds a 2007 patch for Mutt[1] that seems to have NI_NAMEREQD defined as 8 somewhere, while 4 is NI_NOFQDN. But we have this in getaddrinfo.h: #ifndef NI_NAMEREQD #define NI_NAMEREQD 4 #endif So maybe we're doing something wrong. Indeed, my system has in /usr/include/netdb.h # define NI_NAMEREQD8 /* Don't return numeric addresses. */ You'd do well to research this more, I think. [1] http://marc.info/?l=mutt-devm=117752314512877w=2 on cygwin both 32 and 64 bit I see netdb.h:#define NI_NAMEREQD 0x4 /* Not being able to resolve is an error. */ same on w32api/ws2tcpip.h:#define NI_NAMEREQD 0x04 curiosly I see also on roken-common.h:#define NI_NAMEREQD 0x02 $ cygcheck -f /usr/include/roken-common.h libkrb5-devel-1.5.3-1 not sure if it has any relevance at all in this case. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On 2014-04-16 11:28:04 -0400, Robert Haas wrote: On Wed, Apr 16, 2014 at 10:49 AM, Andres Freund and...@2ndquadrant.com wrote: 1. Improving the rate at which we can evict buffers, which is what you're talking about here. 2. Improving the choice of which buffers we evict, which is what Peter's talking about, or at least what I think he's talking about. Those things are both important, but they're different, and I'm not sure that working on one precludes working on the other. There's certainly the potential for overlap, but not necessarily. I don't think that that they neccessarily preclude each other either. But my gut feeling tells me that it'll be hard to have interesting algorithmic improvements on the buffer eviction choice because any additional complexity around that will have prohibitively high scalability impacts due to the coarse locking. Doesn't that amount to giving up? I mean, I'm not optimistic about the particular approach Peter's chosen here being practical for the reasons that you and I already articulated. But I don't think that means there *isn't* a viable approach; and I think Peter's test results demonstrate that the additional complexity of a better algorithm can more than pay for itself. That's a pretty important point to keep in mind. Well, I think it could be a very good idea to invest more resources (cpu, bus, memory) in buffer management - but doing so right *now* where it's all done under one monolithic lock will have noticeable consequences for many workloads. Spending more cycles per buffer won't be very noticeable if it's not done under a gigantic lock - right now it will be. [ reasonable proposal ]. In such a world, it's still not permissible for reclaim calculations to be super-complex, but you hope that most of the activity is happening in the background process, so cycle-shaving becomes less critical. Yes, agreed. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On Wed, Apr 16, 2014 at 11:28 AM, Andres Freund and...@2ndquadrant.com wrote: I actually think the right answer here might be to give background workers a way to change their configured restart interval. We've already got a shared memory area that the postmaster reads to know how what to do when starting a dynamic background worker, so it probably wouldn't be that hard. But I'm not volunteering to undertake that on April 16th. That seems like a separate feature - and I don't see a need to rush that into 9.4. All I want that if we decide to change the API, we should do it now. Well, I'm pretty OK with that, since I proposed it before and still think it's important, but I'd rather leave it alone for another release than rush into something we'll just end up changing again. What I dislike about the status quo is that the 1) The regular exit(0) behaves all but regular. 2) The only way to regularly exit is logged as a failure. Thus there's no real way to flag a failure that's actually a failure. So I think your proposal above already is an improvement upon the status quo. I don't have time to write a patch for that, but I'm OK with committing it (or having someone else commit it) even at this late date, unless someone objects. Maybe we also should change the logging of bgworker exits. It's pretty clear that the logging around bgworkers is way too verbose for anything other than a long-running daemon, but I don't think we should try to fix that problem right now. It needs more careful thought than we have time to give it at this juncture. One idea might be to let the registrant of the background worker specify a logging level, or maybe just a flag bit to indicate verbose (LOG) or quiet (say, DEBUG1 or DEBUG2) logging. I think we probably also need a way to exit that's treated as an error, but doesn't lead to a PANIC restart. Why can't that be handled through ereport(ERROR/FATAL) rather than through the choice of exit status? It seems to me that the only point of the exit status is or should be to provide feedback to the postmaster on how it should respond to the background worker's untimely demise. If any other information needs to be conveyed, the worker should log that itself rather than trying to tell the postmaster what to log. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] test failure on latest source
Alvaro Herrera alvhe...@2ndquadrant.com writes: I don't know if this is relevant, but perhaps we're defining the constants in a way that conflicts with the values defined by cygwin. Hm, that's a thought, though I still don't see how it's relevant to the reported failure. Perhaps Cygwin is defining these constants somewhere other than netdb.h? Because getaddrinfo.h definitely pulls that one in first. The bigger picture though is that this code isn't failing on the buildfarm. So what we need to ask is what's different about Marco's machine. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On 2014-04-16 11:37:47 -0400, Robert Haas wrote: I think we probably also need a way to exit that's treated as an error, but doesn't lead to a PANIC restart. Why can't that be handled through ereport(ERROR/FATAL) rather than through the choice of exit status? It seems to me that the only point of the exit status is or should be to provide feedback to the postmaster on how it should respond to the background worker's untimely demise. If any other information needs to be conveyed, the worker should log that itself rather than trying to tell the postmaster what to log. I dislike that because it essentially requires the bgworker to have a full error catching environment like PostgresMain() has. That seems bad for many cases. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
Andres Freund and...@2ndquadrant.com writes: On 2014-04-16 11:37:47 -0400, Robert Haas wrote: Why can't that be handled through ereport(ERROR/FATAL) rather than through the choice of exit status? I dislike that because it essentially requires the bgworker to have a full error catching environment like PostgresMain() has. That seems bad for many cases. That sounds like utter nonsense. No sane bgwriter code is going to be able to discount the possibility of something throwing an elog(ERROR). Now, you might not need the ability to do a transaction abort, but you'll have to at least be able to terminate the process cleanly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On 2014-04-16 11:54:25 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-04-16 11:37:47 -0400, Robert Haas wrote: Why can't that be handled through ereport(ERROR/FATAL) rather than through the choice of exit status? I dislike that because it essentially requires the bgworker to have a full error catching environment like PostgresMain() has. That seems bad for many cases. That sounds like utter nonsense. No sane bgwriter code is going to be able to discount the possibility of something throwing an elog(ERROR). Now, you might not need the ability to do a transaction abort, but you'll have to at least be able to terminate the process cleanly. Why? Throwing an error without an exception stack seems to work sensibly? The error is promoted to FATAL and the normal FATAL handling is performed? C.f. pg_re_throw() called via errfinish() in the ERRROR case. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C
On Tue, Apr 15, 2014 at 2:23 PM, Christian Ullrich ch...@chrullrich.net wrote: * From: Robert Haas On Mon, Apr 14, 2014 at 2:16 AM, Christian Ullrich ch...@chrullrich.net wrote: I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was set, the postmaster etc. would ignore the events. Why not just pass a command-line switch? Because, as I wrote in the message you are quoting, I did not think that having a command-line option for the sole purpose of telling the postmaster who its parent is was a suitable solution. True, but you didn't say why, which is what I asked. You just said you didn't think it was a good idea, without elaborating. While I have you here, though, any suggestions on what the name of that option should be? I think --background is about right. Also, how should I treat the option on non-Windows platforms? Should it just not be there (= error), or be ignored if present? Well, we had a recent discussion that's related to this, about a not-entirely-dissimilar problem on Solaris: http://www.postgresql.org/message-id/22636.1392419...@sss.pgh.pa.us The proposal there was --daemonize. That seems somehow inapposite for Windows, though. I don't really have a strong opinion at this point on what the best naming is; I don't love --background, but I haven't got a better idea, either. On the topic of how the option should be handled on non-Windows platforms, I guess I'd vote for accepting it and ignoring it. The Solaris issue can (and, IMHO, should) be fixed by teaching pg_ctl to use fork()+exec()+setsid() rather than system()+hope the shell behaves the way we'd like, so we don't currently have an apparent need for any special handling for this case on any platform other than Windows. But that could change in the future, so I think it'd be smart to choose a somewhat generic option name and just define it as a no-op on non-Windows platforms for now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On Wed, Apr 16, 2014 at 12:00 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-04-16 11:54:25 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-04-16 11:37:47 -0400, Robert Haas wrote: Why can't that be handled through ereport(ERROR/FATAL) rather than through the choice of exit status? I dislike that because it essentially requires the bgworker to have a full error catching environment like PostgresMain() has. That seems bad for many cases. That sounds like utter nonsense. No sane bgwriter code is going to be able to discount the possibility of something throwing an elog(ERROR). Now, you might not need the ability to do a transaction abort, but you'll have to at least be able to terminate the process cleanly. Why? Throwing an error without an exception stack seems to work sensibly? The error is promoted to FATAL and the normal FATAL handling is performed? C.f. pg_re_throw() called via errfinish() in the ERRROR case. And... so what's the problem? You seemed to be saying that the background worker would need to a more developed error-handling environment in order to do proper logging, but here you're saying (rightly, I believe) that it doesn't. Even if it did, though, I think the right solution is to install one, not make it the postmaster's job to try to read the tea leaves in the worker's exit code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Shared Memory stuff
On Tue, Apr 15, 2014 at 10:46 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Wed, Apr 16, 2014 at 3:01 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila amit.kapil...@gmail.com wrote: On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas robertmh...@gmail.com wrote: For the create case, I'm wondering if we should put the block that tests for !hmap *before* the _dosmaperr() and check for EEXIST. What is your opinion? Either way is okay, but I think the way you are suggesting is better as it will make code consistent with other place (PGSharedMemoryCreate()). OK, can you prepare a patch? Please find attached patch to address this issue. One minor point to note is that now we have to call GetLastError() twice, once inside error path and once to check EEXIST, but I think that is okay as existing code in PGSharedMemoryCreate() does it that way. OK. I committed this blindly, but I don't have a Windows dev environment, so please keep an eye on the Windows buildfarm members and provide follow-on patches if any of them get unhappy about this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about optimising (Postgres_)FDW
On 04/16/2014 03:16 PM, Hannu Krosing wrote: On 04/16/2014 01:35 PM, Etsuro Fujita wrote: (2014/04/16 6:55), Hannu Krosing wrote: ... Maybe I'm missing something, but I think that you can do what I think you'd like to do by the following procedure: No, what I'd like PostgreSQL to do is to 1. select the id+set from local table 2. select the rows from remote table with WHERE ID IN (set selected in step 1) 3. then join the original set to selected set, with any suitable join strategy The things I do not want are A. selecting all rows from remote table (this is what your examples below do) or B. selecting rows from remote table by single selects using ID = $ (this is something that I managed to do by some tweaking of costs) as A will be always slow if there are millions of rows in remote table and B is slow(ish) when the idset is over a few hundred ids I hope this is a bit better explanation than I provided before . Cheers Hannu P.S. I am not sure if this is a limitation of postgres_fdw or postgres itself P.P.S I tested a little with with Multicorn an postgresql did not request row counts for any IN plans, so it may be that the planner does not consider this kind of plan at all. (testing was on PgSQL 9.3.4) Hannu Also a sample run of the two plans to illustrate my point How it is run now: testdb=# explain analyse verbose select r.data, l.data from onemillion_pgfdw r join onemillion l on r.id = l.id and l.id between 10 and 100100; QUERY PLAN -- Hash Join (cost=111.61..198.40 rows=1 width=16) (actual time=7534.360..8731.541 rows=101 loops=1) Output: r.data, l.data Hash Cond: (r.id = l.id) - Foreign Scan on public.onemillion_pgfdw r (cost=100.00..178.25 rows=2275 width=12) (actual time=1.628..8364.688 rows=100 loops=1) Output: r.id, r.inserted, r.data Remote SQL: SELECT id, data FROM public.onemillion - Hash (cost=10.39..10.39 rows=98 width=12) (actual time=0.179..0.179 rows=101 loops=1) Output: l.data, l.id Buckets: 1024 Batches: 1 Memory Usage: 5kB - Index Scan using onemillion_pkey on public.onemillion l (cost=0.42..10.39 rows=98 width=12) (actual time=0.049..0.124 rows=101 loops=1) Output: l.data, l.id Index Cond: ((l.id = 10) AND (l.id = 100100)) Total runtime: 8732.213 ms (13 rows) Time: 8733.799 ms And how the above query should be planned/executed: testdb=# explain analyse verbose select r.data, l.data from (select * from onemillion_pgfdw where id = any (array(select id from onemillion where id between 10 and 100100))) r join onemillion l on r.id = l.id; QUERY PLAN Nested Loop (cost=110.81..1104.30 rows=111 width=16) (actual time=2.756..3.738 rows=101 loops=1) Output: onemillion_pgfdw.data, l.data InitPlan 1 (returns $0) - Index Only Scan using onemillion_pkey on public.onemillion (cost=0.42..10.39 rows=98 width=4) (actual time=0.055..0.118 rows=101 loops=1) Output: onemillion.id Index Cond: ((onemillion.id = 10) AND (onemillion.id = 100100)) Heap Fetches: 101 - Foreign Scan on public.onemillion_pgfdw (cost=100.00..163.41 rows=111 width=12) (actual time=2.729..3.012 rows=101 loops=1) Output: onemillion_pgfdw.id, onemillion_pgfdw.inserted, onemillion_pgfdw.data Remote SQL: SELECT id, data FROM public.onemillion WHERE ((id = ANY ($1::integer[]))) - Index Scan using onemillion_pkey on public.onemillion l (cost=0.42..8.37 rows=1 width=12) (actual time=0.005..0.006 rows=1 loops=101) Output: l.id, l.inserted, l.data Index Cond: (l.id = onemillion_pgfdw.id) Total runtime: 4.469 ms (14 rows) Time: 6.437 ms postgres=# ALTER SERVER loop OPTIONS (ADD fdw_startup_cost '1000'); ALTER SERVER postgres=# EXPLAIN VERBOSE SELECT * FROM onemillion_pgsql WHERE id in (SELECT id FROM onemillion WHERE data '0.9' LIMIT 100); QUERY PLAN --- Hash Semi Join (cost=1023.10..41983.21 rows=100 width=30) Output: onemillion_pgsql.id, onemillion_pgsql.inserted, onemillion_pgsql.data Hash Cond: (onemillion_pgsql.id = onemillion.id) - Foreign Scan on public.onemillion_pgsql (cost=1000.00..39334.00 rows=100 width=29)
Re: [HACKERS] bgworker crashed or not?
On 2014-04-16 12:04:26 -0400, Robert Haas wrote: And... so what's the problem? You seemed to be saying that the background worker would need to a more developed error-handling environment in order to do proper logging, but here you're saying (rightly, I believe) that it doesn't. Even if it did, though, I think the right solution is to install one, not make it the postmaster's job to try to read the tea leaves in the worker's exit code. Well, currently it will log the message that has been thrown, that might lack context. LogChildExit() already has code to print activity of the bgworker after it crashed. Note that a FATAL error will always use a exit(1) - so we better not make that a special case in the code :/. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BGWorkers, shared memory pointers, and postmaster restart
On Wed, Apr 16, 2014 at 7:11 AM, Craig Ringer cr...@2ndquadrant.com wrote: TL;DR: I don't think there's a safe way to use a BGWorker (static or dynamic) with bgw_restart_time != BGW_NEVER_RESTART and a bgw_main_arg Datum that points into shared memory, and think we might need a API change to fix that. Andres sensibly points out that part of this is easily solved by passing the bgworker an index into an array in a named shmem block. A simple pass-by-value Datum that can be turned into a pointer to a shmem struct. Yes. I think the answer to your original complaint is that we don't currently support that, and adding support for that is material for a future release. This still doesn't solve the other half of the issue, which is how to handle dynamic bgworkers after a postmaster restart. They're effectively lost/leaked: there's no way to retain a bgworker handle across restart, and no way to list bgworkers, nor is there any idempotent way to say Start a worker to do x only if it doesn't already exist (unique names, magic cookie hashes, whatever). Fair point. With the current API the only solution to the second half that I see is to have bgworkers run in non-restart mode and manage them yourself. Not ideal. Instead we need one of: - A flag like BGW_UNREGISTER_ON_RESTART; I would be OK with this, maybe modulo the name. - To always unregister dynamic bgws on postmaster shm clear + restart; I don't particularly care for this. Let's suppose the background worker is a long-running daemon, like a PG equivalent of cron. In static-background worker land, the admin has to restart the cluster to get this going. In dynamic-background worker land, he can load it on the fly. But once he gets it up and running, he wants it to stay up and running, surviving crashes and everything. That's a big part of the value of having a background worker interface in the first place. - A way to list bgws, inspect their BackgroundWorker structs and obtain their handles; or This is certainly a good idea. - A way to idempotently register a bgw only if it doesn't already exist This is probably a good idea, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On Wed, Apr 16, 2014 at 12:11 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-04-16 12:04:26 -0400, Robert Haas wrote: And... so what's the problem? You seemed to be saying that the background worker would need to a more developed error-handling environment in order to do proper logging, but here you're saying (rightly, I believe) that it doesn't. Even if it did, though, I think the right solution is to install one, not make it the postmaster's job to try to read the tea leaves in the worker's exit code. Well, currently it will log the message that has been thrown, that might lack context. LogChildExit() already has code to print activity of the bgworker after it crashed. I'm still not seeing the problem. It's the background worker's job to make sure that the right stuff gets logged, just as it would be for any other backend. Trying to bolt some portion of the responsibility for that onto the postmaster is 100% wrong. Note that a FATAL error will always use a exit(1) - so we better not make that a special case in the code :/. Well, everything's a special case, in that every error code has (and should have) its own meaning. But the handling we give to exit code 1 is not obviously incompatible with what you probably want to have happen after a FATAL. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: variant of regclass
On Wed, Apr 16, 2014 at 3:27 AM, Tatsuo Ishii is...@postgresql.org wrote: Well, I noticed that, too, but I didn't think it was my job to tell the patch author what functions he should have wanted. A follow-on patch to add to_regprocedure and to_regoperator wouldn't be much work, if you want that. And here is a patch for that. Looks good to me except duplicate oids. Included is a patch to fix that. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Dynamic Background Workers and clean exit
On Wed, Apr 16, 2014 at 5:27 AM, Petr Jelinek p...@2ndquadrant.com wrote: Hello, I've been recently doing some work with dynamic bgworkers and noticed that I have no way of saying I am done now and want to exit cleanly because bgworkers get restarted automatically on exit code 0 no matter what is the restart interval set to. I understand the rationale for this behavior when using static bgworkers which are meant to run forever, but dynamic ones are spawned dynamically by code so they should also be able to terminate cleanly as they have presumably finite work to do. Also I think the clean shutdown of dynamic bgworker should not be logged (or at least not with the same log level as it is now) since it's business as usual. Since the dyanamic bgworkers are new in 9.4 the behavior can still be changed. Thoughts? See the other thread where this is being discussed... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] AF_UNSPEC vs PF_UNSPEC
While wondering what the heck is going on in http://www.postgresql.org/message-id/534e8fbb.9060...@gmail.com I chanced to notice that pgstat.c and a couple of other places set up arguments for getaddrinfo() like this: hints.ai_family = PF_UNSPEC; whereas the Single Unix Spec says clearly that AF_UNSPEC is what to write if you're not intending to constrain the address family. AF_UNSPEC is what we use in the majority of places, too. On Linux, at least, these symbols have the same value so it doesn't matter; but I wonder whether they are different on recent Cygwin. Anyway, I think this is clearly wrong and we should change it. I see a PF_INET that presumably ought to be AF_INET in pg_dump/parallel.c, too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On 2014-04-16 12:20:01 -0400, Robert Haas wrote: On Wed, Apr 16, 2014 at 12:11 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-04-16 12:04:26 -0400, Robert Haas wrote: And... so what's the problem? You seemed to be saying that the background worker would need to a more developed error-handling environment in order to do proper logging, but here you're saying (rightly, I believe) that it doesn't. Even if it did, though, I think the right solution is to install one, not make it the postmaster's job to try to read the tea leaves in the worker's exit code. Well, currently it will log the message that has been thrown, that might lack context. LogChildExit() already has code to print activity of the bgworker after it crashed. I'm still not seeing the problem. It's the background worker's job to make sure that the right stuff gets logged, just as it would be for any other backend. Trying to bolt some portion of the responsibility for that onto the postmaster is 100% wrong. Well, it already has taken on that responsibility, it's not my idea to add it. I merely want to control more precisely what happens. Note that a FATAL error will always use a exit(1) - so we better not make that a special case in the code :/. Well, everything's a special case, in that every error code has (and should have) its own meaning. But the handling we give to exit code 1 is not obviously incompatible with what you probably want to have happen after a FATAL. That was essentially directed at Tom's proposal in http://www.postgresql.org/message-id/32224.1391444...@sss.pgh.pa.us - I don't think that'd be compatible with FATAL errors. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster
On Mon, Feb 17, 2014 at 8:26 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: It certainly might be --- I have no idea. What surprised me is that we are relying solely on system() to block signals to pg_ctl-spawned servers. The question is whether that is sufficient and whether we should be doing more. I don't think we have to make adjustments just for Solaris 9. We aren't relying on system(); it does no such thing, according to the POSIX spec. If it did, pg_ctl would be unable to print any errors to the terminal, because dissociating from the foreground process group generally also disables your ability to print on the terminal. I poked around in the POSIX spec a bit, and if I'm reading it correctly, the only thing that typically results in the postmaster becoming dissociated from the terminal is use of to launch it. In a shell with job control, that should result in the process being put into a background process group that won't receive terminal signals nor be permitted to do I/O to it. This is distinct from dissociating altogether because you can use fg to return the process to foreground; if we did a setsid() we'd lose that option, if I'm reading the standards correctly. So I'm loath to see the postmaster doing setsid() for itself. POSIX also mandates that interactive shells have job control enabled by default. However ... the isn't issued in the user's interactive shell. It's seen by the shell launched by pg_ctl's system() call. So it appears to be standards-conforming if that shell does nothing to move the launched postmaster into the background. The POSIX spec describes a shell switch -m that forces subprocesses to be launched in their own process groups. So maybe what we ought to do is teach pg_ctl to do something like system(set -m; postgres ...); Dunno if this is really portable, though it ought to be. Alternatively, we could do what the comments in pg_ctl have long thought desirable, namely get rid of use of system() in favor of fork()/exec(). With that, pg_ctl could do a setsid() inside the child process. I like this last approach. It seems to me that the problem with system() is that it's relying on the user's shell to have sane behavior. And while that obviously will work fine in a whole lot of cases, I don't see why we should rely on it. I don't think your shell must support -m with POSIX-like semantics is really a requirement we want to impose on PostgreSQL users. The shell can't make any system calls that we don't have access to ourselves, and setsid() seems like the right one to use. Maybe it's begging for trouble to change anything here at all, but I think if we are going to make a change it ought to be in the direction of making us less dependent on the vagaries of the user's shell, not more. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On Wed, Apr 16, 2014 at 12:28 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-04-16 12:20:01 -0400, Robert Haas wrote: On Wed, Apr 16, 2014 at 12:11 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-04-16 12:04:26 -0400, Robert Haas wrote: And... so what's the problem? You seemed to be saying that the background worker would need to a more developed error-handling environment in order to do proper logging, but here you're saying (rightly, I believe) that it doesn't. Even if it did, though, I think the right solution is to install one, not make it the postmaster's job to try to read the tea leaves in the worker's exit code. Well, currently it will log the message that has been thrown, that might lack context. LogChildExit() already has code to print activity of the bgworker after it crashed. I'm still not seeing the problem. It's the background worker's job to make sure that the right stuff gets logged, just as it would be for any other backend. Trying to bolt some portion of the responsibility for that onto the postmaster is 100% wrong. Well, it already has taken on that responsibility, it's not my idea to add it. I merely want to control more precisely what happens.s I think that's doubling down on an already-questionable design principle. Or if I may be permitted a more colloquial idiom: Luke, it's a trap. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster
Robert Haas robertmh...@gmail.com writes: On Mon, Feb 17, 2014 at 8:26 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alternatively, we could do what the comments in pg_ctl have long thought desirable, namely get rid of use of system() in favor of fork()/exec(). With that, pg_ctl could do a setsid() inside the child process. I like this last approach. Me too, but it looked like a less-than-trivial bit of work to me (else I might just have gone and done it). Are you volunteering? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Ctrl+C from sh can shut down daemonized PostgreSQL cluster
On Wed, Apr 16, 2014 at 12:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Feb 17, 2014 at 8:26 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alternatively, we could do what the comments in pg_ctl have long thought desirable, namely get rid of use of system() in favor of fork()/exec(). With that, pg_ctl could do a setsid() inside the child process. I like this last approach. Me too, but it looked like a less-than-trivial bit of work to me (else I might just have gone and done it). Are you volunteering? I don't have time right at the moment, but maybe at some point, if nobody else gets to it first. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Tracking replication slot blockings
I'm thinking it could be interesting to know how many times (or in some other useful unit than times - how often) a specific replication slot has blocked xlog rotation. Since this AFAIK only happens during checkpoints, it seems it should be reasonably cheap to track? It would serve as an indicator of which slave(s) are having enough trouble keeping up to potentially cause issues. Not having looked at that code at all yet, would this be something that's simple to add? Or is it a silly idea? :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] test failure on latest source
On 16/04/2014 17:40, Tom Lane wrote: The bigger picture though is that this code isn't failing on the buildfarm. So what we need to ask is what's different about Marco's machine. good question. I checked again and I found that the fault is only on the cygwin 64 bit build but not on the cygwin 32 bit one. I was sure to have tested both, but it seems this time I missed the comparison. regards, tom lane Regards Marco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tracking replication slot blockings
Hi, On 2014-04-16 18:51:41 +0200, Magnus Hagander wrote: I'm thinking it could be interesting to know how many times (or in some other useful unit than times - how often) a specific replication slot has blocked xlog rotation. Since this AFAIK only happens during checkpoints, it seems it should be reasonably cheap to track? It would serve as an indicator of which slave(s) are having enough trouble keeping up to potentially cause issues. The xlog removal code just check the global minimum required LSN - it doesn't check the individual slots. So you'd need to add a bit more code to that location. But it'd be easy. But I think I'd just monitor/graph the byte difference for all slots using pg_replication_slots... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tracking replication slot blockings
On Wed, Apr 16, 2014 at 6:56 PM, Andres Freund and...@2ndquadrant.comwrote: Hi, On 2014-04-16 18:51:41 +0200, Magnus Hagander wrote: I'm thinking it could be interesting to know how many times (or in some other useful unit than times - how often) a specific replication slot has blocked xlog rotation. Since this AFAIK only happens during checkpoints, it seems it should be reasonably cheap to track? It would serve as an indicator of which slave(s) are having enough trouble keeping up to potentially cause issues. The xlog removal code just check the global minimum required LSN - it doesn't check the individual slots. So you'd need to add a bit more code to that location. But it'd be easy. Do we have statistics there somewhere - how often that global minimum blocks something? That on it's own might be a start :) But I think I'd just monitor/graph the byte difference for all slots using pg_replication_slots... Yeah, that would work when monitored continously. I was more looking for the view of hey, could this be what happened into a system that did not previously have any monitoring installed and therefor no such history. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] LDAP: bugfix and deprecated OpenLDAP API
On Mon, Oct 21, 2013 at 3:31 PM, Albe Laurenz laurenz.a...@wien.gv.atwrote: Peter Eisentraut wrote: --- 3511,3534 } /* ! * Perform an explicit anonymous bind. ! * This is not necessary in principle, but we want to set a timeout ! * of PGLDAP_TIMEOUT seconds and return 2 if the connection fails. ! * Unfortunately there is no standard conforming way to do that. */ This comment has become a bit confusing. What exactly is nonstandard? Setting a timeout, or returning 2? The code below actually returns 3. I have improved the comment. + #ifdef HAVE_LIBLDAP +/* in OpenLDAP, use the LDAP_OPT_NETWORK_TIMEOUT option */ We don't use HAVE_LIBLDAP anywhere else to mean OpenLDAP. Existing LDAP-related code uses #ifdef WIN32. Changed. + #else There should be a comment here indicating what this #else belongs to (#else /* HAVE_LIBLDAP */, or whatever we end up using). Changed. +/* the nonstandard ldap_connect function performs an anonymous bind */ +if (ldap_connect(ld, time) != LDAP_SUCCESS) +{ +/* error or timeout in ldap_connect */ +free(url); +ldap_unbind(ld); +return 2; +} + #endif here too Changed. Bonus: Write a commit message for your patch. (Consider using git format-patch.) Suggested commit message is included in the patch. Sorry about the huge delay in trying to get around to this one. For the sake of the archives - I looked at the fact that the windows codepath always returns 2, whereas the unix codepath separates at least one case out as a return 3. I can't see a pattern in the windows return codes that would make it possible to do the same though, without explicitly listing different error codes mapping to each of them - so I don't think it's worth doing that. Thus - applied, and backpatched all the way. Thanks! (And I just realized I forgot to credit reviewers in the commit message. My apologies - I blame confusion after havnig to re-setup my windows build environments all over again..) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Tracking replication slot blockings
On 2014-04-16 19:09:09 +0200, Magnus Hagander wrote: On Wed, Apr 16, 2014 at 6:56 PM, Andres Freund and...@2ndquadrant.comwrote: The xlog removal code just check the global minimum required LSN - it doesn't check the individual slots. So you'd need to add a bit more code to that location. But it'd be easy. Do we have statistics there somewhere - how often that global minimum blocks something? That on it's own might be a start :) Nope. Check xlog.c:KeepLogSeg(), it's pretty simple stuff ;). It's the same place where wal_keep_segments is enforced... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 4:22 AM, Peter Geoghegan p...@heroku.com wrote: I don't want to dismiss what you're saying about heating and cooling being unrelated, but I don't find the conclusion that not everything can be hot obvious. Maybe heat should be relative rather than absolute, and maybe that's actually what you meant. There is surely some workload where buffer access actually is perfectly uniform, and what do you do there? What temperature are those buffers? In that case, hotness, or retention priority, should be relative to re-population cost. IE: whether it's likely to still be in the OS cache or not, whether it's dirty or not, etc. It occurs to me that within the prototype patch, even though usage_count is incremented in a vastly slower fashion (in a wall time sense), clock sweep doesn't take advantage of that. I should probably investigate having clock sweep become more aggressive in decrementing in response to realizing that it won't get some buffer's usage_count down to zero on the next revolution either. There are certainly problems with that, but they might be fixable. Within the patch, in order for it to be possible for the usage_count to be incremented in the interim, an average of 1.5 seconds must pass, so if clock sweep were to anticipate another no-set-to-zero revolution, it seems pretty likely that it would be exactly right, or if not then close enough, since it can only really fail to correct for some buffers getting incremented once more in the interim. Conceptually, it would be like multiple logical revolutions were merged into one actual one, sufficient to have the next revolution find a victim buffer. Why use time at all? Why not synchronize usage bumpability to clock sweeps? I'd use a simple bit that the clock sweep clears, and the users set. Only one increase per sweep. Or maybe use a decreasing loop count instead of a bit. In any case, measuring time in terms of clock sweeps sounds like a better proposition. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 4:01 AM, Andres Freund and...@2ndquadrant.com wrote: Aren't you interested in the significance of the patch, and the test case? Not particularly in the specifics to be honest. The tradeoffs of the techniques you used in there seem prohibitive to me. It's easy to make individual cases faster by sacrificing others. You're the one poring over the specifics of what I've done, to my consternation. I am not prepared to defend the patch at that level, as I've made abundantly clear. I've called it a sketch, a proof of concept half a dozen times already. I don't understand your difficulty with that. I also don't understand how you can be so dismissive of the benchmark, given the numbers involved. You're being unreasonable. If I didn't write this patch, and I talked to people about this issue at pgCon, I'm not sure that anyone would be convinced that it was a problem, or at least that it was this much of a problem. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] pg_upgrade tablespaces
On Sun, Jan 12, 2014 at 11:04:41PM -0500, Bruce Momjian wrote: In the pgsql_old installation you have symlinks pointing back to the current default location. As well pg_tablespace points back to /usr/local/pgsql/data/ The issue is that there is not actually anything there in the way of a tablespace. So when pg_upgrade runs it tries to upgrade from /usr/local/pgsql/data/tblspc_dir to /usr/local/pgsql/data/tblspc_dir where the first directory either does not exist. or if the user went ahead and created the directory in the new installation, is empty. What is really wanted is to upgrade from /usr/local/pgsql_old/data/tblspc_dir to /usr/local/pgsql/data/tblspc_dir. Right now the only way that happens is with user intervention. Right, it points to _nothing_ in the _new_ cluster. Perhaps the simplest approach would be to check all the pg_tablespace locations to see if they point at real directories. If not, we would have to have the user update pg_tablespace and the symlinks. :-( Actually, even in 9.2+, those symlinks are going to point at the same nothing. That would support checking the symlinks in all versions. I have developed the attached patch which checks all tablespaces to make sure the directories exist. I plan to backpatch this. The reason we haven't seen this bug reported more frequently is that a _database_ defined in a non-existent tablespace directory already throws an backend error, so this check is only necessary where tables/indexes (not databases) are defined in non-existant tablespace directories. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/contrib/pg_upgrade/tablespace.c b/contrib/pg_upgrade/tablespace.c new file mode 100644 index 783ee93..5a3dc60 *** a/contrib/pg_upgrade/tablespace.c --- b/contrib/pg_upgrade/tablespace.c *** *** 11,16 --- 11,18 #include pg_upgrade.h + #include sys/types.h + static void get_tablespace_paths(void); static void set_tablespace_directory_suffix(ClusterInfo *cluster); *** get_tablespace_paths(void) *** 65,73 --- 67,101 i_spclocation = PQfnumber(res, spclocation); for (tblnum = 0; tblnum os_info.num_old_tablespaces; tblnum++) + { + struct stat statBuf; + os_info.old_tablespaces[tblnum] = pg_strdup( PQgetvalue(res, tblnum, i_spclocation)); + /* + * Check that the tablespace path exists and is a directory. + * Effectively, this is checking only for tables/indexes in + * non-existant tablespace directories. Databases located in + * non-existant tablespaces already throw a backend error. + */ + if (stat(os_info.old_tablespaces[tblnum], statBuf) != 0) + { + if (errno == ENOENT) + report_status(PG_FATAL, + tablespace directory \%s\ does not exist\n, + os_info.old_tablespaces[tblnum]); + else + report_status(PG_FATAL, + cannot stat() tablespace directory \%s\: %s\n, + os_info.old_tablespaces[tblnum], getErrorText(errno)); + } + if (!S_ISDIR(statBuf.st_mode)) + report_status(PG_FATAL, + tablespace path \%s\ is not a directory\n, + os_info.old_tablespaces[tblnum]); + } + PQclear(res); PQfinish(conn); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 1:42 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Apr 16, 2014 at 4:01 AM, Andres Freund and...@2ndquadrant.com wrote: Aren't you interested in the significance of the patch, and the test case? Not particularly in the specifics to be honest. The tradeoffs of the techniques you used in there seem prohibitive to me. It's easy to make individual cases faster by sacrificing others. You're the one poring over the specifics of what I've done, to my consternation. I am not prepared to defend the patch at that level, as I've made abundantly clear. I've called it a sketch, a proof of concept half a dozen times already. I don't understand your difficulty with that. I also don't understand how you can be so dismissive of the benchmark, given the numbers involved. You're being unreasonable. I don't think he's being unreasonable, and I don't understand why you're getting bent out of shape about it. You proposed a patch, he articulated a problem, you don't want to fix it right now. All of which is fine. Why the ad hominem accusations? If I didn't write this patch, and I talked to people about this issue at pgCon, I'm not sure that anyone would be convinced that it was a problem, or at least that it was this much of a problem. I agree with that, too. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Need Multixact Freezing Docs
You can see the current multixact value in pg_controldata output. Keep timestamped values of that somewhere (a table?) so that you can measure consumption rate. I don't think we provide SQL-level access to those values. Bleh. Do we provide SQL-level access in 9.4? If not, I think that's a requirement before release. Telling users to monitor a setting using a restricted-permission command-line utility which produces a version-specific text file they have to parse is not going to win us a lot of fans. Also: how do I check the multixact age of a table? There doesn't seem to be any data for this ... pg_class.relminmxid is the oldest multixact value that might be present in a table. On every database I've tested, age(relminmxid) returns int_max. So this is apparently broken. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 10:56 AM, Robert Haas robertmh...@gmail.com wrote: I don't think he's being unreasonable, and I don't understand why you're getting bent out of shape about it. You proposed a patch, he articulated a problem, you don't want to fix it right now. All of which is fine. Why the ad hominem accusations? I just think it's bad form to hold something like this to the same standards as a formal commitfest submission. I am well aware that the patch probably has several scalability issues. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The case against multixact GUCs
On 03/12/2014 09:45 AM, Heikki Linnakangas wrote: In hindsight, I think permanent multixids in their current form was a mistake. Before 9.3, the thing that made multixids special was that they could just be thrown away at a restart. They didn't need freezing. Now that they do, why not just use regular XIDs for them? We had to duplicate much of the wraparound and freezing logic for multixids that simply would not have been an issue if we had used regular XIDs instead. We could've perhaps kept the old multixids for their original purpose, as transient xids that can be forgotten about after all the old snapshots are gone. But for the permanent ones, it would've been simpler if we handled them more like subxids; make them part of the same XID space as regular XIDs. This is pretty hand-wavy of course, and it's too late now. So, if we ripped out all the multixact stuff for 9.4, what would that cost us? I'm serious. The multixact stuff has been broken since 9.3 was released, and it's *still* broken. We can't give users any guidance or tools on how to set multixact stuff, and autovacuum doesn't handle it properly. Seems like this was just a bad patch and we should rip it out. What features do we lose? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The case against multixact GUCs
On 2014-04-16 11:10:52 -0700, Josh Berkus wrote: On 03/12/2014 09:45 AM, Heikki Linnakangas wrote: In hindsight, I think permanent multixids in their current form was a mistake. Before 9.3, the thing that made multixids special was that they could just be thrown away at a restart. They didn't need freezing. Now that they do, why not just use regular XIDs for them? We had to duplicate much of the wraparound and freezing logic for multixids that simply would not have been an issue if we had used regular XIDs instead. We could've perhaps kept the old multixids for their original purpose, as transient xids that can be forgotten about after all the old snapshots are gone. But for the permanent ones, it would've been simpler if we handled them more like subxids; make them part of the same XID space as regular XIDs. This is pretty hand-wavy of course, and it's too late now. So, if we ripped out all the multixact stuff for 9.4, what would that cost us? Ripping multixacts out in general? Err, right. We'd loose shared row level locks... I think ripping out stuff at this point would be the cause of many, many more bugs than it'd prevent. I'm serious. The multixact stuff has been broken since 9.3 was released, and it's *still* broken. We can't give users any guidance or tools on how to set multixact stuff, and autovacuum doesn't handle it properly. Sorry, but I think you're blowing some GUCs *WAY* out of proportion. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The case against multixact GUCs
On 04/16/2014 11:22 AM, Andres Freund wrote: I'm serious. The multixact stuff has been broken since 9.3 was released, and it's *still* broken. We can't give users any guidance or tools on how to set multixact stuff, and autovacuum doesn't handle it properly. Sorry, but I think you're blowing some GUCs *WAY* out of proportion. I'm not talking about the GUCs. I'm talking about the data corruption bugs. Including the new one this week. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The case against multixact GUCs
On 2014-04-16 11:25:49 -0700, Josh Berkus wrote: On 04/16/2014 11:22 AM, Andres Freund wrote: I'm serious. The multixact stuff has been broken since 9.3 was released, and it's *still* broken. We can't give users any guidance or tools on how to set multixact stuff, and autovacuum doesn't handle it properly. Sorry, but I think you're blowing some GUCs *WAY* out of proportion. I'm not talking about the GUCs. That was about: We can't give users any guidance or tools on how to set multixact stuff, and autovacuum doesn't handle it properly. I'm talking about the data corruption bugs. That was covered by at this point ripping this out seems likely to cause many more bugs than it would solve. Including the new one this week. Lets hold our horses a bit, we don't know what's happening there for now. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 1:07 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Apr 16, 2014 at 10:56 AM, Robert Haas robertmh...@gmail.com wrote: I don't think he's being unreasonable, and I don't understand why you're getting bent out of shape about it. You proposed a patch, he articulated a problem, you don't want to fix it right now. All of which is fine. Why the ad hominem accusations? I just think it's bad form to hold something like this to the same standards as a formal commitfest submission. I am well aware that the patch probably has several scalability issues. In fairness to Andres, while *you* may know that issuing an expensive syscall in a tight loop is on the list of Forbidden Things, a lot of people don't and it's pretty reasonable to issue methodology objections in order to get them documented. Anyways, I'm still curious if you can post similar numbers basing the throttling on gross allocation counts instead of time. Meaning: some number of buffer allocations has to have occurred before you consider eviction. Besides being faster I think it's a better implementation: an intermittently loaded server will give more consistent behavior. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
Merlin Moncure mmonc...@gmail.com writes: Anyways, I'm still curious if you can post similar numbers basing the throttling on gross allocation counts instead of time. Meaning: some number of buffer allocations has to have occurred before you consider eviction. Besides being faster I think it's a better implementation: an intermittently loaded server will give more consistent behavior. Yeah --- I think wall-clock-based throttling is fundamentally the wrong thing anyway. Are we going to start needing a CPU speed measurement to tune the algorithm with? Not the place to be going. But driving it off the number of allocations that've been done could be sensible. (OTOH, that means you need a central counter, which itself would be a bottleneck.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The case against multixact GUCs
On 04/16/2014 11:30 AM, Andres Freund wrote: On 2014-04-16 11:25:49 -0700, Josh Berkus wrote: On 04/16/2014 11:22 AM, Andres Freund wrote: I'm serious. The multixact stuff has been broken since 9.3 was released, and it's *still* broken. We can't give users any guidance or tools on how to set multixact stuff, and autovacuum doesn't handle it properly. Sorry, but I think you're blowing some GUCs *WAY* out of proportion. I'm not talking about the GUCs. That was about: We can't give users any guidance or tools on how to set multixact stuff, and autovacuum doesn't handle it properly. OK. I will point out that if multixact freeze was an *intentional* feature, we'd never have accepted it given the total lack of either documentation or monitorability. I'm talking about the data corruption bugs. That was covered by at this point ripping this out seems likely to cause many more bugs than it would solve. That's certainly possible. I just don't think the option of reversing those patches should be off the table. Things have been bad enough that that might be the best option. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 2:07 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Apr 16, 2014 at 10:56 AM, Robert Haas robertmh...@gmail.com wrote: I don't think he's being unreasonable, and I don't understand why you're getting bent out of shape about it. You proposed a patch, he articulated a problem, you don't want to fix it right now. All of which is fine. Why the ad hominem accusations? I just think it's bad form to hold something like this to the same standards as a formal commitfest submission. I am well aware that the patch probably has several scalability issues. I don't agree. I think it's perfectly appropriate to raise potential issues at the earliest possible time. People have regularly been heard to complain in this forum that those objecting to a patch did not object soon enough for them to make changes. That's a hard problem to fix because we can't force people whose salaries we're not paying to attention to patches over whatever else they may have to do, but we shouldn't label it as a bad thing when people choose to get involved and provide feedback early. Early feedback is exactly what we want to encourage here. And regardless of any of that, I think person X is being unreasonable is a personal attack that has exactly zero place on this mailing list. We are here to talk about technology, not anyone's character. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 1:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: Anyways, I'm still curious if you can post similar numbers basing the throttling on gross allocation counts instead of time. Meaning: some number of buffer allocations has to have occurred before you consider eviction. Besides being faster I think it's a better implementation: an intermittently loaded server will give more consistent behavior. Yeah --- I think wall-clock-based throttling is fundamentally the wrong thing anyway. Are we going to start needing a CPU speed measurement to tune the algorithm with? Not the place to be going. But driving it off the number of allocations that've been done could be sensible. (OTOH, that means you need a central counter, which itself would be a bottleneck.) sure -- note we already track that in BufferStrategyControl (everything in buffer allocation is already centrally managed essentially). /* * Statistics. These counters should be wide enough that they can't * overflow during a single bgwriter cycle. */ uint32 completePasses; /* Complete cycles of the clock sweep */ uint32 numBufferAllocs;/* Buffers allocated since last reset */ merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
On Wed, Apr 16, 2014 at 12:17 PM, Robert Haas robertmh...@gmail.com wrote: I don't agree. I think it's perfectly appropriate to raise potential issues at the earliest possible time. If I didn't *strongly* emphasize my intent in writing the patch up front, I'd certainly agree. I just don't see why what I've done cannot be accepted in the spirit in which it was intended. And regardless of any of that, I think person X is being unreasonable is a personal attack that has exactly zero place on this mailing list. We are here to talk about technology, not anyone's character. Telling someone they're being unreasonable is not a personal attack. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?
Merlin Moncure mmonc...@gmail.com writes: On Wed, Apr 16, 2014 at 1:44 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yeah --- I think wall-clock-based throttling is fundamentally the wrong thing anyway. Are we going to start needing a CPU speed measurement to tune the algorithm with? Not the place to be going. But driving it off the number of allocations that've been done could be sensible. (OTOH, that means you need a central counter, which itself would be a bottleneck.) sure -- note we already track that in BufferStrategyControl (everything in buffer allocation is already centrally managed essentially). Indeed, but I'd think getting rid of that property would be one of the top priorities for any attempt to do anything at all in this area of the code. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Need Multixact Freezing Docs
Josh Berkus wrote: You can see the current multixact value in pg_controldata output. Keep timestamped values of that somewhere (a table?) so that you can measure consumption rate. I don't think we provide SQL-level access to those values. Bleh. Do we provide SQL-level access in 9.4? If not, I think that's a requirement before release. Yeah, good idea. Want to propose a patch? Also: how do I check the multixact age of a table? There doesn't seem to be any data for this ... pg_class.relminmxid is the oldest multixact value that might be present in a table. On every database I've tested, age(relminmxid) returns int_max. So this is apparently broken. Hmm, are you sure it's INT_MAX and not 4244967297? Heikki reported that: http://www.postgresql.org/message-id/52401aea.9000...@vmware.com The absolute value is not important; I think that's mostly harmless. I don't think applying age() to a multixact value is meaningful, though; that's only good for Xids. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Need Multixact Freezing Docs
On 04/16/2014 01:30 PM, Alvaro Herrera wrote: Josh Berkus wrote: You can see the current multixact value in pg_controldata output. Keep timestamped values of that somewhere (a table?) so that you can measure consumption rate. I don't think we provide SQL-level access to those values. Bleh. Do we provide SQL-level access in 9.4? If not, I think that's a requirement before release. Yeah, good idea. Want to propose a patch? Yeah, lemme dig into this. I really think we need it for 9.4, feature frozen or not. Also: how do I check the multixact age of a table? There doesn't seem to be any data for this ... pg_class.relminmxid is the oldest multixact value that might be present in a table. On every database I've tested, age(relminmxid) returns int_max. So this is apparently broken. Hmm, are you sure it's INT_MAX and not 4244967297? Heikki reported that: http://www.postgresql.org/message-id/52401aea.9000...@vmware.com The absolute value is not important; I think that's mostly harmless. I don't think applying age() to a multixact value is meaningful, though; that's only good for Xids. Yeah, I'm sure: josh=# select relname, age(relminmxid) from pg_class; relname |age -+ pg_statistic| 2147483647 pg_type | 2147483647 random | 2147483647 dblink_pkey_results | 2147483647 pg_toast_17395 | 2147483647 ... So if age() doesn't mean anything, then how are users to know when the need to freeze? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Need Multixact Freezing Docs
Josh Berkus wrote: Josh Berkus wrote: You can see the current multixact value in pg_controldata output. Keep timestamped values of that somewhere (a table?) so that you can measure consumption rate. I don't think we provide SQL-level access to those values. Bleh. Do we provide SQL-level access in 9.4? If not, I think that's a requirement before release. Yeah, good idea. Want to propose a patch? Yeah, lemme dig into this. I really think we need it for 9.4, feature frozen or not. Great, thanks. josh=# select relname, age(relminmxid) from pg_class; relname |age -+ pg_statistic| 2147483647 pg_type | 2147483647 random | 2147483647 dblink_pkey_results | 2147483647 pg_toast_17395 | 2147483647 ... So if age() doesn't mean anything, then how are users to know when the need to freeze? I don't understand. Autovacuum will freeze this automatically when the threshold is reached. Users don't need to do anything. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bgworker crashed or not?
On 16/04/14 18:34, Robert Haas wrote: On Wed, Apr 16, 2014 at 12:28 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-04-16 12:20:01 -0400, Robert Haas wrote: On Wed, Apr 16, 2014 at 12:11 PM, Andres Freund and...@2ndquadrant.com wrote: I'm still not seeing the problem. It's the background worker's job to make sure that the right stuff gets logged, just as it would be for any other backend. Trying to bolt some portion of the responsibility for that onto the postmaster is 100% wrong. Well, it already has taken on that responsibility, it's not my idea to add it. I merely want to control more precisely what happens.s I think that's doubling down on an already-questionable design principle. Or if I may be permitted a more colloquial idiom: Luke, it's a trap. Well the logging is just too spammy in general when it comes to dynamic bgworkers but that's easy to fix in the future, no need to make decisions for 9.4. However I really don't like that I have to exit with exit code 1, which is normally used as failure, if I want to shutdown my dynamic bgworker once it has finished the work. And this behavior is something we can set properly only once... -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] New functions for sslinfo extension
Hello all, postgresmen! I want to present some functions to sslinfo extension module:1) ssl_get_count_of_extensions() --- get count of X509v3 extensions from client certificate;2) ssl_get_extension_names() --- get short names of X509v3 extensions from client certificate;3) ssl_get_extension_value(text) --- get value of extension from certificate (argument --- short name of extension);4) ssl_is_critical_extension(text) --- returns true, if extension is critical and false, if is not (argument --- short name of extension). I write those functions with libpq on C. I put code of module and sql-file for loading with this letter. Best regards,Dmitry Voronin #include postgres.h #include fmgr.h #include utils/numeric.h #include libpq/libpq-be.h #include miscadmin.h #include utils/builtins.h #include mb/pg_wchar.h #include funcapi.h #include openssl/x509.h #include openssl/x509v3.h PG_MODULE_MAGIC; X509_EXTENSION *get_extension(X509* certificate, char *name); Datum ssl_get_extension_value(PG_FUNCTION_ARGS); Datum ssl_is_critical_extension(PG_FUNCTION_ARGS); Datum ssl_get_count_of_extensions(PG_FUNCTION_ARGS); Datum ssl_get_extension_names(PG_FUNCTION_ARGS); X509_EXTENSION *get_extension(X509* certificate, char *name) { int extension_nid = 0; int locate = 0; extension_nid = OBJ_sn2nid(name); if (0 == extension_nid) { extension_nid = OBJ_ln2nid(name); if (0 == extension_nid) return NULL; } locate = X509_get_ext_by_NID(certificate, extension_nid, -1); return X509_get_ext(certificate, locate); } PG_FUNCTION_INFO_V1(ssl_get_extension_value); Datum ssl_get_extension_value(PG_FUNCTION_ARGS) { X509 *certificate = MyProcPort - peer; X509_EXTENSION *extension = NULL; char *extension_name = text_to_cstring(PG_GETARG_TEXT_P(0)); BIO *bio = NULL; char *value = NULL; text *result = NULL; if (NULL == certificate) PG_RETURN_NULL(); extension = get_extension(certificate, extension_name); if (NULL == extension) elog(ERROR, Extension by name \%s\ is not found in certificate, extension_name); bio = BIO_new(BIO_s_mem()); char nullterm = '\0'; X509V3_EXT_print(bio, extension, -1, -1); BIO_write(bio, nullterm, 1); BIO_get_mem_data(bio, value); result = cstring_to_text(value); BIO_free(bio); PG_RETURN_TEXT_P(result); } PG_FUNCTION_INFO_V1(ssl_is_critical_extension); Datum ssl_is_critical_extension(PG_FUNCTION_ARGS) { X509 *certificate = MyProcPort - peer; X509_EXTENSION *extension = NULL; char *extension_name = text_to_cstring(PG_GETARG_TEXT_P(0)); int critical = 0; if (NULL == certificate) PG_RETURN_NULL(); extension = get_extension(certificate, extension_name); if (NULL == extension) elog(ERROR, Extension name \%s\ is not found in certificate, extension_name); critical = X509_EXTENSION_get_critical(extension); PG_RETURN_BOOL(critical); } PG_FUNCTION_INFO_V1(ssl_get_count_of_extensions); Datum ssl_get_count_of_extensions(PG_FUNCTION_ARGS) { X509 *certificate = MyProcPort - peer; if (NULL == certificate) PG_RETURN_NULL(); PG_RETURN_INT32(X509_get_ext_count(certificate)); } PG_FUNCTION_INFO_V1(ssl_get_extension_names); Datum ssl_get_extension_names(PG_FUNCTION_ARGS) { X509*certificate = MyProcPort - peer; FuncCallContext *funcctx; STACK_OF(X509_EXTENSION) *extension_stack = NULL; MemoryContext oldcontext; int call = 0; int max_calls = 0; X509_EXTENSION *extension = NULL; ASN1_OBJECT *object = NULL; int extension_nid = 0; text*result = NULL; if (NULL == certificate) PG_RETURN_NULL(); extension_stack = certificate - cert_info - extensions; if (NULL == extension_stack) PG_RETURN_NULL(); if (SRF_IS_FIRSTCALL()) { funcctx = SRF_FIRSTCALL_INIT(); oldcontext = MemoryContextSwitchTo(funcctx - multi_call_memory_ctx); funcctx - max_calls = X509_get_ext_count(certificate); MemoryContextSwitchTo(oldcontext); } funcctx = SRF_PERCALL_SETUP(); call = funcctx - call_cntr; max_calls = funcctx - max_calls; if (call max_calls) { extension = sk_X509_EXTENSION_value(extension_stack, call); object = X509_EXTENSION_get_object(extension); extension_nid = OBJ_obj2nid(object); if (0 == extension_nid) elog(ERROR, Unknown extension in certificate); result = cstring_to_text(OBJ_nid2sn(extension_nid)); SRF_RETURN_NEXT(funcctx, (Datum) result); } sk_X509_EXTENSION_free(extension); SRF_RETURN_DONE(funcctx); } sslextensions.control Description: Binary data sslextensions--1.0.sql Description: application/sql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bogus tsdict, tsparser, etc object identities
Alvaro Herrera wrote: This problem is new in 9.3, so backpatching to that. Prior to that we didn't have object identities. Done. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Need Multixact Freezing Docs
So if age() doesn't mean anything, then how are users to know when the need to freeze? I don't understand. Autovacuum will freeze this automatically when the threshold is reached. Users don't need to do anything. What I'm asking is: - how do users know if Autovacuum is keeping up with multixact feezing? - how do users get data on multixact usage so that they can tune the parameters? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] assertion failure 9.3.4
So, from top to bottom I see the following elements: * backend is executing a query * this query is getting captured by pg_stat_statements * the query is also getting captured by autoexplain, in chain from pg_stat_statements * autoexplain runs the query, which invokes a plpgsql function * this plpgsql function runs another query, and this one is captured by pg_stat_statements * and also by autoexplain * The autoexplain run of this inner query invokes a trigger * .. which is a FK trigger, and therefore runs ri_PerformCheck which runs another query * This FK check query gets captured by pg_stat_statements * and also by autoexplain, which runs it * this autoexplain run of the third query invokes SeqNext * this does a heap access, which uses HeapTupleSatisfiesMVCC * this one tries to read the update XID and gets an assertion failure. Oh my. Would it be possible for you to provide a self-contained setup that behaves like this? I think a bt full would be useful since it'd provide the queries at each of the three stages. I'm not quite clear on why the third query, the one in ri_PerformCheck, is invoking a sequence. AFAICS it's this: /* -- * The query string built is * SELECT 1 FROM ONLY pktable x WHERE pkatt1 = $1 [AND ...] * FOR KEY SHARE OF x * The type id's for the $ parameters are those of the * corresponding FK attributes. * -- */ so either I'm misreading the whole thing, or there is something very odd going on here. Are you aware of something unusual in the FK setups? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improve the help message about psql -F
On Tue, Feb 11, 2014 at 10:02:20PM -0500, Peter Eisentraut wrote: If you are going to change the help string for -F, you should also update the help string for -R, and possibly for -z and -0. Patch applied with all the suggestions merged in; commitfest item marked as committed: -F, --field-separator=STRING field separator for unaligned output (default: |) -H, --html HTML table output mode -P, --pset=VAR[=ARG] set printing option VAR to ARG (see \pset command) -R, --record-separator=STRING record separator for unaligned output (default: newline) -t, --tuples-onlyprint rows only -T, --table-attr=TEXTset HTML table tag attributes (e.g., width, border) -x, --expanded turn on expanded table output -z, --field-separator-zero set field separator for unaligned output to zero byte -0, --record-separator-zero set record separator for unaligned output to zero byte -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_get_viewdefs() indentation considered harmful
On Sat, Jan 25, 2014 at 01:02:36PM -0500, Andrew Dunstan wrote: On 01/25/2014 11:06 AM, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Jan 24, 2014 at 8:53 PM, Greg Stark st...@mit.edu wrote: Indeed even aside from the performance questions, once you're indented 5-10 times the indention stops being useful at all. The query would probably be even more readable if we just made indentation modulo 40 so once you get too far indented it wraps around which is not unlike how humans actually indent things in this case. Ha! That seems a little crazy, but *capping* the indentation at some reasonable value might not be dumb. I could go for either of those approaches, if applied uniformly, and actually Greg's suggestion sounds a bit better: it seems more likely to preserve some readability in deeply nested constructs. With either approach you need to ask where the limit value is going to come from. Is it OK to just hard-wire a magic number, or do we need to expose a knob somewhere? Simply capping it is probably the best bang for the buck. I suspect most people would prefer to have q1 union q2 union q3 union q4 with the subqueries all indented to the same level. But I understand the difficulties in doing so. A knob seems like overkill. I'd just hardwire some number, say three or four levels of indentation. Did we address this? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] shouldn't we log permission errors when accessing the configured trigger file?
On Mon, Jan 27, 2014 at 03:45:38PM +0100, Magnus Hagander wrote: On Mon, Jan 27, 2014 at 3:43 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jan 26, 2014 at 1:03 PM, Andres Freund and...@2ndquadrant.com wrote: For some reason CheckForStandbyTrigger() doesn't report permission errors when stat()int the trigger file. Shouldn't we fix that? static bool CheckForStandbyTrigger(void) { ... if (stat(TriggerFile, stat_buf) == 0) { ereport(LOG, (errmsg(trigger file found: %s, TriggerFile))); unlink(TriggerFile); triggered = true; fast_promote = true; return true; } Imo the stat() should warn about all errors but ENOENT? Seems reasonable. It could lead to quite a bit of log spam, I suppose, but the way things are now could be pretty mystifying if you've located your trigger file somewhere outside $PGDATA, and a parent directory is lacking permissions. +1. Since it actually indicates something that's quite broken (since with that you can never make the trigger work until you fix it), the log spam seems like it would be appropriate. (Logspam is never nice, but a single log line is also very easy to miss - this should log enough that you wouldn't) I have developed the attached patch to address this issue. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c new file mode 100644 index 0106cdf..88ad51f *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *** CheckForStandbyTrigger(void) *** 11102,11107 --- 11102,3 fast_promote = true; return true; } + else if (errno != ENOENT) + ereport(ERROR, + (errcode_for_file_access(), + errmsg(could not stat trigger file \%s\: %m, + TriggerFile))); + return false; } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: test script, was Re: [COMMITTERS] pgsql: psql: conditionally display oids and replication identity
On Tue, Apr 15, 2014 at 12:32:36PM -0700, David Fetter wrote: On Tue, Apr 15, 2014 at 02:46:34PM -0400, Bruce Momjian wrote: On Tue, Apr 15, 2014 at 02:32:53PM -0400, Tom Lane wrote: Bruce Momjian br...@momjian.us writes: psql: conditionally display oids and replication identity Buildfarm isn't terribly pleased with this --- looks like you missed contrib/test_decoding/ Fixed. I added a personal script option that allows me to test contrib, but forgot to run it. Is that script of general utility for committers? If so, it might be good to include it in the distribution. I'd be happy to go through and perl-ify it, document it, etc. Or maybe it could be a new make target... My script is wrapper around src/tools/pgtest. I am attaching it in case there are snippets in there that are useful to people. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + : . traprm [ $1 = -c ] CONTRIB=Y shift [ $1 = -d ] DOCS=Y shift [ $1 = -c ] CONTRIB=Y shift export QUIET=$(($QUIET + 1)) . cd_pgtop chown -R postgres . echo Checking SGML cd doc/src/sgml # make postgres.sgml first so we don't filter on a configure check make -q postgres.sgml make check $TMP/0 21 if grep -v 'fully-tagged' $TMP/0 | egrep -qi 'Error|Warning' thenecho SGML error cat $TMP/0 exit 1 fi [ $(pwd) != '/pgsql/8.4/doc/src/sgml' ] make check-tabs # Run only at night to check for HISTORY build problems # in HISTORY.html. if [ ! -t 0 -o $DOCS = Y ] thenmake INSTALL.html $TMP/0 21 if egrep -qi 'Error|Warning' $TMP/0 thenecho SGML error cat $TMP/0 exit 1 fi # removed in PG 9.4 make HISTORY.html $TMP/0 21 if grep -q 'Error' $TMP/0 thenecho SGML error cat $TMP/0 exit 1 fi fi # fails on /bin/sh cd - /dev/null echo Checking duplicate oids cd src/include/catalog duplicate_oids $TMP/0 if [ -s $TMP/0 ] thenecho Duplicate system oids cat $TMP/0 exit 1 fi cd - /dev/null pggit diff --check || exit 1 echo Running pgtest (aspg /pg/tools/pgtest --silent $@; echo $? $TMP/ret) | # use only one grep so we don't buffer output egrep -v 'In file included from gram.y:|warning: unused variable .yyg.|yy_try_NUL_trans' if [ $CONTRIB ] thencd contrib make --silent # install-check is much faster because no initdbs aspg make --silent check cd - /dev/null fi rm -fr src/test/regress/tmp_check [ -t 0 ] bell exit $(cat $TMP/ret) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On Thu, Feb 6, 2014 at 09:40:32AM +0100, Andres Freund wrote: On 2014-02-05 12:36:42 -0500, Robert Haas wrote: It may well be that your proposal is spot on. But I'd like to see some data-structure-by-data-structure measurements, rather than assuming that alignment must be a good thing. I am fine with just aligning BufferDescriptors properly. That has clearly shown massive improvements. I thought your previous idea of increasing BUFFERALIGN to 64 bytes had a lot to recommend it. Good. I wonder if we shouldn't move that bit of logic: if (size = BUFSIZ) newStart = BUFFERALIGN(newStart); out of ShmemAlloc() and instead have a ShmemAllocAligned() and ShmemInitStructAligned() that does it. So we can sensibly can control it per struct. But that doesn't mean it doesn't need testing. I feel the need here, to say that I never said it doesn't need testing and never thought it didn't... Where are we on this? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers