Re: [HACKERS] More flexible LDAP auth search filters?
On 08/09/17 16:33, Peter Eisentraut wrote: > A couple of comments on this patch. I have attached a "fixup" patch on > top of your v4 that should address them. > > - I think the bracketing of the LDAP URL synopsis is wrong. > > - I have dropped the sentence that LDAP URL extensions are not > supported. That sentence was written mainly to point out that filters > are not supported, which they are now. There is nothing beyond filters > and extensions left in LDAP URLs, AFAIK. > > - We have previously used the ldapsearchattribute a bit strangely. We > use it as both the search filter and as the attribute to return from the > search, but we don't actually care about the returned attribute (we only > use the DN (which is not a real attribute)). That hasn't been a real > problem, but now if you specify a filter, as the code stands it will > always request the "uid" attribute, which won't work if there is no such > attribute. I have found that there is an official way to request "no > attribute", so I have fixed the code to do that. > > - I was wondering whether we need a way to escape the % (%%?) but it > doesn't seem worth bothering. > > I have been looking around the Internet how this functionality compares > to other LDAP authentication systems. > > I didn't see the origin of the %u idea in this thread, but perhaps it > came from Dovecot. But there it's a general configuration file syntax, > nothing specific to LDAP. In nginx you use %(username), which again is > general configuration file syntax. I'm OK with using %u. > > The original LDAP URL design was adapted from Apache > (https://httpd.apache.org/docs/2.4/mod/mod_authnz_ldap.html#authldapurl). > They combine the attribute filter and the general filter if both are > specified, but I think they got that a bit wrong. The attribute field > in the URL is actually not supposed to be a filter but a return field, > which is also the confusion mentioned above. > > The PAM module (https://linux.die.net/man/5/pam_ldap) also has a way to > specify a search attribute and a general filter and combines them. > > Neither of these allows substituting the user name into the search filter. Great work! Having installed quite a few LDAP-based authentication setups in the past, I can promise you that pam_ldap is the last tool I would consider for the job so please don't hold to this as being the gold standard(!). My weapon of choice for LDAP deployments on POSIX-based systems is Arthur De Jong's nss-pam-ldapd (https://arthurdejong.org/nss-pam-ldapd) which is far more flexible than pam_ldap and fixes a large number of bugs, including the tendency for pam_ldap to hang infinitely if it can't contact its LDAP server. Take a look at nss-pam-ldapd's man page for nslcd.conf and in particular pam_authz_search - this is exactly the type of filters I would end up deploying onto servers. This happens a lot in large organisations whereby getting group memberships updated in the main directory can take days/weeks whereas someone with root access to the server itself can hard-code an authentication list of users and/or groups in an LDAP filter in just a few minutes. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More flexible LDAP auth search filters?
On 01/08/17 23:17, Thomas Munro wrote: > On Wed, Aug 2, 2017 at 5:36 AM, Peter Eisentraut >wrote: >> On 7/16/17 19:09, Thomas Munro wrote: >>> On Mon, Jul 17, 2017 at 10:26 AM, Thomas Munro >>> wrote: ldap-search-filters-v2.patch >>> >>> Gah, it would help if I could spell "occurrences" correctly. Fixed in >>> the attached. >> >> Please also add the corresponding support for specifying search filters >> in LDAP URLs. See RFC 4516 for the format and >> https://linux.die.net/man/3/ldap_url_parse for the API. You might just >> need to grab LDAPURLDesc.lud_filter and use it. > > Good idea. Yes, it seems to be that simple. Here's a version like > that. Here's an example of how it looks in pg_hba.conf: > > host all all 127.0.0.1/32ldap > ldapurl="ldap://localhost/ou=people1,dc=my-domain,dc=com??sub?(cn=%25u)" > > Maybe we could choose a better token than %u for user name, since it > has to be escaped when included in a URL like that, but on the other > hand there seems to be wide precedent for %u in other software. Yeah, mostly I only ever see ldapurls used programatically, i.e. the configuration allows you to set the various fields separately and then the software generates the URL with the correct encoding itself. But if it's documented that's not a reason to reject the patch as I definitely see it as an improvement. As I mentioned previously in the thread, the main barrier preventing people from using LDAP is that the role cannot be generated from other attributes in the directory. In a lot of real-life cases I see, that would be enough to discount PostgreSQL's LDAP authentication completely. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More flexible LDAP auth search filters?
On 17/07/17 18:08, Magnus Hagander wrote: > On Mon, Jul 17, 2017 at 6:47 PM, Mark Cave-Ayland > <mark.cave-ayl...@ilande.co.uk <mailto:mark.cave-ayl...@ilande.co.uk>> > wrote: > Great to hear from you! It has definitely been a while... > > Indeed. You should spend more time on these lists :P Well I do get the emails, unfortunately it's trying to find the time to read them all ;) > > How would that actually work, though? Given the same user in ldap could > > now potentially match multiple different users in PostgreSQL. Would you > > then create two accounts for the user, one with the uid as name and one > > with email as name? Wouldn't that actually cause more issues than it > solves? > > Normally what happens for search+bind is that you execute the custom > filter with substitutions in order to find the entry for your bind DN, > but you also request the value of ldapsearchattribute (or equivalent) at > the same time. Say for example you had an entry like this: > > dn: uid=mha,dc=users,dc=hagander,dc=net > uid: mha > mail: mag...@hagander.net <mailto:mag...@hagander.net> > > Using the filter "(|(mail=%u)(uid=%u))" would locate the same bind DN > "uid=mha,dc=users,dc=hagander,dc=net" with either one of your uid or > email address. > > If the bind is successful then the current user identity should be set > to the value of the ldapsearchattribute retrieved from the bind DN > entry, which with the default of "uid" would be "mha". Hence you end up > with the same user role "mha" regardless of whether a uid or email > address was entered. > > > Right. This is the part that doesn't work for PostgreSQL. Because we > have already specified the username -- it goes in the startup packet in > order to pick the correct row from pg_hba.conf. I don't think that's necessarily going to be an issue here because if you're specifying a custom LDAP filter then there's a very good chance that you're delegating access control to information held in the directory anyway, e.g. (&(memberOf=cn=pgusers,dc=groups,dc=hagander,dc=net)(uid=%u)) ((&(uid=%u)(|(uid=mha)(uid=mark)(uid=thomas))) In the mail example above when you're using more than one attribute, I think it's fair enough to simply say in the documentation you must set user to "all" in pg_hba.conf since it is impossible to know which attribute is being used to identify the user role until after authentication. I should add that personally I don't recommend such setups where the user can log in using more than one identifier, but there are clients out there who absolutely will insist on it (think internal vs. external users) so if the LDAP support is being updated then it's worth exploring to see if these cases can be supported. > I guess in theory we could treat it like Kerberos or another one of the > systems where we get the username from an external entity. But then > you'd still have to specify the mapping again in pg_ident.conf, and it > would be a pretty strong break of backwards compatibility. (goes and glances at the code) Is there no reason why you couldn't just overwrite port->user_name based upon ldapsearchattribute at the end of CheckLDAPAuth? And if this were only enabled when a custom filter were specified then it shouldn't cause any backwards compatibility issues being a new feature? > In terms of matching multiple users, all LDAP authentication routines > I've seen will fail if more than one DN matches the search filter, so > this nicely handles the cases where either the custom filter is > incorrect or more than one entry is accidentally matched in the > directory. > > So do we, in the current implementation. But it's a lot less likely to > happen in the current implementation, since we do a single equals lookup. Great, that's absolutely fine :) ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More flexible LDAP auth search filters?
On 17/07/17 13:09, Magnus Hagander wrote: Hi Magnus, Great to hear from you! It has definitely been a while... > Generally you find that you will be given the option to set the > attribute for the default search filter of the form > "(attribute=username)" which defaults to uid for UNIX-type systems and > sAMAccountName for AD. However there is always the ability to specify a > custom filter where the user is substituted via e.g. %u to cover all the > other use-cases. > > Right, but that's something we do already today. It just defaults to > uid, but it's easy to change. Yes, I think that bit is fine as long as the default can be overridden. There's always a choice as to whether the defaults favour a POSIX-based LDAP or AD environment but that happens with all installations so it's not a big deal. > As an example, I don't know if anyone would actually do this with > PostgreSQL but I've been asked on multiple occasions to configure > software so that users should be allowed to log in with either their > email address or username which is easily done with a custom LDAP filter > like "(|(mail=%u)(uid=%u))". > > > How would that actually work, though? Given the same user in ldap could > now potentially match multiple different users in PostgreSQL. Would you > then create two accounts for the user, one with the uid as name and one > with email as name? Wouldn't that actually cause more issues than it solves? Normally what happens for search+bind is that you execute the custom filter with substitutions in order to find the entry for your bind DN, but you also request the value of ldapsearchattribute (or equivalent) at the same time. Say for example you had an entry like this: dn: uid=mha,dc=users,dc=hagander,dc=net uid: mha mail: mag...@hagander.net Using the filter "(|(mail=%u)(uid=%u))" would locate the same bind DN "uid=mha,dc=users,dc=hagander,dc=net" with either one of your uid or email address. If the bind is successful then the current user identity should be set to the value of the ldapsearchattribute retrieved from the bind DN entry, which with the default of "uid" would be "mha". Hence you end up with the same user role "mha" regardless of whether a uid or email address was entered. In terms of matching multiple users, all LDAP authentication routines I've seen will fail if more than one DN matches the search filter, so this nicely handles the cases where either the custom filter is incorrect or more than one entry is accidentally matched in the directory. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More flexible LDAP auth search filters?
On 17/07/17 00:14, Stephen Frost wrote: >> If it helps, we normally recommend that clients use ldaps for both AD >> and UNIX environments, although this can be trickier from an >> administrative perspective in AD environments because it can require >> changes to the Windows firewall and certificate installation. > > LDAPS is better than straight LDAP, of course, but it still doesn't > address the issue that the password is sent to the server, which both > SCRAM and Kerberos do and is why AD environments use Kerberos for > authentication, and why everything in an AD environment also should use > Kerberos. > > Using Kerberos should also avoid the need to hack the Windows firewall > or deal with certificate installation. In an AD environment, it's > actually pretty straight-forward to add a PG server too. Further, in my > experience at least, there's been other changes recommended by Microsoft > that prevent using LDAP for auth because it's insecure. Oh sure - I'm not questioning that Kerberos is a far better choice in pure AD environments, it's just that I spend the majority of my time in mixed-mode environments where Windows is very much in the minority. In my experience LDAP is often implemented badly; for example the majority of software still uses simple binds (i.e. plain logins) rather than using SASL binds which support a whole range of better authentication methods (e.g. GSSAPI, and even DIGEST-MD5 has been mandatory for v3 and is implemented on AD). And yes, while better authentication mechanisms do exist, I find that all too often most software packages claim LDAP support rather than Kerberos, and even then it is often limited to LDAP simple binds without ldaps support. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More flexible LDAP auth search filters?
On 16/07/17 23:26, Thomas Munro wrote: > Thank you very much for this feedback and example, which I used in the > documentation in the patch. I see similar examples in the > documentation for other things on the web. > > I'll leave it up to Magnus and Stephen to duke it out over whether we > want to encourage LDAP usage, extend documentation to warn about > cleartext passwords with certain LDAP implementations or > configurations, etc etc. I'll add this patch to the commitfest and > get some popcorn. If it helps, we normally recommend that clients use ldaps for both AD and UNIX environments, although this can be trickier from an administrative perspective in AD environments because it can require changes to the Windows firewall and certificate installation. Whilst OpenLDAP will support ldap+starttls you can end up with some clients with starttls either disabled or misconfigured sending plaintext passwords over the wire regardless, so it's generally easiest to firewall ldap port 389 at the edge of the trusted VLAN so that only ldaps port 636 connections make it out onto the untrusted network hosting the local AD/OpenLDAP server. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] More flexible LDAP auth search filters?
On 16/07/17 00:08, Thomas Munro wrote: > On Fri, Jul 14, 2017 at 11:04 PM, Magnus Haganderwrote: >> On Thu, Jul 13, 2017 at 9:31 AM, Thomas Munro >> wrote: >>> A post on planet.postgresql.org today reminded me that a colleague had >>> asked me to post this POC patch here for discussion. It allows custom >>> filters with ldapsearchprefix and ldapsearchsuffix. Another approach >>> might be to take a filter pattern with "%USERNAME%" or whatever in it. >>> There's an existing precedent for the prefix and suffix approach, but >>> on the other hand a pattern approach would allow filters where the >>> username is inserted more than once. >> >> >> Do we even need prefix/suffix? If we just make it "ldapsearchpattern", then >> you could have something like: >> >> ldapsearchattribute="uid" >> ldapsearchfilter="|(memberof=cn=Paris DBA Team)(memberof=cn=Tokyo DBA Team)" >> >> We could then always to substitution of the kind: >> (&(attr=)()) >> >> which would in this case give: >> (&(uid=mha)(|(memberof=cn=Paris DBA Team)(memberof=cn=Tokyo DBA Team))) >> >> >> Basically we'd always AND together the username lookup with the additional >> filter. > > Ok, so we have 3 ideas put forward: > > 1. Wrap username with ldapsearchprefix ldapsearchsuffix to build > filter (as implemented by POC patch) > 2. Optionally AND ldapsearchfilter with the existing > ldapsearchattribute-based filter (Magnus's proposal) > 3. Pattern-based ldapsearchfilter so that %USER% is replaced with > username (my other suggestion) > > The main argument for approach 1 is that it follows the style of the > bind-only mode. > > With idea 2, I wonder if there are some more general kinds of things > that people might want to do that that wouldn't be possible because it > has to include (attribute=user)... perhaps something involving a > substring or other transformation functions (but I'm no LDAP expert, > that may not make sense). > > With idea 3 it would allow "(|(foo=%USER%)(bar=%USER%))", though I > don't know if any such multiple-mention filters would ever make sense > in a sane LDAP configuration. > > Any other views from LDAP-users? I've spent quite a bit of time integrating various bits of non-PostgreSQL software to LDAP and in my experience option 3 tends to be the standard. Generally you find that you will be given the option to set the attribute for the default search filter of the form "(attribute=username)" which defaults to uid for UNIX-type systems and sAMAccountName for AD. However there is always the ability to specify a custom filter where the user is substituted via e.g. %u to cover all the other use-cases. As an example, I don't know if anyone would actually do this with PostgreSQL but I've been asked on multiple occasions to configure software so that users should be allowed to log in with either their email address or username which is easily done with a custom LDAP filter like "(|(mail=%u)(uid=%u))". ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 15/12/14 19:08, Robert Haas wrote: On Sun, Dec 14, 2014 at 2:24 PM, Mark Cave-Ayland mark.cave-ayl...@ilande.co.uk wrote: However if it were posted as part of patchset with a subject of [PATCH] gram.y: add EXCLUDED pseudo-alias to bison grammar then this may pique my interest enough to review the changes to the grammar rules, with the barrier for entry reduced to understanding just the PostgreSQL-specific parts. Meh. Such a patch couldn't possibly compile or work without modifying other parts of the tree. And I'm emphatically opposed to splitting a commit that would have taken the tree from one working state to another working state into a series of patches that would have taken it from a working state through various non-working states and eventually another working state. Furthermore, if you really want to review that specific part of the patch, just look for what changed in gram.y, perhaps by applying the patch and doing git diff src/backend/parser/gram.y. This is really not hard. Okay I agree that the suggested subject above was a little misleading, so let me try and clarify this further. If I were aiming to deliver this as an individual patch as part of a patchset, my target for this particular patch would be to alter both the bison grammar *and* the minimal underlying code/structure changes into a format that compiles but adds no new features. So why is this useful? Because the parser in PostgreSQL is important and people have sweated many hours to ensure its performance is the best it can be. By having a checkpoint with just the basic parser changes then it becomes really easy to bisect performance issues down to just this one particular area, rather than the feature itself. And as per my original message it is a much lower barrier of entry for a potential reviewer who has previous bison experience (and is curious about PostgreSQL) to review the basic rule changes than a complete new feature. I certainly agree that there are cases where patch authors could and should put more work into decomposing large patches into bite-sized chunks. But I don't think that's always possible, and I don't think that, for example, applying BRIN in N separate pieces would have been a step forward. It's one feature, not many. I completely agree with you here. Maybe this isn't exactly the same for PostgreSQL but in general for a new QEMU feature I could expect a patchset of around 12 patches to be posted. Of those 12 patches, probably patches 1-9 are internal API changes, code refactoring and preparation work, patch 10 is a larger patch containing the feature, and patches 11-12 are for tidy-up and removing unused code sections. Even with the best patch review process in the world, there will still be patches that introduce bugs, and the bugs are pretty much guaranteed to be caused by patches 1-9. Imagine a subtle bug in an internal API change which exhibits itself not in the feature added by the patchset itself but in another mostly unrelated part of the system; maybe this could be caused by a pass-by-value API being changed to a pass-by-reference API in patches 1-9 and this tickles a bug due to an unexpected lifecycle heap access elsewhere causing random data to be returned. Being able to bisect this issue down to a single specific patch suddenly becomes very useful indeed. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 15/12/14 19:24, Andrew Dunstan wrote: On 12/15/2014 02:08 PM, Robert Haas wrote: On Sun, Dec 14, 2014 at 2:24 PM, Mark Cave-Ayland mark.cave-ayl...@ilande.co.uk wrote: However if it were posted as part of patchset with a subject of [PATCH] gram.y: add EXCLUDED pseudo-alias to bison grammar then this may pique my interest enough to review the changes to the grammar rules, with the barrier for entry reduced to understanding just the PostgreSQL-specific parts. Meh. Such a patch couldn't possibly compile or work without modifying other parts of the tree. And I'm emphatically opposed to splitting a commit that would have taken the tree from one working state to another working state into a series of patches that would have taken it from a working state through various non-working states and eventually another working state. Furthermore, if you really want to review that specific part of the patch, just look for what changed in gram.y, perhaps by applying the patch and doing git diff src/backend/parser/gram.y. This is really not hard. I certainly agree that there are cases where patch authors could and should put more work into decomposing large patches into bite-sized chunks. But I don't think that's always possible, and I don't think that, for example, applying BRIN in N separate pieces would have been a step forward. It's one feature, not many. +1 I have in the past found the bunch of patches approach to be more that somewhat troublesome, especially when one gets here is an update to patch nn of the series and one has to try to compose a coherent set of patches in order to test them. A case in point I recall was the original Foreign Data Wrapper patchset. In practice, people don't tend to post updates to individual patches in that way. What happens is that after a week or so of reviews, people go away and rework the patch and come back with a complete updated patchset clearly marked as [PATCHv2] with the same subject line and an updated cover letter describing the changes, so a complete coherent patchset is always available. Now as I mentioned previously, one of the disadvantages of splitting patches in this way is that mailing list volume tends to grow quite quickly - hence the use of [PATCH] filters and system identifiers in the subject line to help mitigate this. Whilst the volume of mail was a shock to begin with, having stuck with it for a while I personally find that the benefits to development outweigh the costs. Each individual project is different, but I believe that there are good ideas here that can be used to improve workflow, particularly when it comes to patch review. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 15/12/14 19:27, Robert Haas wrote: On Sun, Dec 14, 2014 at 4:53 PM, Mark Cave-Ayland mark.cave-ayl...@ilande.co.uk wrote: What I find frustrating is that I've come back from a workflow where I've been reviewing/testing patches within months of joining a project because the barrier for entry has been so low, and yet even with much longer experience of the PostgreSQL codebase I feel I can't do the same for patches submitted to the commitfest. And why is this? It's because while I know very specific areas of the code well, many patches span different areas of the source tree of which I have little and/or no experience, which when supplied as a single monolithic patch makes it impossible for me to give meaningful review to all but a tiny proportion of them. So, there are certainly some large patches that do that, and they typically require a very senior reviewer. But there are lots of small patches too, touching little enough that you can learn enough to give them a decent review even if you've never looked at that before. I find myself in that situation as a reviewer and committer pretty regularly; being a committer doesn't magically make you an expert on the entire code base. You can (and I do) focus your effort on the things you know best, but you have to step outside your comfort zone sometimes, or you never learn anything new. Right. Which is why I'm advocating the approach of splitting patches in relevant chunks so that experts in those areas can review them in parallel. And even better, if I then want to start digging into other parts of the system as time and interest allow then I can choose to begin by picking a subject line from a patchset and going through this small individual patch in detail rather than a single large patch. It has often been suggested that people learn best when studying a mix of 80% of things they already know compared to 20% of things they don't. At least with more granular patches people can find a comfortable ratio of new/old material vs. a single large patch which could be 70-80% of completely new material and therefore much more difficult to pick-up. Another thought to ponder here is that by reviewing smaller patches which takes less time, for the same time cost a reviewer with experience in one specific area can in theory review and provide feedback on another 2-3 patchsets which should help relieve patch review starvation across patchset submissions. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 16/12/14 04:57, Noah Misch wrote: But that doesn't mean we should be turning anyone away. We should not. +1. Some of the best reviews I've seen are ones where the reviewer expressed doubts about the review's quality, so don't let such doubts keep you from participating. Every defect you catch saves a committer time; a review that finds 3 of the 10 defects in a patch is still a big help. Some patch submissions waste the community's time, but it's almost impossible to waste the community's time by posting a patch review. Confusion may have arisen due to statements that we need more expert reviewers, which is also true. (When an expert writes a sufficiently-complex patch, it's important that a second expert examine the patch at some point.) If you're a novice reviewer, your reviews do help to solve that problem by reducing the workload on expert reviewers. I should add that by reducing the barrier of entry for patch review, there is definitely potential to find common errors before a patch gets analysed in detail by a committer. For example I may not know a lot about certain PostgreSQL subsystems but from a decent C coding background I can pick out certain suspicious things in patches, e.g. potential overflows, pointer arithmetic bugs, spelling mistakes/incorrect comments etc. and flag them to the original submitter to double-check. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 16/12/14 07:33, David Rowley wrote: On 16 December 2014 at 18:18, Josh Berkus j...@agliodbs.com mailto:j...@agliodbs.com wrote: Man. You're equating stuff that's not the same. You didn't get your way (and I'm tentatively on your side onthat one) and take that to imply that we don't want more reviewers. During that thread a couple people said that novice reviewers added no value to the review process, and nobody argued with them then. I've also been told this to my face at pgCon, and when I've tried organizing patch review events. I got the message, which is why I stopped trying to get new reviewers. And frankly: if we're opposed to giving credit to patch reviewers, we're opposed to having them. I'd just like to add something which might be flying below the radar of more senior people. There are people out there (ike me) working on PostgreSQL more for the challenge and perhaps the love of the product, who make absolutely zero money out of it. For these people getting credit where it's due is very important. I'm pretty happy with this at the moment and I can't imagine any situation where not crediting reviewers would be beneficial to anyone. This is exactly where I am at the moment, having previously been more involved with the development side of PostgreSQL during the past. Personally having a credit as a patch reviewer isn't particularly important to me, since mail archives are good enough these days that if people do query my contributions towards projects then I can point them towards any reasonable search engine. The biggest constraint on my ability to contribute is *time*. Imagine the situation as a reviewer that I am currently on the mailing list for two well-known open source projects and I also have a day job and a home life to contend with. For the spare time that I have for review, one of these projects requires me to download attachment(s), apply them to a git tree (hopefully it still applies), run a complete make check regression series, try and analyse a patch which will often reference parts to which I have no understanding, and then write up a coherent email and submit it to the mailing list. Realistically to do all this and provide a review that is going to be of use to a committer is going to take a minimum of 1-2 hours, and even then there's a good chance that I've easily missed obvious bugs in the parts of the system I don't understand well. For the second project, I can skim through my inbox daily picking up specific areas I work on/are interested in, hit reply to add a couple of lines of inline comments to the patch and send feedback to the author/list in just a few minutes. The obvious question is, of course, which project gets the majority share of my spare review time? ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 16/12/14 10:49, Marko Tiikkaja wrote: On 12/16/14 11:26 AM, Mark Cave-Ayland wrote: On 15/12/14 19:27, Robert Haas wrote: So, there are certainly some large patches that do that, and they typically require a very senior reviewer. But there are lots of small patches too, touching little enough that you can learn enough to give them a decent review even if you've never looked at that before. I find myself in that situation as a reviewer and committer pretty regularly; being a committer doesn't magically make you an expert on the entire code base. You can (and I do) focus your effort on the things you know best, but you have to step outside your comfort zone sometimes, or you never learn anything new. Right. Which is why I'm advocating the approach of splitting patches in relevant chunks so that experts in those areas can review them in parallel. I don't see how splitting patches up would help with that. I often look at only the parts of patches that touch the things I've worked with before. And in doing that, I've found that having the context there is absolutely crucial almost every single time, since I commonly ask myself Why do we need to do this to implement feature X?, and only looking at the rest of the complete patch (or patch set, however you want to think about it) reveals that. I've already covered this earlier in the thread so I won't go into too much detail, but the overall flow of the patchset is described by the cover letter (please feel free to look at the example link I posted). In terms of individual patches within a patchset then if the combination of the cover letter and individual commit message don't give you enough context then the developer needs to fix this; either the patchset has been split at a nonsensical place or either one or other of the cover letter and/or commit message need to be revised. Of course, me looking at parts of patches, finding nothing questionable and not sending an email about my findings (or lack thereof) hardly counts as review, so somebody else still has to review the actual patch as a whole. Nor do I get any credit for doing any of that, which might be a show-stopper for someone else. But I think that's just because I'm not doing it correctly. I don't see why someone couldn't comment on a patch saying I've reviewed the grammar changes, and they look good to me. Sure, there is always scope for doing that which is what splitting patches and constant review encourages. In terms of the current commitfest system, the process for review is clearly documented and as I've pointed out in my response to David, extremely heavyweight in comparison. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 16/12/14 13:37, Claudio Freire wrote: For the second project, I can skim through my inbox daily picking up specific areas I work on/are interested in, hit reply to add a couple of lines of inline comments to the patch and send feedback to the author/list in just a few minutes. Notice though that on the second project, the review is made in the air. You didn't build, nor ran tests, you're guessing how the code performs rather than seeing it perform, so the review is light compared to the first. I think this doing developers in general a little injustice here. The general rule for a submitting patch to *any* project is: i) does it apply to the current source tree and ii) does it build and pass the regression tests? Maybe there is a greater incidence of this happening in PostgreSQL compared to other projects? But in any case the project has every right to refuse further review if these 2 simple criteria are not met. Also with a submission from git, you can near 100% guarantee that the author has actually built and run the code before submission. I have seen occasions where a patch has been submitted, and a committer will send a quick email along the lines of The patch looks good, but I've just applied a patch that will conflict with this, so please rebase and resubmit. You mention about performance, but again I've seen from this list that good developers can generally pick up on a lot of potential regressions by eye, e.g. lookup times go from O(N) to O(N^2) without having to resort to building and testing the code. And by breaking into small chunks people can focus their time on reviewing the areas they are good at. Occasionally sometimes people do get a patch ready for commit but it fails build/test on one particular platform. In that case, the committer simply posts the build/test failure to the list and requests that the patchset be fixed ready for re-test. This is a very rare occurrence though. Also you mentioned about light review but I wouldn't call it that. I see it more as with each iteration the magnifying glass used to look at the code gets bigger and bigger. A lot of initial comments for the second project are along the lines of this doesn't look right - won't this overflow on values 2G? or you're assuming here that A occurs before B, whereas if you run with -foo this likely won't work. And this is the area which I believe will have the greatest benefit for PostgreSQL - making it easier to catch and rework the obvious flaws in the patch in a timely manner before it gets to the committer. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 16/12/14 13:44, David Fetter wrote: On Tue, Dec 16, 2014 at 11:09:34AM +, Mark Cave-Ayland wrote: On 16/12/14 07:33, David Rowley wrote: On 16 December 2014 at 18:18, Josh Berkus j...@agliodbs.com mailto:j...@agliodbs.com wrote: Man. You're equating stuff that's not the same. You didn't get your way (and I'm tentatively on your side onthat one) and take that to imply that we don't want more reviewers. During that thread a couple people said that novice reviewers added no value to the review process, and nobody argued with them then. I've also been told this to my face at pgCon, and when I've tried organizing patch review events. I got the message, which is why I stopped trying to get new reviewers. And frankly: if we're opposed to giving credit to patch reviewers, we're opposed to having them. I'd just like to add something which might be flying below the radar of more senior people. There are people out there (ike me) working on PostgreSQL more for the challenge and perhaps the love of the product, who make absolutely zero money out of it. For these people getting credit where it's due is very important. I'm pretty happy with this at the moment and I can't imagine any situation where not crediting reviewers would be beneficial to anyone. This is exactly where I am at the moment, having previously been more involved with the development side of PostgreSQL during the past. Personally having a credit as a patch reviewer isn't particularly important to me, since mail archives are good enough these days that if people do query my contributions towards projects then I can point them towards any reasonable search engine. The biggest constraint on my ability to contribute is *time*. Imagine the situation as a reviewer that I am currently on the mailing list for two well-known open source projects and I also have a day job and a home life to contend with. For the spare time that I have for review, one of these projects requires me to download attachment(s), apply them to a git tree (hopefully it still applies), run a complete make check regression series, try and analyse a patch which will often reference parts to which I have no understanding, and then write up a coherent email and submit it to the mailing list. Realistically to do all this and provide a review that is going to be of use to a committer is going to take a minimum of 1-2 hours, and even then there's a good chance that I've easily missed obvious bugs in the parts of the system I don't understand well. For the second project, I can skim through my inbox daily picking up specific areas I work on/are interested in, hit reply to add a couple of lines of inline comments to the patch and send feedback to the author/list in just a few minutes. With utmost respect, you've missed something really important in the second that the first has, and frankly isn't terribly onerous: does an actual system produce working code? In the PostgreSQL case, you can stop as soon as you discover that the patch doesn't apply to master or that ./configure doesn't work, or that the code doesn't compile: elapsed time = 5 minutes. Or you can keep moving until you have made progress for the time you've allotted. As per my previous email, it's generally quite rare for a developer to post non-working code to a list (in some cases someone will send a quick reply pointing that this needs to be rebased because it appears to reference an old API). From what I see in PostgreSQL this mostly happens because of bitrot between the time the patch was posted to the list and the actual commitfest itself - so in one way the commitfest system exacerbates this particular problem further. But the bigger issue, as others have pointed out, has never been a technical one. It's motivating people whose time is already much in demand to spend some of it on reviewing. I wasn't discouraged by the preliminary patch review process or any feedback I got. My absence lately has more to do with other demands on my time. I am completely in agreement with you here. My approach is more along the lines of that I know access to long periods of time to review patches is often impractical, and so how can the review process be made more time-efficient in order to allow you, me and other people in similar time-limited environments to be able to participate more? ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 16/12/14 15:42, Claudio Freire wrote: Also with a submission from git, you can near 100% guarantee that the author has actually built and run the code before submission. I don't see how. Forks don't have travis ci unless you add it, or am I mistaken? Well as I mentioned in my last email, practically all developers will rebase and run make check on their patched tree before submitting to the list. Hopefully there aren't hordes of developers deliberating creating and submitting broken patches ;) You mention about performance, but again I've seen from this list that good developers can generally pick up on a lot of potential regressions by eye, e.g. lookup times go from O(N) to O(N^2) without having to resort to building and testing the code. And by breaking into small chunks people can focus their time on reviewing the areas they are good at. I meant performance as in running, not really performance. Sorry for the confusion. Okay no worries. I meant, you're looking at the code and guessing how it runs, but you don't really know. It's easy to make assumptions that are false while reviewing, quickly disproved by actually running tests. A light review without ever building can fail to detect those issues. A heavier review patching up and building manually is too much manual work that could really be automated. The optimum is somewhere between those two extremes. Correct. My analogy here was that people with varying abilities review the patches at their experience level, and once low-hanging/obvious design issues have been resolved then more experienced developers will start to review the patch at a deeper level. Occasionally sometimes people do get a patch ready for commit but it fails build/test on one particular platform. Again, pre-review CI can take care of this. Yes - see my next reply... In that case, the committer simply posts the build/test failure to the list and requests that the patchset be fixed ready for re-test. This is a very rare occurrence though. It shouldn't need to reach the repo, don't you think? When I say repo, I mean the local repo of the tree maintainer. Currently the tree maintainer pulls each merge request into his local tree, performs a buildfarm test and only pushes the merge upstream once this has passed. I guess the PostgreSQL equivalent of this would be having a merge-pending branch on the buildfarm rather than just a post-commit build of git master (which we see from reports to the list periodically fails in this way) and only pushing a set of patches when the buildfarm comes back green. Also you mentioned about light review but I wouldn't call it that. I've made my fair share of light reviews (not for pg though) and can recognize the description. It is a light review what you described. Surely not all reviews with inline comments are light reviews, but what you described was indeed a light review. A lot of initial comments for the second project are along the lines of this doesn't look right - won't this overflow on values 2G? or you're assuming here that A occurs before B, whereas if you run with -foo this likely won't work. And this is the area which I believe will have the greatest benefit for PostgreSQL - making it easier to catch and rework the obvious flaws in the patch in a timely manner before it gets to the committer. If you read carefully my reply, I'm not at all opposed to that. I'm just pointing out, that easier reviews need not result in better reviews. Better, easier reviews are those where the trivial reviews are automated, as is done in the project I linked. Yes I can definitely agree with that. See below again: Formatting, whether it still applies, and whether it builds and checks pass, all those are automatable. I should add that QEMU provides a branch of checkpatch.pl in the source tree which submitters are requested to run on their patchset before submission to the list. This catches patches that don't meet the project style guidelines including casing, braces, trailing whitespace, tab/space issues etc. and that's before the patch is even submitted to the list. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 15/12/14 16:28, Andres Freund wrote: I don't believe this really is a question of the type of project. I think it's more that especially the kernel has had to deal with similar problems at a much larger scale. And the granular approach somewhat works for them. Correct. My argument was that dividing patches into smaller, more reviewable chunks lessens the barrier for reviewers since many people can review the individual patches as appropriate to their area of expertise. The benefits of this are that the many parts of the patchset get reviewed early by a number of people, which reduces the workload on the core developers as they only need to focus on a small number of individual patches. Hence patches get reworked and merged much more quickly in this way. This is in contrast to the commitfest system where a single patch is i) often held until the next commitfest (where bitrot often sets in) and ii) requires the reviewer to have good knowledge all of the areas covered by the patch in order to give a meaningful review, which significantly reduces the pool of available reviewers. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 13/12/14 09:37, Craig Ringer wrote: Speaking as the originator of commitfests, they were *always* intended to be a temporary measure, a step on the way to something else like continuous integration. I'd really like to see the project revisit some of the underlying assumptions that're being made in this discussion: - Patches must be email attachments to a mailing list - Changes must be committed by applying a diff ... and take a look at some of the options a git-based workflow might offer, especially in combination with some of the tools out there that help track working branches, run CI, etc. Having grown used to push/pull workflows with CI integration I find the PostgreSQL patch workflow very frustrating, especially for larger patches. It's particularly annoying to see a patch series squashed into a monster patch whenever it changes hands or gets rebased, because it's being handed around as a great honking diff not a git working branch. Is it time to stop using git like CVS? While not so active these days with PostgreSQL, I believe there is still great scope for improving the workflow of reviewing and applying patches. And while the commitfests were a good idea when they started, it's obvious that in their current form they do not scale to meet the current rate of development. If I could name just one thing that I think would improve things it would be submission of patches to the list in git format-patch format. Why? Because it enables two things: 1) by definition patches are small-chunked into individually reviewable pieces and 2) subject lines allow list members to filter things that aren't relevant to them. Here's an example of how the workflow works for the QEMU project which I'm involved in, which also has similar issues in terms of numbers of developers and supported platforms: 1) Developer submits a patch to the -devel list with git-format-patch - For subsystems with listed maintainers in the MAINTAINERS file, the maintainer must be CCd on the patchset (this is so that even if developers decide to filter PATCH emails to the list, they still get the CC copy) - Patchsets must be iterative and fully bisectable, with clear comments supplied in the commit message - Any patchset with 1 patch MUST have a cover letter - All patches have a Signed-off-by indicating that the developer has accepted the terms of the project license - Each subject line should be in the format [PATCH] subsystem: one line comment to enabled people to determine its relevance 2) Other developers start reviewing patches There are several mini-workflows that tend to occur here so I'll try and give some popular examples. - If a patch is a build fix, one of the core committers will verify and apply directly to the master repository - If a maintainer is happy with the whole patchset, they reply to the cover letter with a Reviewed-by and a line stating which subtree the patchset has been applied to. Maintainers add a Signed-off-by to all patches accepted via their subtree. - A maintainer may indicate that the patch itself is fine, but the commit message is not clear/incorrect and should be changed before being resubmitted - Any member of the mailing list may reply/review an individual patch by hitting reply in their mail client. Patch comments are included inline. If a reviewer is happy with an individual patch, they reply to the patch and add a Reviewed-by line; anyone can provide a review, even if they are not a maintainer - For complex patches, a maintainer may request that the patchset may be split into further patchsets. In general patchsets of 20-30 patches tend to be frowned upon, and will often immediately result in a reply saying please break this down into smaller chunks - A maintainer may reply and say that while the patchset in its final form needs more work, some clean-ups introduced by the patch are ready to be applied, e.g. please resubmit patches 1-4, 7 and 8 as a separate patchset which can then be applied directly to a subtree 3) Patchset tidy-up and submission After the initial review, new versions of the patchset are generally reposted to the list within a few days. - The patch version is included in the header with the same subject line, e.g. [PATCHv2] Add new feature - The cover letter contains a mini-changelog giving the changes between v1 and v2 e.g. v2: Fix spelling mistake pointed out by Peter Change lock ordering requested by Andreas - Reviewed-by tags for individual patches sent to the list are appended to the commit message for that patch before resubmission Once a maintainer has accepted a patch into their subtree, they send a pull request to the core maintainers to apply their trees to the main repository. I appreciate that currently merge commits aren't used in the PostgreSQL git repository so a pull request effectively becomes rebase this patchset onto master and push. So why does this help patch review? From my experience I can
Re: [HACKERS] Commitfest problems
On 14/12/14 15:51, Craig Ringer wrote: On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote: Compare this to say, for example, huge patches such as RLS. I specifically objected to that being flattened into a single monster patch when I saw that'd been done. If you look at my part in the work on the row security patch, while I was ultimately unsuccessful in getting the patch mergeable I spent quite a bit of time splitting it up into a logical patch-series for sane review and development. I am quite annoyed that it was simply flattened back into an unreviewable, hard-to-follow blob and committed in that form. It's not like development on a patch series is difficult. You commit small fixes and changes, then you 'git rebase -i' and reorder them to apply to the appropriate changesets. Or you can do a 'rebase -i' and in 'e'dit mode make amendments to individual commits. Or you can commit 'fixup!'s that get auto-squashed. This is part of my grumbling about the use of git like it's still CVS. I just want to add that I'm always grateful for the time that yourself and all committers put into PostgreSQL, and my example of RLS was really to signify big feature X rather than to pick on a particular aspect of that patch. I apologise if you though I was in any way criticising the work that you, or anyone else, put into this feature. The argument I wanted to make is that if someone starts seeing a problem with a current build of PostgreSQL and git bisects it down to a particular commit, then smaller, iterative commits are extremely helpful in this regard. When searching for a needle in a haystack, there is very big difference between a add big feature X haystack and a change struct alignment for Y haystack, the latter which is implicit in having smaller, iterative patchsets. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 14/12/14 15:57, Craig Ringer wrote: On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote: If I could name just one thing that I think would improve things it would be submission of patches to the list in git format-patch format. Why? Because it enables two things: 1) by definition patches are small-chunked into individually reviewable pieces and 2) subject lines allow list members to filter things that aren't relevant to them. Personally I do just that. Even though the official docs say context diff is preferred, I think most people now use context diff in practice, from 'git diff'. I'm surprised most people don't use git format-patch . Note that we can't just git am a patch from git format-patch at the moment, because policy is that the committer should be the Author field. The project would have to define a transition point where we started using the git Author field for the author and the separate Committer field, like git-am does by default. I'm not sure that'll ever actually happen, or that it's even desirable. Sure, I appreciate that. The part I particularly wanted to emphasise here was the use of a system identifier as part of the subject line, e.g. say if there was a hypothetical patchset which sped up PostgreSQL by 20% then you could see subjects like this: [PATCH 1/x] lmgr: reduce lock strength for LWLock [PATCH 2/x] wal: change lock ordering for WAL writes [PATCH 3/x] optimiser: exit early if lock unobtainable While these are obviously unrealistic, what I hope here is that people with interest in these areas would take note of individual patches even if they aren't interested in the entire patchset. So maybe since Andreas did some recent locking work, he spots lmgr in the subject and then makes a note to review that part of the patch. Similary Heikki could spot wal and make a note to look at that, whilst Tom would zero in on the optimiser changes. The aim here is that no one person needs to sit and initially review a complete patch before returning feedback to the developer. - Patchsets must be iterative and fully bisectable, with clear comments supplied in the commit message Fully support this one. Also in the smallest reasonable size divisions. I should add here that the QEMU folk do tend to go to great lengths to preserve bisectability; often intermediate compatibility APIs are introduced early in the patchset and then removed at the very end when the final feature is implemented. - If a maintainer is happy with the whole patchset, they reply to the cover letter with a Reviewed-by and a line stating which subtree the patchset has been applied to. Maintainers add a Signed-off-by to all patches accepted via their subtree. Sounds a lot like the kernel workflow (though in practice it seems like the kernel folks tend bypass all this mailing list guff and just use direct pulls/pushes for lots of things). Yes indeed - mainly from the people on the KVM module side. Again I'm not saying that this workflow is correct for PostgreSQL, I was trying to use this an example to explain *why* the workflow is done in this manner and how it helps developers such as myself. - Any member of the mailing list may reply/review an individual patch by hitting reply in their mail client. Patch comments are included inline. If a reviewer is happy with an individual patch, they reply to the patch and add a Reviewed-by line; anyone can provide a review, even if they are not a maintainer This is fun with many mail clients that like to linewrap text bodies and don't permit inline replies to attached text. Wow really? I can't say I've seen this in practice for a long time but I defer to your experience here. 6) Smaller patches are applied more often By breaking large patches down into smaller chunks, even if the main feature itself isn't ready to be applied to the main repository then a lot of the preceding patches laying down the groundwork can. I think PostgreSQL does OK on smaller patches - at least sometimes. There can be a great deal of bikeshedding and endless arguments, back-and-forth about utility, in-tree users, etc, but no formal process is ever going to change that. The process of getting an uncontroversial patch into Pg is generally painless, if often rather slow unless a committer sees it and commits it immediately upon it being posted. I agree that trivial patches do tend to get picked up reasonably quickly. It strikes me that it may prevent patches slipping through the list if someone were officially nominated as a trivial patch monitor, i.e. someone for developers to ping if their patch has been ignored but it sounds like this is not such an issue in practice. The benefits here are that people with large out-of-tree patches I'm not sure we have all that many people in that category - though I'm a part of a team that most certainly does fit the bill. Even the small patches for BDR have been challenging and often contentious
Re: [HACKERS] Commitfest problems
On 14/12/14 17:05, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote: Compare this to say, for example, huge patches such as RLS. I specifically objected to that being flattened into a single monster patch when I saw that'd been done. If you look at my part in the work on the row security patch, while I was ultimately unsuccessful in getting the patch mergeable I spent quite a bit of time splitting it up into a logical patch-series for sane review and development. I am quite annoyed that it was simply flattened back into an unreviewable, hard-to-follow blob and committed in that form. TBH, I'm not really on board with this line of argument. I don't find broken-down patches to be particularly useful for review purposes. An example I was just fooling with this week is the GROUPING SETS patch, which was broken into three sections for no good reason at all. (The fourth and fifth subpatches, being alternative solutions to one problem, are in a different category of course.) Too often, decisions made in one subpatch don't make any sense until you see the larger picture. The way in which this is normally handled is via the cover letter which is going to be nicely captured in the mail archives; typically a cover letter for a patchset consists of several paragraphs along the lines of patches 1-3 do some re-org work, patch 4 reworks the Y API for new feature X implemented in patches 9-12. As an example take a look at the cover letter for this patch submitted over the past few days: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg01990.html. It seems to me that this would give the level of detail which you are looking for. I agree that definitely in some cases patches could be broken into too fine pieces, but then I think it's perfectly okay for committers to turn around and request the series be re-submitted in more sensible chunks if they feel things have gone too far the other way. Also, speaking of the larger picture: the current Postgres revision history amounts to 37578 commits (as of sometime yesterday) --- and that's just in the HEAD branch. If we'd made an effort to break feature patches into bite-size chunks like you're recommending here, we'd probably have easily half a million commits in the mainline history. That would not be convenient to work with, and I really doubt that it would be more useful for git bisect purposes, and I'll bet a large amount of money that most of them would not have had commit messages composed with any care at all. Having worked on a few kernel patches myself, git is surprisingly effective on huge repositories once the cache is warmed up so I don't feel this would be an issue with a PostgreSQL git repository which will be many magnitudes smaller. In terms of commit messages, I don't know if you missed that part of my original response to Craig but it's considered normal for a maintainer to reject a patch because of an incorrect/irrelevant commit message. So if I submitted a patch that fixed a particular bug but you as a committer weren't happy with the commit message, then I'm perfectly okay with you asking me to resubmit with updated/corrected patch message wording. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 14/12/14 17:30, Andrew Dunstan wrote: On 12/14/2014 12:05 PM, Tom Lane wrote: Craig Ringer cr...@2ndquadrant.com writes: On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote: Compare this to say, for example, huge patches such as RLS. I specifically objected to that being flattened into a single monster patch when I saw that'd been done. If you look at my part in the work on the row security patch, while I was ultimately unsuccessful in getting the patch mergeable I spent quite a bit of time splitting it up into a logical patch-series for sane review and development. I am quite annoyed that it was simply flattened back into an unreviewable, hard-to-follow blob and committed in that form. TBH, I'm not really on board with this line of argument. I don't find broken-down patches to be particularly useful for review purposes. An example I was just fooling with this week is the GROUPING SETS patch, which was broken into three sections for no good reason at all. (The fourth and fifth subpatches, being alternative solutions to one problem, are in a different category of course.) Too often, decisions made in one subpatch don't make any sense until you see the larger picture. Also, speaking of the larger picture: the current Postgres revision history amounts to 37578 commits (as of sometime yesterday) --- and that's just in the HEAD branch. If we'd made an effort to break feature patches into bite-size chunks like you're recommending here, we'd probably have easily half a million commits in the mainline history. That would not be convenient to work with, and I really doubt that it would be more useful for git bisect purposes, and I'll bet a large amount of money that most of them would not have had commit messages composed with any care at all. I have tried to stay away from this thread, but ... I'm also quite dubious about this suggested workflow, partly for the reasons Tom gives, and partly because it would constrain the way I work. I tend to commit with little notes to myself in the commit logs, notes that are never intended to become part of the public project history. I should be quite sad to lose that. Interestingly enough, I tend to work in a very similar way to this. When submitting patches upstream, I tend to rebase on a new branch and then squash/rework as required. One thing I do tend to find is that once I start rebasing the patches for upstream, I find that many of my personal notes can be rewritten as part of the patch comment (I would like to think that comments useful to myself will be useful to other developers some day). Once the rebase is complete, often I find that I have no non-public comments left so that problem becomes almost non-existent. As for using git bisect, usually when I do this each iteration is quite expensive. Multiplying the number of commits by a factor between 10 and 100, which is what I think this would involve, would just make git bisect have to do between 3 and 7 more iterations, ISTM. That's not a win. I find that this isn't too bad in practice. If you think about monolithic patches during a commitfest, you can imagine that most of them will touch one of the core .h files which will require most things to be rebuilt once applied during bisection. With smaller iterative patchsets, each patch tends to only touch a handful of files and so make install generally only needs to rebuild and link a very small number of files which is reasonably quick. Since all of the changes for global header files are contained in 1-2 patches then the number of complete rebuilds isn't as many as you might expect. On the larger issue, let me just note that I don't believe we have what is fundamentally a technological problem, and while technological changes can of course sometimes make things easier, they can also blind us to the more basic problems we are facing. Absolutely. I firmly believe that the right tools for the job are available, it's really trying to find a workflow solution that works well for everyone involved in the project. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 14/12/14 18:24, Peter Geoghegan wrote: On Sun, Dec 14, 2014 at 9:05 AM, Tom Lane t...@sss.pgh.pa.us wrote: TBH, I'm not really on board with this line of argument. I don't find broken-down patches to be particularly useful for review purposes. An example I was just fooling with this week is the GROUPING SETS patch, which was broken into three sections for no good reason at all. (The fourth and fifth subpatches, being alternative solutions to one problem, are in a different category of course.) Too often, decisions made in one subpatch don't make any sense until you see the larger picture. It sounds like they didn't use the technique effectively, then. I think it will be useful to reviewers that I've broken out the mechanism through which the ON CONFLICT UPDATE patch accepts the EXCLUDED.* pseudo-alias into a separate commit (plus docs in a separate commit, as well as tests) - it's a non-trivial additional piece of code that it wouldn't be outrageous to omit in a release, and isn't particularly anticipated by earlier cumulative commits. Maybe people don't have a good enough sense of what sort of subdivision is appropriate yet. I think that better style will emerge over time. For me this is a great example of how to get more developers involved in patch review. Imagine that I'm an experienced developer with little previous exposure to PostgreSQL, but with a really strong flex/bison background. If someone posts a patch to the list as a single grouping sets patch, then I'm going to look at that and think I have no way of understanding this and do nothing with it. However if it were posted as part of patchset with a subject of [PATCH] gram.y: add EXCLUDED pseudo-alias to bison grammar then this may pique my interest enough to review the changes to the grammar rules, with the barrier for entry reduced to understanding just the PostgreSQL-specific parts. What should happen over time is that developers like this would earn the trust of the committers, so committers can spend more time reviewing the remaining parts of the patchset. And of course the project has now engaged a new developer into the community who otherwise may not have not participated. Of course, if you still prefer a monolithic commit, it's pretty easy to produce one from a patch series. It's not easy to go in the other direction, though. Agreed. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Commitfest problems
On 14/12/14 20:07, Craig Ringer wrote: On 12/15/2014 02:46 AM, Mark Cave-Ayland wrote: Interestingly enough, I tend to work in a very similar way to this. When submitting patches upstream, I tend to rebase on a new branch and then squash/rework as required. Same here, and I find it works really well ... when I do it properly. I usually have a private development branch that's full of fixup! commits, WIPs, awful commit messages, etc. Then I have the tree that's been rebased, squashed, and tided up. I periodically tidy my latest work and replace the old clean tree with it, then start my new development branch on top of the new clean tree. This gives me a somewhat nice looking, well ordered patch series to work on top of, while retaining the flexibility to do lots of quick fixes etc. Admittedly, sometimes the development tree gets a bit large and it takes a while before I give it a proper cleanup. My current project being very much in that category. Still - it works well in general. That describes my workflow fairly well too. I find that this isn't too bad in practice. If you think about monolithic patches during a commitfest, you can imagine that most of them will touch one of the core .h files which will require most things to be rebuilt once applied during bisection. It's not build time, it's test-run time. Especially if you can't automate the test, or it isn't practical to. That's a legitimate concern - but I don't think we'd see a blowout of patch counts to quite the same degree. At the end of the day, each project finds its own level as to how complex individual patches should be and what should comprise a patchset. Over the past couple of years the QEMU process has evolved into its current form as maintainers and developers have figured out what works best for them, and I don't see that PostgreSQL would be any different in this respect - it takes time to adapt to a new workflow and get it right. Having worked on various parts for PostGIS for around 10 years, I've had my head stuck into various parts of the GiST code and have got to know a few parts of it really well. What I find frustrating is that I've come back from a workflow where I've been reviewing/testing patches within months of joining a project because the barrier for entry has been so low, and yet even with much longer experience of the PostgreSQL codebase I feel I can't do the same for patches submitted to the commitfest. And why is this? It's because while I know very specific areas of the code well, many patches span different areas of the source tree of which I have little and/or no experience, which when supplied as a single monolithic patch makes it impossible for me to give meaningful review to all but a tiny proportion of them. I believe that this is one of the main reasons why people struggle to find patch reviewers, with the consequence being that the majority of work falls back onto the CF manager and the committers which is why we have the current situation. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spinlocks and compiler/memory barriers
On 02/07/14 08:36, Andres Freund wrote: On 2014-07-01 20:21:37 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-07-01 23:21:07 +0100, Mark Cave-Ayland wrote: Also if you're struggling for Sun buildfarm animals, recent versions of QEMU will quite happily install and run later versions of 32-bit Solaris over serial, and 2.0 even manages to give you a cgthree framebuffer for the full experience. Well. I have to admit I'm really not interested in investing that much time in something I've no stake in. If postgres developers have to put emulated machines to develop features something imo went seriously wrong. That's more effort than at least I'm willing to spend. Perhaps more to the point, I have no faith at all that an emulator will mimic multiprocessor timing behavior to the level of detail needed to tell whether memory-barrier-related logic works. See the VAX discussion just a couple days ago. Well, it would allow us to see wether fixed stuff actually compiles and runs - that's not nothing. The biggest problem with fixing stuff like armv5, sparc8, whatever is that it requires adding stuff to our inline assembly. It's easy to accidentally make it not compile, but comparatively harder to make the behaviour even worse than before. The point I wanted to make was that there are certain applications for which SPARCv8 is still certified for particular military/aerospace use. While I don't use it myself, some people are still using it enough to want to contribute QEMU patches. In terms of QEMU, the main reason for mentioning it was that if the consensus were to keep SPARCv8 support then this could be one possible way to provide basic buildfarm support (although as Tom rightly points out, there would need to be testing to build up confidence that the emulator does the right thing in a multiprocessor environment). ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spinlocks and compiler/memory barriers
On 01/07/14 11:04, Andres Freund wrote: Since we have a Sun Studio machine in the buildfarm, we shouldn't give up on SPARC completely, but maybe we should only add the cases for sparcv8+ and above? That at least has some chance of getting tested. That we have code for sparcv7 is ridiculous btw. Sparcv8 (not 8+/9) is from 1992. v7/sun4c been dropped from solaris 8 if I see that correctly. I agree that v8 (in contrast to v8+ aka v9 in 32bit mode) support has to go as well. I'd personally vote for backpatching a note to installation.sgml saying that it's probably not working, and not do anything else there. That means we also should replace the ldstub by cas in the the gcc inline assembly - but we have buildfarm members for that, so it's not too bad. Being involved in QEMU SPARC development, I can tell you that patches are still actively being received for SPARCv8. The last set of CPU patches were related to fixing bugs in the LEON3, a 32-bit SPARC CPU which is available in hardened versions certified for extreme environments such as military and space. I'd hate to find out that they switched to another database because they couldn't upgrade PostgreSQL on the ISS ;) Also if you're struggling for Sun buildfarm animals, recent versions of QEMU will quite happily install and run later versions of 32-bit Solaris over serial, and 2.0 even manages to give you a cgthree framebuffer for the full experience. ATB, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Interrupting long external library calls
Hi all, One of the issues we've been looking at with PostGIS is how to interrupt long-running processing tasks in external libraries, particularly GEOS. After a few tests here, it seems that even the existing SIGALRM handler doesn't get called if statement_timeout is reached when in an external library, e.g. with PostgreSQL 9.0/PostGIS 2.0 trunk/GEOS: pg90@kentang:~$ ./rel/bin/postgres --single -D data postgis PostgreSQL stand-alone backend 9.0.7 backend create database foo; backend drop database foo; backend show statement_timeout; 1: statement_timeout (typeid = 25, len = -1, typmod = -1, byval = f) 1: statement_timeout = 500ms (typeid = 25, len = -1, typmod = -1, byval = f) backend set statement_timeout = '10ms'; backend create database foo; ERROR: canceling statement due to statement timeout STATEMENT: create database foo; (note the following command takes about 3-4s to run but doesn't get cancelled) backend select st_segmentize('LINESTRING(0 0, 10 0)', 1e-12); ERROR: invalid memory alloc request size 1073741824 Is there a way in which we can interrupt long-running external library calls that are unable to call the PostgreSQL CHECK_FOR_INTERRUPTS() macro directly? If it is possible for standard alarms to work in this way then something along the lines of an alarm within PostgreSQL itself that manually invokes CHECK_FOR_INTERRUPTS() every 1s while the query is running could potentially solve the issue for us. Many thanks, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Interrupting long external library calls
On 16/05/12 11:39, Heikki Linnakangas wrote: One of the issues we've been looking at with PostGIS is how to interrupt long-running processing tasks in external libraries, particularly GEOS. After a few tests here, it seems that even the existing SIGALRM handler doesn't get called if statement_timeout is reached when in an external library, e.g. with PostgreSQL 9.0/PostGIS 2.0 trunk/GEOS: If you interrupt an external library call, it might leave memory in an inconsistent state, or some other such thing. It's not generally safe to interrupt arbitrary 3rd party code. However, if you're absolutely positively sure that the library function can tolerate that, you can set ImmediateInterruptOK = true before calling it. See e.g PGSemaphoreLock() on how that's done before starting to sleep on a semapgore. Hi Heikki, Yes that appears to work fine on the surface - just testing now to determine what state everything is left in when queries are aborted prematurely. Many thanks, Mark. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] PostgreSQL fails to build with 32bit MinGW-w64
On 14/12/11 13:59, Andrew Dunstan wrote: Hmm. Yeah, if I remove -O0 and instead set CFLAGS to -ffloat-store the error goes away. So, would we want to use that just for this file, or for the whole build? Well the latest documentation for gcc gives 2 options: -ffloat-store and -fexcess-precision=style which are documented at http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Optimize-Options.html#Optimize-Options. For PostGIS we only applied the flags for specific files, because of severe performance warnings in older versions of the gcc documentation such as this: http://www.delorie.com/gnu/docs/gcc/gcc_10.html. I have no idea whether these warnings still hold true or not for more modern compiler versions. ISTM that the best solution would be to determine whether or not -fexcess-precision=standard does the right thing (and also determine the performance hit) or take a look at the excess precision notes in the older documentation to see if parts of the code can be rearranged to eliminate the effect. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] PostgreSQL fails to build with 32bit MinGW-w64
On -10/01/37 20:59, Andrew Dunstan wrote: This is apparently an optimization bug in the compiler. If I turn optimization off (CFLAGS=-O0) it goes away. Ick. So at the moment I'm a bit blocked. I can't really file a bug because the compiler can't currently be used to build postgres, I don't have time to construct a self-contained test case, and I don't want to commit changes to enable the compiler until the issue is solved. If we're talking about adding support for a previously-unsupported configuration, it seems to me that it would be fine to commit a patch that made everything work, but for the compiler bug. We could refrain from stating that we officially support that configuration until the compiler bug is fixed, or even document the existence of the bug. We can't be responsible for other people's broken code, but I don't necessarily see that as a reason not to commit a prerequisite patch. Otherwise, as you say, there's a chicken-and-egg problem, and who does that help? Yeah, fair enough. I'll work on that. Definitely do this (and then file a bug report with the project). I've worked with both Kai and NightStrike from the MingW-W64 project to fix previous bugs, and as long as they can build the offending source themselves then they are very helpful and quick to respond. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improved parallel make support
Dave Page wrote: Thanks - installed. As a matter of policy, I do not want to drop support for a FOSS build tool chain on Windows if at all avoidable. Nor I, however I only have limited time to dedicate to that goal. One thing to think about is that since PostGIS uses MingW/PGXS on Windows, we use MingW builds in order to generate the Makefiles we need (there is no native MSVC build for Windows). Not being able to do this would cause us great inconvenience :( ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improved parallel make support
Dave Page wrote: On Thu, Nov 11, 2010 at 4:51 PM, Andrew Dunstan and...@dunslane.net wrote: Interesting. Doesn't EDB's PostgresPlus package include PostGIS, and isn't its Windows version build with MSVC? Yes - it's a PITA as we have to have a dummy build of the server in mingw/msys to compile PostGIS and Slony. We're probably going to be looking at that in the not-to-distant future as we want 64bit builds of both and will be using VC++. Just for the record, a lot of work was done in the 1.4 release series to make MSVC builds possible, and indeed several people have reported success: http://postgis.refractions.net/pipermail/postgis-devel/2009-March/005102.html http://postgis.refractions.net/pipermail/postgis-devel/2010-September/010299.html The two main outstanding issues as I see it are: 1) The GTK-based GUI for shp2pgsql (although if someone wanted to sponsor work to convert to wxWidgets to bring us in line with pgAdmin, that would be strongly considered). 2) Maintenance of the MSVC build system. So far we have had some complaints about not using MSVC, but then no-one has stepped up to maintain the build system for it. Forcing all existing developers to suddenly start maintaining the Windows build is a total non-starter. My hope is that one day CMake will enable us to come up with a universal solution, but we're some way from that yet. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] knngist - 0.8
Paul Ramsey wrote: So what kind of data structure would you like for a typmod? I'm a primitive enough beast that just having 64-bits would make me happy. As a general matter though, a bytea? P For my vote, I'd prefer either the Oid of a custom type or an array of Oid, Datum pairs - i.e. something we can extend in the future if required. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] knngist - 0.8
David Fetter wrote: For my vote, I'd prefer either the Oid of a custom type or an array of Oid, Datum pairs - i.e. something we can extend in the future if required. This sounds a lot like a foreign key to another table. Are you not proposing doing that because of performance considerations? Cheers, David. Well, in PostGIS a typmod contains 3 pieces of information: 1) the SRID 2) the dimension 3) the geometry type The SRID is technically already a foreign key into another table, with dimension and SRID as other information. At the moment, we bit-pack the dimension and geometry type into the SRID and use that as the typmod but this only leaves 21 bits IIRC for the SRID. The additional complication is that SRIDs at the higher end of the range are allowed for anyone to use, and so people may have their own customised spheroids defined in this region of the table. If we had a foreign key into another table, we'd need to ensure that no one could tamper with it as otherwise all chaos would break lose, e.g. breaking the geometry type constraint on a column. Heck, we even have people deleting the geometry_columns table sometimes because they are not aware of what it does. By storing this information in the PG catalog then this can't happen, plus the information is available easily in Form_pg_attribute without having to implement our own cache, with its own related problems such as how/when to invalidate etc. There is also a chance that we'd want to include additional information in the future related to geometry validity, for example, which would mean further reducing the range allowed within the spatial_ref_sys table in its existing form. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostGIS vs. PGXS in 9.0beta3
Andrew Dunstan wrote: The real problem has nothing to do with any of the analysis, as you say. It is this: they have an override file for PGXS and it uses $(mkinstalldirs) which we got rid of about a year ago. So apparently they haven't been testing much against any of our alphas or betas or they would have seen this long ago. The correct fix is to do the following in the PostGIS source root: sed -i -e 's/mkinstalldirs/MKDIR_P/' postgis/Makefile.pgxs cheers andrew Hmmm that's totally wrong - the override in Makefile.pgxs should only ever be loaded for PostgreSQL 8.3 and 8.4, and not PostgreSQL 9.0 since it already has the correct installation paths. What I suspect is that you're actually getting bitten by this: http://markmail.org/message/k7iolbazhrqhijfk#query:pg_config%20jun%202007+page:1+mid:rqk6ux2e7npqbrzf+state:results Or, in other words, configure is picking up the wrong pg_config. Since the path fix in the thread was not backported to 8.3, then the presence of an another pg_config for PostgreSQL 8.3 in PATH will break things :( ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostGIS vs. PGXS in 9.0beta3
Andrew Dunstan wrote: No, the configure test is wrong. Here's what's in configure.ac: dnl Temporary hack until minimum PostgreSQL version is 8.5: dnl If PostgreSQL 8.5 is detected, trigger the inclusion of the new versioned PGXS targets PGXSOVERRIDE=0 if test ! $PGSQL_MINOR_VERSION -ge 5; then PGXSOVERRIDE=1 fi Of course, we don't have any such thing as 8.5. Ah wait - I see. The fix is in already in the 1.5 branch, it just hasn't hit a release yet :( http://trac.osgeo.org/postgis/changeset/5421/branches/1.5/configure.ac Looks like we need to push a new release in time for 9.0... ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why are these modules built without respecting my LDFLAGS?
Tom Lane wrote: Should we try to make that a bit more consistent, and if so how? The shenanigans in Makefile.shlib would get a lot simpler if we said that shlib links always include LDFLAGS *plus* LDFLAGS_SL, but I would think that that would carry substantial risk of breakage. Surely there are cases where linker switches are appropriate for making executables but not shlibs. Perhaps we should set up three variables instead of two, viz LDFLAGS = switches for linking both executables and shlibs LDFLAGS_EX = extra switches for linking executables only LDFLAGS_SL = extra switches for linking shlibs only Then we could get rid of that untrustworthy hack for extracting -L switches ... While we're on the subject... this reminds me of another issue that's come up a few times on the PostGIS mailing lists. AFAICT pg_config doesn't have a method for generating LDFLAGS for libpq client applications, only backend libraries with pg_config --libs. Normally we just go for -lpq but that doesn't always seem to work on platforms where you need to explicitly give all libpq dependencies during link time, e.g. http://postgis.refractions.net/pipermail/postgis-users/2010-April/026349.html. Would LDFLAGS_EX in this case be what we need? If so, could it be exposed via a pg_config --libpq option or similar? ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [CFReview] Red-Black Tree
Teodor Sigaev wrote: I would like to see point #2 of the following email addressed before commit. As things stand, it is not clear (at least to me) whether this is a win. Reimplementation of ginInsertRecordBA reduces difference of HEAD and HEAD+rbtree in regular case. Test suite is taken from http://www.sai.msu.su/~megera/wiki/2009-04-03: SEQ: SELECT array_to_string(ARRAY(select '' || a || '.' || b from generate_series(1,50) b), ' ')::tsvector AS i INTO foo FROM generate_series(1,10) a; RND: SELECT array_to_string(ARRAY(select '' || random() from generate_series(1,50) b), ' ')::tsvector AS i INTO foo FROM generate_series(1,10) a; Times in seconds: HEAD 0.9 0.11 SEQ 130 113111 RND11.4 12.6 11.5 The ides was to change order of insertion - now insertion order decreases number of rebalancing. Oleg's test (http://www.sai.msu.su/~megera/wiki/rbtree_test) are made with v0.10 which is differ from 0.11 only by comments around ginInsertRecordBA() Here is a quick comparison between the current 0.11 patch against my original 0.7 patch when building Oleg's simple data. (Note: due to time constraints, this is just a single run to get a feel for performance) 0.7 patch = rbtest=# CREATE INDEX idin_rbtree_idx ON links2 USING gin (idin); CREATE INDEX Time: 1910741.352 ms rbtest=# CREATE INDEX idout_rbtree_idx ON links2 USING gin (idout); CREATE INDEX Time: 1647609.300 ms 0.11 patch == rbtest=# CREATE INDEX idin_rbtree_idx ON links2 USING gin (idin); CREATE INDEX Time: 1864013.526 ms rbtest=# CREATE INDEX idout_rbtree_idx ON links2 USING gin (idout); CREATE INDEX Time: 1661200.454 ms HTH, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [CFReview] Red-Black Tree
Robert Haas wrote: Maybe we are now getting to the heart of the confusion. Mark wrote in his email: Unfortunately I was not really able to reproduce the RND (teodor's) dataset, nor the random array test as the SQL used to test the implementation was not present on the page above. The SQL for the fixed-length tests is posted, but the SQL for the variable length test is not - so Mark was just guessing on that one. Or am I just totally confused? ...Robert No, that's correct. In the Repeat test with 100,000 identical records varying array length (len) section, it's fairly easy to substitute in the varying values of len where len = 3, 30 and 50. As documented in my review email I had a guess at generating the contents of RND (teodor's) column with this query: select ARRAY(select generate_series(1, (random() * 100)::int)) as arand into arrrand from generate_series(1,10) b; However, unlike the other figures this is quite a bit different from Oleg/Teodor's results which make me think this is the wrong query (3.5s v 9s). Obviously Robert's concern here is that it is this column that shows one of the largest performance decreases compared to head. I've also finished benchmarking the index creation scripts yesterday on Oleg's test dataset from http://www.sai.msu.su/~megera/postgres/files/links2.sql.gz. With maintenance_work_mem set to 256Mb, the times I got with the rbtree patch applied were: rbtest=# CREATE INDEX idin_rbtree_idx ON links2 USING gin (idin); CREATE INDEX Time: 1910741.352 ms rbtest=# CREATE INDEX idout_rbtree_idx ON links2 USING gin (idout); CREATE INDEX Time: 1647609.300 ms Without the patch applied, I ended up having to shutdown my laptop after around 90 mins before the first index had even been created. So there is a definite order of magnitude speed increase with this patch applied. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CommitFest Status Summary - 2010-02-03
Robert Haas wrote: Here's an overview of where we stand with the remaining 14 patches, according to my best understanding of the situation. * rbtree - I have done a lot of work reviewing this, and Mark Cave-Ayland has done some work on it, too. But there are some unanswered performance questions that need to be addressed before commit. This is another one that could really use some more eyes on it. * knngist - The third remaining big patch. Mark Cave-Ayland volunteered to review this one, too, but so far no review has been posted on -hackers. I know that the PostGIS folks would really like to have this, but time is growing short. Yes. I'm currently working on the knngist patch now, although sadly work got side-tracked onto the rbtree patch since it is a requirement for the knngist patch. Now that the rbtree patch is in reasonably good shape, I intend to focus the rest of my time working on the knngist patch exclusively. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] development setup and libdir
Ivan Sergio Borgonovo wrote: Of course I can write a script that can workaround this. It seems that the only thing missing is that pgxs 8.3 used to prefix .so with lib and then rename them at install time, but pgxs 8.4 build them directly without prefix. I'm just speculating this is the issue and it is the only one I still have to solve... but... what's going to happen next release? Wouldn't it be better if make install could install stuff where I ask so I could put modules in different places *even* for the same installation of postgres? FWIW the soon-to-be-released PostGIS 1.5 has an out of place regression testing feature that allows PostGIS to be built using PGXS and regression tested without putting anything in the PostgreSQL installation directory itself. It's a little bit of a hack, but take a look at the PGXS makefile and the regression makefile to see how it all works: http://trac.osgeo.org/postgis/browser/trunk/postgis/Makefile.in http://trac.osgeo.org/postgis/browser/trunk/regress/Makefile.in HTH, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [CFReview] Red-Black Tree
)) as a50 into arr50 from generate_series(1,10) b; test=# create index arr50_idx on arr50 using gin (a50); CREATE INDEX Time: 3306.074 ms iv) len=random select ARRAY(select generate_series(1, (random() * 100)::int)) as arand into arrrand from generate_series(1,10) b; test=# create index arrrand_idx on arrrand using gin (arand); CREATE INDEX Time: 4725.556 ms After patch: i) len=3 select ARRAY(select generate_series(1,3)) as a3 into arr3 from generate_series(1,10) b; test=# create index arr3_idx on arr3 using gin (a3); CREATE INDEX Time: 299.090 ms ii) len=30 select ARRAY(select generate_series(1,30)) as a30 into arr30 from generate_series(1,10) b; test=# create index arr30_idx on arr30 using gin (a30); CREATE INDEX Time: 2828.424 ms iii) len=50 select ARRAY(select generate_series(1,50)) as a50 into arr50 from generate_series(1,10) b; test=# create index arr50_idx on arr50 using gin (a50); CREATE INDEX Time: 3984.456 ms iv) len=random select ARRAY(select generate_series(1, (random() * 100)::int)) as arand into arrrand from generate_series(1,10) b; test=# create index arrrand_idx on arrrand using gin (arand); CREATE INDEX Time: 3514.972 ms Summary === I believe Robert has done a good deal of the groundwork required to get this patch ready for inclusion. With the current version, I was able to see a measurable performance improvement in some test cases with no significant performance regression. It would have been nice to be able to reproduce the whole set of test cases but this was not possible from the information specified. With perhaps some minor tweaks to some of the names and a rework of the else clause in ginInsertEntry(), I feel this patch is reasonably close to commit. I shall now continue my review of the associated KNNGiST patch. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs diff --git a/src/backend/access/gin/ginbulk.c b/src/backend/access/gin/ginbulk.c index 954884d..e043713 100644 --- a/src/backend/access/gin/ginbulk.c +++ b/src/backend/access/gin/ginbulk.c @@ -22,15 +22,60 @@ #define DEF_NENTRY 2048 #define DEF_NPTR 4 +static void* +ginAppendData(void *old, void *new, void *arg) +{ + EntryAccumulator*eo = (EntryAccumulator*)old, + *en = (EntryAccumulator*)new; + + BuildAccumulator*accum = (BuildAccumulator*)arg; + + if (eo-number = eo-length) + { + accum-allocatedMemory -= GetMemoryChunkSpace(eo-list); + eo-length *= 2; + eo-list = (ItemPointerData *) repalloc(eo-list, + sizeof(ItemPointerData) * eo-length); + accum-allocatedMemory += GetMemoryChunkSpace(eo-list); + } + + /* If item pointers are not ordered, they will need to be sorted. */ + if (eo-shouldSort == FALSE) + { + int res; + + res = compareItemPointers(eo-list + eo-number - 1, en-list); + Assert(res != 0); + + if (res 0) + eo-shouldSort = TRUE; + } + + eo-list[eo-number] = en-list[0]; + eo-number++; + + return old; +} + +static int +cmpEntryAccumulator(const void *a, const void *b, void *arg) +{ + EntryAccumulator*ea = (EntryAccumulator*)a; + EntryAccumulator*eb = (EntryAccumulator*)b; + BuildAccumulator*accum = (BuildAccumulator*)arg; + + return compareAttEntries(accum-ginstate, ea-attnum, ea-value, +eb-attnum, eb-value); +} + void ginInitBA(BuildAccumulator *accum) { - accum-maxdepth = 1; - accum-stackpos = 0; - accum-entries = NULL; - accum-stack = NULL; accum-allocatedMemory = 0; accum-entryallocator = NULL; + accum-tree = rb_create(cmpEntryAccumulator, ginAppendData, NULL, accum); + accum-iterator = NULL; + accum-tmpList = NULL; } static EntryAccumulator * @@ -48,36 +93,6 @@ EAAllocate(BuildAccumulator *accum) } /* - * Stores heap item pointer. For robust, it checks that - * item pointer are ordered - */ -static void -ginInsertData(BuildAccumulator *accum, EntryAccumulator *entry, ItemPointer heapptr) -{ - if (entry-number = entry-length) - { - accum-allocatedMemory -= GetMemoryChunkSpace(entry-list); - entry-length *= 2; - entry-list = (ItemPointerData *) repalloc(entry-list, - sizeof(ItemPointerData) * entry-length); - accum-allocatedMemory += GetMemoryChunkSpace(entry-list); - } - - if (entry
Re: [HACKERS] Add subdirectory support for DATA/DOCS with PGXS
Tom Lane wrote: Why do DOCS still go into doc/contrib? Shouldn't that become doc/$MODULEDIR for consistency? Hmmm it looks as if the code was correct but I missed the comment at the top of the file. Sorry for the confusion - revised version attached. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk index a83dad3..76b585f 100644 --- a/src/makefiles/pgxs.mk +++ b/src/makefiles/pgxs.mk @@ -19,11 +19,14 @@ # # MODULES -- list of shared objects to be build from source file with # same stem (do not include suffix in this list) -# DATA -- random files to install into $PREFIX/share/contrib -# DATA_built -- random files to install into $PREFIX/share/contrib, +# MODULEDIR -- subdirectory under contrib into which DATA and DOCS are +# installed (if not set, DATA and DOCS files are installed directly +# into the contrib/ directory) +# DATA -- random files to install into $PREFIX/share/$MODULEDIR +# DATA_built -- random files to install into $PREFIX/share/$MODULEDIR, # which need to be built first # DATA_TSEARCH -- random files to install into $PREFIX/share/tsearch_data -# DOCS -- random files to install under $PREFIX/doc/contrib +# DOCS -- random files to install under $PREFIX/doc/$MODULEDIR # SCRIPTS -- script files (not binaries) to install into $PREFIX/bin # SCRIPTS_built -- script files (not binaries) to install into $PREFIX/bin, # which need to be built first @@ -86,12 +89,19 @@ include $(top_srcdir)/src/Makefile.shlib all: all-lib endif # MODULE_big +ifndef MODULEDIR +datamoduledir = contrib +docmoduledir = contrib +else +datamoduledir = $(MODULEDIR) +docmoduledir = $(MODULEDIR) +endif install: all installdirs ifneq (,$(DATA)$(DATA_built)) @for file in $(addprefix $(srcdir)/, $(DATA)) $(DATA_built); do \ - echo $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib'; \ - $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib'; \ + echo $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'; \ + $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'; \ done endif # DATA ifneq (,$(DATA_TSEARCH)) @@ -109,8 +119,8 @@ endif # MODULES ifdef DOCS ifdef docdir @for file in $(addprefix $(srcdir)/, $(DOCS)); do \ - echo $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib'; \ - $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib'; \ + echo $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/$(docmoduledir)'; \ + $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/$(docmoduledir)'; \ done endif # docdir endif # DOCS @@ -137,7 +147,7 @@ endif # MODULE_big installdirs: ifneq (,$(DATA)$(DATA_built)) - $(MKDIR_P) '$(DESTDIR)$(datadir)/contrib' + $(MKDIR_P) '$(DESTDIR)$(datadir)/$(datamoduledir)' endif ifneq (,$(DATA_TSEARCH)) $(MKDIR_P) '$(DESTDIR)$(datadir)/tsearch_data' @@ -147,7 +157,7 @@ ifneq (,$(MODULES)) endif ifdef DOCS ifdef docdir - $(MKDIR_P) '$(DESTDIR)$(docdir)/contrib' + $(MKDIR_P) '$(DESTDIR)$(docdir)/$(docmoduledir)' endif # docdir endif # DOCS ifneq (,$(PROGRAM)$(SCRIPTS)$(SCRIPTS_built)) @@ -161,7 +171,7 @@ endif # MODULE_big uninstall: ifneq (,$(DATA)$(DATA_built)) - rm -f $(addprefix '$(DESTDIR)$(datadir)'/contrib/, $(notdir $(DATA) $(DATA_built))) + rm -f $(addprefix '$(DESTDIR)$(datadir)'/$(datamoduledir)/, $(notdir $(DATA) $(DATA_built))) endif ifneq (,$(DATA_TSEARCH)) rm -f $(addprefix '$(DESTDIR)$(datadir)'/tsearch_data/, $(notdir $(DATA_TSEARCH))) @@ -170,7 +180,7 @@ ifdef MODULES rm -f $(addprefix '$(DESTDIR)$(pkglibdir)'/, $(addsuffix $(DLSUFFIX), $(MODULES))) endif ifdef DOCS - rm -f $(addprefix '$(DESTDIR)$(docdir)'/contrib/, $(DOCS)) + rm -f $(addprefix '$(DESTDIR)$(docdir)'/$(docmoduledir)/, $(DOCS)) endif ifdef PROGRAM rm -f '$(DESTDIR)$(bindir)/$(PROGRAM)$(X)' -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add subdirectory support for DATA/DOCS with PGXS
Tom Lane wrote: If you can set it up in such a way that the default behavior doesn't change, this would be workable. I don't think we want people to suddenly find their stuff installing in the wrong place. It probably wouldn't be that hard, something along the lines of ifndef MODULEDIR MODULEDIR=contrib endif ought to do it no? Yeah, that was pretty much along the lines of what I was thinking. Please find the revised v2 patch attached for comment. The one thing I have done is separated out the moduledir variable into datamoduledir and docmoduledir so there is a little bit of wiggle room if someone needs to install DATA items and DOCS items in different locations. I did have a brief look at seeing whether it would be possible to use this instead of DATA_TSEARCH, however this won't work because the DATA and DATA_TSEARCH targets need their files installed in two separate locations. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk index a83dad3..feb7fc2 100644 --- a/src/makefiles/pgxs.mk +++ b/src/makefiles/pgxs.mk @@ -19,8 +19,11 @@ # # MODULES -- list of shared objects to be build from source file with # same stem (do not include suffix in this list) -# DATA -- random files to install into $PREFIX/share/contrib -# DATA_built -- random files to install into $PREFIX/share/contrib, +# MODULEDIR -- subdirectory under contrib into which DATA and DOCS are +# installed (if not set, DATA and DOCS files are installed directly +# into the contrib/ directory) +# DATA -- random files to install into $PREFIX/share/$MODULEDIR +# DATA_built -- random files to install into $PREFIX/share/$MODULEDIR, # which need to be built first # DATA_TSEARCH -- random files to install into $PREFIX/share/tsearch_data # DOCS -- random files to install under $PREFIX/doc/contrib @@ -86,12 +89,20 @@ include $(top_srcdir)/src/Makefile.shlib all: all-lib endif # MODULE_big +ifndef MODULEDIR +datamoduledir = contrib +docmoduledir = contrib +else +datamoduledir = $(MODULEDIR) +docmoduledir = $(MODULEDIR) +endif + install: all installdirs ifneq (,$(DATA)$(DATA_built)) @for file in $(addprefix $(srcdir)/, $(DATA)) $(DATA_built); do \ - echo $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib'; \ - $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib'; \ + echo $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'; \ + $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'; \ done endif # DATA ifneq (,$(DATA_TSEARCH)) @@ -109,8 +120,8 @@ endif # MODULES ifdef DOCS ifdef docdir @for file in $(addprefix $(srcdir)/, $(DOCS)); do \ - echo $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib'; \ - $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib'; \ + echo $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/$(docmoduledir)'; \ + $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/$(docmoduledir)'; \ done endif # docdir endif # DOCS @@ -137,7 +148,7 @@ endif # MODULE_big installdirs: ifneq (,$(DATA)$(DATA_built)) - $(MKDIR_P) '$(DESTDIR)$(datadir)/contrib' + $(MKDIR_P) '$(DESTDIR)$(datadir)/$(datamoduledir)' endif ifneq (,$(DATA_TSEARCH)) $(MKDIR_P) '$(DESTDIR)$(datadir)/tsearch_data' @@ -147,7 +158,7 @@ ifneq (,$(MODULES)) endif ifdef DOCS ifdef docdir - $(MKDIR_P) '$(DESTDIR)$(docdir)/contrib' + $(MKDIR_P) '$(DESTDIR)$(docdir)/$(docmoduledir)' endif # docdir endif # DOCS ifneq (,$(PROGRAM)$(SCRIPTS)$(SCRIPTS_built)) @@ -161,7 +172,7 @@ endif # MODULE_big uninstall: ifneq (,$(DATA)$(DATA_built)) - rm -f $(addprefix '$(DESTDIR)$(datadir)'/contrib/, $(notdir $(DATA) $(DATA_built))) + rm -f $(addprefix '$(DESTDIR)$(datadir)'/$(datamoduledir)/, $(notdir $(DATA) $(DATA_built))) endif ifneq (,$(DATA_TSEARCH)) rm -f $(addprefix '$(DESTDIR)$(datadir)'/tsearch_data/, $(notdir $(DATA_TSEARCH))) @@ -170,7 +181,7 @@ ifdef MODULES rm -f $(addprefix '$(DESTDIR)$(pkglibdir)'/, $(addsuffix $(DLSUFFIX), $(MODULES))) endif ifdef DOCS - rm -f $(addprefix '$(DESTDIR)$(docdir)'/contrib/, $(DOCS)) + rm -f $(addprefix '$(DESTDIR)$(docdir)'/$(docmoduledir)/, $(DOCS)) endif ifdef PROGRAM rm -f '$(DESTDIR)$(bindir)/$(PROGRAM)$(X)' -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Add subdirectory support for DATA/DOCS with PGXS
Alvaro Herrera wrote: The attached patch is a prototype which allows the user to specify a new MODULEDIR variable in a module makefile which, if specified, will install DATA and DOCS items in contrib/$(MODULEDIR) rather than just contrib. If MODULEDIR is left unspecified, the files will simply be stored directly in contrib/ as before. As a proof of its usefulness, you could remove DATA_TSEARCH and replace it with usage of MODULEDIR, right? Not in its current form because PGXS always places files underneath a contrib/ subdirectory within datadir. However, if people are happier with this approach then it shouldn't be too hard to alter things so that my PGXS Makefile would look like this: MODULE_big=postgis-1.5 MODULEDIR=contrib/$(MODULE_big) Once in this form it should then be possible to use this code to replace the DATA_TSEARCH variable that is currently in place. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Add subdirectory support for DATA/DOCS with PGXS
Hi all, Since moving PostGIS over to PGXS for the 1.4 release, we've been looking at how we can support multiple versions of PostGIS being installed in different databases within the same cluster. We are able to version the .so file produced by PGXS without too much difficulty, however PGXS in its current form does not have the ability to install different versions of the DATA or DOCS files within the same installation. The attached patch is a prototype which allows the user to specify a new MODULEDIR variable in a module makefile which, if specified, will install DATA and DOCS items in contrib/$(MODULEDIR) rather than just contrib. If MODULEDIR is left unspecified, the files will simply be stored directly in contrib/ as before. In my current development setup here, the head of the new PostGIS PGXS Makefile now looks like this: MODULE_big=postgis-1.5 MODULEDIR=$(MODULE_big) ... ... With this patch in place, make install on the PGXS will correctly install the DATA and DOCS files in versioned directories and therefore allow multiple installations within the same database cluster. The coding within the Makefile isn't too difficult in its current form, but I'd be interested to get some initial feedback as to whether the introduction of a new MODULEDIR variable is the best way to add this new piece of functionality. Many thanks, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk index a83dad3..4e17e8a 100644 --- a/src/makefiles/pgxs.mk +++ b/src/makefiles/pgxs.mk @@ -19,8 +19,11 @@ # # MODULES -- list of shared objects to be build from source file with # same stem (do not include suffix in this list) -# DATA -- random files to install into $PREFIX/share/contrib -# DATA_built -- random files to install into $PREFIX/share/contrib, +# MODULEDIR -- subdirectory under contrib into which DATA and DOCS are +# installed (if not set, DATA and DOCS files are installed directly +# under the contrib/ directory) +# DATA -- random files to install into $PREFIX/share/contrib/$MODULEDIR +# DATA_built -- random files to install into $PREFIX/share/contrib/$MODULEDIR, # which need to be built first # DATA_TSEARCH -- random files to install into $PREFIX/share/tsearch_data # DOCS -- random files to install under $PREFIX/doc/contrib @@ -86,12 +89,16 @@ include $(top_srcdir)/src/Makefile.shlib all: all-lib endif # MODULE_big +ifdef MODULEDIR +moduledir = /$(MODULEDIR) +endif + install: all installdirs ifneq (,$(DATA)$(DATA_built)) @for file in $(addprefix $(srcdir)/, $(DATA)) $(DATA_built); do \ - echo $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib'; \ - $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib'; \ + echo $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib$(moduledir)'; \ + $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib$(moduledir)'; \ done endif # DATA ifneq (,$(DATA_TSEARCH)) @@ -109,8 +116,8 @@ endif # MODULES ifdef DOCS ifdef docdir @for file in $(addprefix $(srcdir)/, $(DOCS)); do \ - echo $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib'; \ - $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib'; \ + echo $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib$(moduledir)'; \ + $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib$(moduledir)'; \ done endif # docdir endif # DOCS @@ -137,7 +144,7 @@ endif # MODULE_big installdirs: ifneq (,$(DATA)$(DATA_built)) - $(MKDIR_P) '$(DESTDIR)$(datadir)/contrib' + $(MKDIR_P) '$(DESTDIR)$(datadir)/contrib$(moduledir)' endif ifneq (,$(DATA_TSEARCH)) $(MKDIR_P) '$(DESTDIR)$(datadir)/tsearch_data' @@ -147,7 +154,7 @@ ifneq (,$(MODULES)) endif ifdef DOCS ifdef docdir - $(MKDIR_P) '$(DESTDIR)$(docdir)/contrib' + $(MKDIR_P) '$(DESTDIR)$(docdir)/contrib$(moduledir)' endif # docdir endif # DOCS ifneq (,$(PROGRAM)$(SCRIPTS)$(SCRIPTS_built)) @@ -161,7 +168,7 @@ endif # MODULE_big uninstall: ifneq (,$(DATA)$(DATA_built)) - rm -f $(addprefix '$(DESTDIR)$(datadir)'/contrib/, $(notdir $(DATA) $(DATA_built))) + rm -f $(addprefix '$(DESTDIR)$(datadir)'/contrib$(moduledir)/, $(notdir $(DATA) $(DATA_built))) endif ifneq (,$(DATA_TSEARCH)) rm -f $(addprefix '$(DESTDIR)$(datadir)'/tsearch_data/, $(notdir $(DATA_TSEARCH))) @@ -170,7 +177,7 @@ ifdef MODULES rm -f $(addprefix '$(DESTDIR)$(pkglibdir)'/, $(addsuffix $(DLSUFFIX), $(MODULES))) endif ifdef DOCS - rm -f $(addprefix '$(DESTDIR)$(docdir)'/contrib/, $(DOCS)) + rm -f $(addprefix '$(DESTDIR)$(docdir)'/contrib$(moduledir)/, $(DOCS)) endif ifdef PROGRAM rm -f '$(DESTDIR)$(bindir)/$(PROGRAM)$(X)' -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another try at reducing repeated detoast work for PostGIS
Tom Lane wrote: Huh. As far as I can see this example should traverse the same code path. I was about to ask for the dataset, but I think you might have already sent it to me once --- does this look familiar? $ tar tvfj geography.tar.bz2 -rw-r--r-- shade/shade 6444737 2008-06-06 13:33 geography.dbf -rw-r--r-- shade/shade 37179008 2008-06-06 13:33 geography.shp -rw-r--r-- shade/shade 263140 2008-06-06 13:33 geography.shx If so, what do I do with it exactly --- the file extensions convey nothing to my mind at the moment ... Okay. I've gone back and had a look at the original queries, and it seems the reason for the inflated times is that my setup for the original database was based on PostGIS 1.3, which has a RECHECK applied to the operator. If I temporarily remove the RECHECK from PostGIS 1.3 then the times drop substantially: postgis13=# explain analyze select count(*) from geography where type='Z' and centroid (select the_geom from geography where id=69495); QUERY PLAN - Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actual time=394.183..394.184 rows=1 loops=1) InitPlan 1 (returns $0) - Seq Scan on geography (cost=0.00..6884.00 rows=1 width=4432) (actual time=18.192..41.855 rows=1 loops=1) Filter: (id = 69495::numeric) - Index Scan using idx_geography_centroid_z on geography (cost=0.00..8.28 rows=1 width=0) (actual time=46.711..345.940 rows=29687 loops=1) Index Cond: (centroid $0) Filter: ((type)::text = 'Z'::text) Total runtime: 394.265 ms (8 rows) Incidentally, the recently-released PostGIS 1.4 has RECHECK disabled by default, and so it can be seen that the times are reasonably similar: postgis14=# explain analyze select count(*) from geography where type='Z' and centroid (select the_geom from geography where id=69495); QUERY PLAN - Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actual time=396.314..396.315 rows=1 loops=1) InitPlan 1 (returns $0) - Seq Scan on geography (cost=0.00..6884.00 rows=1 width=4439) (actual time=14.198..37.340 rows=1 loops=1) Filter: (id = 69495::numeric) - Index Scan using idx_geography_centroid_z on geography (cost=0.00..8.28 rows=1 width=0) (actual time=42.169..344.337 rows=29687 loops=1) Index Cond: (centroid $0) Filter: ((type)::text = 'Z'::text) Total runtime: 396.375 ms (8 rows) If I re-apply your patch to PostgreSQL 8.4 using PostGIS 1.4 (ignoring RECHECK) then the results now look like this: postgis14=# explain analyze select count(*) from geography where type='Z' and centroid (select the_geom from geography where id=69495); QUERY PLAN - Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actual time=271.360..271.362 rows=1 loops=1) InitPlan 1 (returns $0) - Seq Scan on geography (cost=0.00..6884.00 rows=1 width=4439) (actual time=17.534..32.009 rows=1 loops=1) Filter: (id = 69495::numeric) - Index Scan using idx_geography_centroid_z on geography (cost=0.00..8.28 rows=1 width=0) (actual time=32.393..165.057 rows=29687 loops=1) Index Cond: (centroid $0) Filter: ((type)::text = 'Z'::text) Total runtime: 271.428 ms (8 rows) postgis14=# explain analyze select count(*) from geography where type='Z' and centroid (select force_2d(the_geom) from geography where id=69495); QUERY PLAN - Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actual time=272.091..272.092 rows=1 loops=1) InitPlan 1 (returns $0) - Seq Scan on geography (cost=0.00..6884.00 rows=1 width=4439) (actual time=18.644..42.680 rows=1 loops=1) Filter: (id = 69495::numeric) - Index Scan using idx_geography_centroid_z on geography (cost=0.00..8.28 rows=1 width=0) (actual time=43.079..172.788 rows=29687 loops=1) Index Cond: (centroid $0) Filter: ((type)::text = 'Z'::text) Total runtime: 272.185 ms (8 rows) So in conclusion, I think that patch looks good and that the extra time I was seeing was due to RECHECK being applied to the operator, and not the time being spent within the index scan itself. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs -- Sent via pgsql-hackers mailing list (pgsql-hackers
Re: [HACKERS] Another try at reducing repeated detoast work for PostGIS
Tom Lane wrote: There was recently another go-round on the postgis-devel list about the same problem Mark Cave-Ayland complained about last year: http://archives.postgresql.org/pgsql-hackers/2008-06/msg00384.php Basically, what is happening is a nestloop join where the inner indexscan gets a comparison argument from the outer table, and that argument is toasted. Every single call of an index support function detoasts the argument again :-(, leading to upwards of 90% of the runtime being spent in detoasting. I made a proposal to fix it http://archives.postgresql.org/pgsql-hackers/2008-06/msg00709.php but that failed due to excessive ambition --- the cost/benefit/risk tradeoffs just weren't good enough. Thinking about it again, it seems to me that a much narrower patch could solve the specific forms of the problem that the PostGIS folk are seeing. Instead of trying to have a general-purpose method of preventing repeat de-toasting, we could just prevent it for inner indexscans by having ExecIndexEvalRuntimeKeys() detoast anything it's passing to the index AM. The attached patch accomplishes this with a net addition of about three statements. (It looks bigger, because I had to move a hunk of code to have the datatype info available when needed.) Paul Ramsey reports that this fixes the problem for him: http://postgis.refractions.net/pipermail/postgis-devel/2009-August/006659.html The only downside I can see offhand is that it will detoast short-header values that might not actually need to be detoasted. But we don't have any very good way to know whether a datatype's index support functions use PG_DETOAST_DATUM or PG_DETOAST_DATUM_PACKED. In the former case we do need to detoast short-header values. The extra overhead in this case amounts to only one palloc and a fairly short memcpy, which should be pretty negligible in comparison to the other setup costs of an indexscan, so I'm not very concerned about it. Comments? Better ideas? regards, tom lane Hi Tom, Thanks for the patch. Fortunately enough I was able to find the dataset from the original report above, and so I've tested your patch against PostgreSQL 8.4. Unfortunately in the original test case, it doesn't seem to give the same performance improvement for me that Paul was seeing :( postgis13=# \d geography Table public.geography Column | Type |Modifiers ++- gid| integer| not null default nextval('geography_gid_seq'::regclass) type | character varying(1) | id | numeric(20,0) | name | character varying(32) | population | numeric(20,0) | abbreviati | character varying(2) | po_name| character varying(100) | id_geo_sta | numeric(20,0) | the_geom | geometry | centroid | geometry | Indexes: geography_pkey PRIMARY KEY, btree (gid) idx_geography_centroid_z gist (centroid) Check constraints: enforce_dims_the_geom CHECK (ndims(the_geom) = 2) enforce_geotype_the_geom CHECK (geometrytype(the_geom) = 'MULTIPOLYGON'::text OR the_geom IS NULL) enforce_srid_the_geom CHECK (srid(the_geom) = (-1)) PostgreSQL 8.4 vanilla: postgis13=# explain analyze select count(*) from geography where type='Z' and centroid (select the_geom from geography where id=69495); QUERY PLAN -- Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actual time=1770.333..1770.334 rows=1 loops=1) InitPlan 1 (returns $0) - Seq Scan on geography (cost=0.00..6884.00 rows=1 width=4432) (actual time=19.529..43.893 rows=1 loops=1) Filter: (id = 69495::numeric) - Index Scan using idx_geography_centroid_z on geography (cost=0.00..8.28 rows=1 width=0) (actual time=48.897..1730.004 rows=29687 loops=1) Index Cond: (centroid $0) Filter: ((type)::text = 'Z'::text) Total runtime: 1770.417 ms (8 rows) PostgreSQL 8.4 with your patch applied: postgis13=# explain analyze select count(*) from geography where type='Z' and centroid (select the_geom from geography where id=69495); QUERY PLAN -- Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actual time=1499.414..1499.416 rows=1 loops=1) InitPlan 1 (returns $0) - Seq Scan on geography (cost=0.00..6884.00 rows=1 width=4432) (actual time=18.901..42.648 rows=1 loops=1) Filter: (id = 69495::numeric) - Index Scan using idx_geography_centroid_z on geography (cost=0.00..8.28 rows=1 width=0) (actual time=43.133..1456.944 rows=29687 loops
Re: [HACKERS] Proper entry of polygon type data
Peter Willis wrote: I am aware of PostGIS and already use it. My question was regarding the entry format of PostgreSQL polygon data. There is a void in the PostgreSQL documentation regarding this. Incidentally, PostGIS uses PostgreSQL polygon, point, and path data types. Errr... no it doesn't. PostGIS uses its own internal types to represent all the different geometries, although it does provide a cast between the existing PostgreSQL types as an aid for people wishing to migrate. Using PostGIS for simple , non-geographic, polygon rules is a bit like using a tank to kill a mosquito. Interesting metaphor... ;) ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] mingw check hung
Magnus Hagander wrote: Per discussion I looked at just reverting that part, but that won't work. If we do that, the call to SetEnvironmentVariable() will not be run, which certainly isn't right.. The problem has to be in win32env.c. I originally thought we accidentally called the putenv function twice in this case, but that code seems properly #ifdef:ed to MSVC. I'm not sure I trust the crash point at all - is this compiled with debug info enabled? It seems like a *very* strange line to crash on... I can't spot the error right off :-( Can you try to see if it's the putenv() or the unsetenv() that gets broken? (by making sure just one of them get replaced) //Magnus Hi guys, Don't know if this is relevant at all, but it reminds me of a problem I had with environment variables in PostGIS with MingW. It was something along the lines of environment variables set in a MingW program using putenv() for PGPORT, PGHOST etc. weren't visible to a MSVC-compiled libpq but were to a MingW-compiled libpq. It's fairly easy to knock up a quick test program in C to verify this. I eventually gave up and just built a connection string instead - for reference the final patch is here http://postgis.refractions.net/pipermail/postgis-commits/2008-January/000199.html. I appreciate it may not be 100% relevant, but I thought I'd flag it up as possibly being a fault with the MingW putenv implementation. HTH, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [postgis-devel] CLUSTER in 8.3 on GIST indexes break
Hi Kevin, Yeah I see exactly the same problem on 8.3.5 too, although it seems random - what seems to happen is that sometimes the contents of the temporary table disappears. I've attached a test script which causes the error *some* of the time, although it tends to occur more often just after the server has been restarted. I've been invoking the attached script against a PostgreSQL 8.3.5/PostGIS 1.3.4 installation, and when the bug hits I see the following psql output against a freshly restarted server: postgis13=# \i /tmp/postgis-strange.sql SELECT CREATE INDEX ANALYZE count --- 1 (1 row) CLUSTER ANALYZE count --- 1 (1 row) postgis13=# \i /tmp/postgis-strange.sql psql:/tmp/postgis-strange.sql:2: ERROR: relation tmp already exists psql:/tmp/postgis-strange.sql:3: ERROR: relation tmp_geom_idx already exists ANALYZE count --- 1 (1 row) CLUSTER ANALYZE count --- 0 (1 row) postgis13=# \i /tmp/postgis-strange.sql psql:/tmp/postgis-strange.sql:2: ERROR: relation tmp already exists psql:/tmp/postgis-strange.sql:3: ERROR: relation tmp_geom_idx already exists ANALYZE count --- 0 (1 row) CLUSTER ANALYZE count --- 0 (1 row) So in other words, the contents of the temporary table has just disappeared :( ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Count script create temp table tmp as select st_makepoint(random(), random()) as the_geom from generate_series(1, 1); create index tmp_geom_idx on tmp using gist (the_geom); analyze tmp; select count(*) from tmp; cluster tmp using tmp_geom_idx; analyze tmp; select count(*) from tmp; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [postgis-devel] CLUSTER in 8.3 on GIST indexes break
Gregory Stark wrote: Uhm. That rather sucks. I was able to reproduce it too. It seems to happen after I pause for a bit, and not when I run the script in fast succession. Thanks for the verification Greg. I'm wondering if the GiST part is a red herring, and in fact it is related to some bizarre interaction between CLUSTER/VACUUM/autovacuum? ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Patch status for reducing de-TOAST overhead?
Hi everyone, I'm just following up on the patch created here by Tom to aid with repeated de-TOASTing attempts: http://archives.postgresql.org/pgsql-hackers/2008-06/msg01096.php Given the performance report from Jeff (http://archives.postgresql.org/pgsql-hackers/2008-08/msg01178.php), is there still scope for getting something like this in 8.4 - or this patch still far shy of anything ready for a commit fest? I have a feeling that working on this is something currently beyond my own capability :( ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Missing results from scroll cursor in PostgreSQL 8.3.3?
Hi there, Following up a report on the PostGIS bugtracker (also submitted to pgsql-bugs here: http://archives.postgresql.org/pgsql-bugs/2008-09/msg00086.php), I'm wondering if there is a bug in the way that GiST indexes interact with scroll cursors. I've managed to reproduce the bug using btree_gist rather than PostGIS and have attached a test case for review. The key point is that if a GiST index is used to pull results from a scroll cursor then FETCH ABSOLUTE X fails to return any rows. I'm wondering if it could be related to the fact the GiST indexes are not ordered? Perhaps the only thing that is wrong is that a suitable ERROR message is missing? ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- -- Initial data -- CREATE TABLE ctest ( id int8, name varchar ); INSERT INTO ctest (id, name) SELECT id, 'Test' || id FROM generate_series(1, 1000) AS id; CREATE INDEX ctest_id_idx ON ctest(id); -- -- Return absolute cursor records using sequential scan index -- BEGIN; SET enable_seqscan = 't'; DECLARE CUR SCROLL CURSOR FOR SELECT * FROM ctest WHERE id = 990; FETCH ABSOLUTE -1 IN CUR; FETCH ABSOLUTE 3 IN CUR; CLOSE CUR; SET enable_seqscan = 'f'; DECLARE CUR SCROLL CURSOR FOR SELECT * FROM ctest WHERE id = 990; FETCH ABSOLUTE -1 IN CUR; FETCH ABSOLUTE 3 IN CUR; CLOSE CUR; COMMIT; -- -- Rebuild with btree_gist -- DROP INDEX ctest_id_idx; CREATE INDEX ctest_id_gist_idx ON ctest USING gist(id gist_int8_ops); -- -- Now try again... but this time no results are returned using GiST index scan? -- BEGIN; SET enable_seqscan = 't'; DECLARE CUR SCROLL CURSOR FOR SELECT * FROM ctest WHERE id = 990::bigint; FETCH ABSOLUTE -1 IN CUR; FETCH ABSOLUTE 3 IN CUR; CLOSE CUR; SET enable_seqscan = 'f'; DECLARE CUR SCROLL CURSOR FOR SELECT * FROM ctest WHERE id = 990::bigint; FETCH ABSOLUTE -1 IN CUR; FETCH ABSOLUTE 3 IN CUR; CLOSE CUR; COMMIT; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch: reducing overhead for repeat de-TOASTing
Tom Lane wrote: OK, I've reproduced the test case locally. I believe that when you say worse, you mean worse than 8.3, right? And you did tell me offlist that you were testing with --enable-cassert. CVS HEAD has very substantially greater cassert overhead because of the randomize_memory addition --- oprofile output for this test looks like samples %image name symbol name 1239580 78.7721 postgres randomize_mem 1435449.1218 libc-2.7.so memcpy 48039 3.0528 libc-2.7.so memset 13838 0.8794 postgres LWLockAcquire 12176 0.7738 postgres index_getnext 11697 0.7433 postgres LWLockRelease 10406 0.6613 postgres hash_search_with_hash_value 4739 0.3012 postgres toast_fetch_datum 4099 0.2605 postgres _bt_checkkeys 3905 0.2482 postgres AllocSetAlloc 3751 0.2384 postgres PinBuffer 3545 0.2253 postgres UnpinBuffer I'm inclined to think that we'd better turn that off by default, since it's not looking like it's catching anything new. Yes, I suspect that's probably it. I applied the patch straight to CVS tip as I wasn't aware of any changes that would affect the unpatched result, but I was obviously wrong ;) (cut) On the whole I'm still feeling pretty discouraged about this patch ... At the very least we have some more information on how an eventual solution should work, and a test case to help analyse the effectiveness of any potential solution. ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch: reducing overhead for repeat de-TOASTing
Gregory Stark wrote: Well at least it caught the bug that Mark was performance testing with a --enable-cassert build :/ True ;) I appreciated that there would be some overhead, but I didn't think it would be that much. This was mainly since I seem to remember there was talk a while back of enabling some assertions in production builds. ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch: reducing overhead for repeat de-TOASTing
Tom Lane wrote: Attached is a worked-out patch for the approach proposed here: http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php namely, that cache management for de-TOASTed datums is handled by TupleTableSlots. To avoid premature detoasting of values that we might never need, the patch introduces a concept of an indirect TOAST pointer, which has the same 0x80 or 0x01 header as an external TOAST pointer, but can be told apart by having a different length byte. Within that we have * pointer to original toasted field within the Slot's tuple * pointer to the owning Slot * pointer to decompressed copy, or NULL if not decompressed yet Some fairly straightforward extensions to the TupleTableSlot code, heaptuple.c, and tuptoaster.c make it all go. My original thoughts had included turning FREE_IF_COPY() into a no-op, but on investigation that seems impractical. One case that still depends on that pfree is where we have palloc'd a 4-byte-header copy of a short-header datum to support code that needs properly aligned datum content. The solution adopted in the patch is to arrange for pfree() applied to a cacheable detoasted object to be a no-op, whereas it still works normally for non-cached detoasted objects. We do this by inserting a dummy chunk header that points to a dummy memory context whose pfree support method does nothing. I think this part of the patch would be required for any toast caching method, not just this one. What I like about this patch is that it's a fairly small-footprint change, it doesn't add much overhead, and it covers caching of decompression for in-line-compressed datums as well as the out-of-line case. One thing I really *don't* like about it is that it requires everyplace that copies Datums to know about indirect pointers: in general, the copy must be a copy of the original toasted Datum, not of the indirect pointer, else we have indirect pointers that can outlive their owning TupleTableSlot (or at least outlive its current tuple cycle). There only seem to be about half a dozen such places in the current code, but still it seems a rather fragile coding rule. After playing with it for a little bit, I'm not convinced that it buys enough performance win to be worth applying --- the restriction of cache lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive. (For example, sorts that involve toasted sort keys continue to suck, because the tuples being sorted aren't in Slots.) It would probably fix the specific case that the PostGIS hackers were complaining of, but I think we need something more. Still, I wanted to get it into the archives because the idea of indirect toast pointers might be useful for something else. Hi Tom, Thanks very much for supplying the WIP patch. In the interest of testing and feedback, I've just downloaded PostgreSQL CVS head and applied your patch, compiled up a slightly modified version of PostGIS (without RECHECKs on the GiST opclass) and loaded in the same dataset. Unfortunately I have to report back that with your WIP patch applied, timings seem to have become several orders of magnitude *worse*: [EMAIL PROTECTED]:~$ psql -d postgis psql (8.4devel) Type help for help. postgis=# explain analyze select count(*) from geography where centroid (select the_geom from geography where id=69495); QUERY PLAN -- Aggregate (cost=7100.28..7100.29 rows=1 width=0) (actual time=18238.932..18238.934 rows=1 loops=1) InitPlan - Seq Scan on geography (cost=0.00..7092.00 rows=1 width=4387) (actual time=27.472..69.223 rows=1 loops=1) Filter: (id = 69495::numeric) - Index Scan using geography_centroid_idx on geography (cost=0.00..8.27 rows=1 width=0) (actual time=118.371..18196.041 rows=32880 loops=1) Index Cond: (centroid $0) Total runtime: 18239.918 ms (7 rows) In fact, even sequential scans seem to have gone up by several orders of magnitude: postgis=# set enable_indexscan = 'f'; SET postgis=# set enable_bitmapscan = 'f'; SET postgis=# explain analyze select count(*) from geography where centroid (select the_geom from geography where id=69495); QUERY PLAN Aggregate (cost=14184.01..14184.01 rows=1 width=0) (actual time=9117.022..9117.024 rows=1 loops=1) InitPlan - Seq Scan on geography (cost=0.00..7092.00 rows=1 width=4387) (actual time=23.780..54.362 rows=1 loops=1) Filter: (id = 69495::numeric) - Seq Scan on geography (cost=0.00..7092.00 rows=1 width=0) (actual time=55.016..9073.084 rows=32880 loops=1) Filter: (centroid $0) Total runtime: 9117.174 ms (7 rows) ATB, Mark. -- Mark Cave
Re: [HACKERS] WIP patch: reducing overhead for repeat de-TOASTing
Guillaume Smet wrote: From the discussion we had a few months ago, I don't think the no RECHECK trick works with CVS tip anymore. See my post on the Remove lossy-operator RECHECK flag? thread: http://archives.postgresql.org/pgsql-hackers/2008-04/msg00847.php and follow-ups. That said, perhaps it's not the only problem you have but I thought it was worth mentioning. Yeah sorry, that might not have been as clear as I hoped. What I meant was that I removed the explicit RECHECK clause from the GiST operator class definition - since as you correctly mention, CVS tip throws an error if this is present. ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Strange issue with GiST index scan taking far too long
Tom Lane wrote: Well, yeah, because the first thing it does is pg_detoast_datum. Just as a cross-check, try changing it to copy the value without forcibly detoasting --- I'll bet it's still slow then. Yeah, that appears to be exactly the case. After some grepping through various header files I came up with this alternative: PG_FUNCTION_INFO_V1(LWGEOM_mcatest); Datum LWGEOM_mcatest(PG_FUNCTION_ARGS) { struct varlena *pgl = PG_GETARG_RAW_VARLENA_P(0); void *mem; // Copy somewhere else mem = palloc(VARSIZE_EXTERNAL(pgl)); memcpy(mem, pgl, VARSIZE_EXTERNAL(pgl)); PG_RETURN_POINTER(mem); } ...and of course, this now takes the slower runtime of 2.5s. ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Strange issue with GiST index scan taking far too long
Tom Lane wrote: So you are saying it is de-toasted 32880 times, in this case? If not, where are the repeated de-toastings happening? Inside the index support functions. I'm thinking we could fix this by forcibly detoasting values passed as index scan keys, but it's not quite clear where's the best place to do that. Ouch. This is rapidly getting out of my sphere of knowledge, but I'd guess you'd want to do this either just before you start the index scan, or cache the results within the AM after the first deTOASTing. In terms of PostGIS, we tend to do a lot of index queries against large geometries so we see cases like this frequently - so optimising them would be good. I did think of another idea though: at the moment all members of the GiST opclass for geometry objects are declared using the geometry type (which contains the entire geometry), whereas individual entries are stored within the index as box2d objects representing just their bounding box. Would it make sense to rework the GiST routines so that instead of accepting geometry op geometry, they accept box2d op box2d? Then surely if we add a CAST from geometry to box2d then the geometry would get converted to its bounding box (which would not require deTOASTing) before being used as an index scan key. ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Strange issue with GiST index scan taking far too long
Hi there, I'm currently looking at a bug report in PostGIS where we are getting extremely long index scan times querying an index in one case, but the same scan can take much less time if the input geometry is calculated as the result of another function. First of all, I include the EXPLAIN ANALYZE of the basic query which looks like this: postgis=# explain analyze select count(*) from geography where centroid (select the_geom from geography where id=69495); QUERY PLAN - Aggregate (cost=7157.29..7157.30 rows=1 width=0) (actual time=2691.783..2691.784 rows=1 loops=1) InitPlan - Seq Scan on geography (cost=0.00..7149.00 rows=1 width=4559) (actual time=60.987..61.373 rows=1 loops=1) Filter: (id = 69495::numeric) - Index Scan using geography_geom_centroid_idx on geography (cost=0.00..8.28 rows=1 width=0) (actual time=79.241..2645.722 rows=32880 loops=1) Index Cond: (centroid $0) Filter: (centroid $0) Total runtime: 2692.288 ms (8 rows) The only real thing to know about the query is that the id field within the geography table is a primary key, and hence only a single geometry is being returned from within the subselect. Note that most of the time is disappearing into the index scan. Where things start getting really strange is when we add an extra function called force_2d() into the mix. All this function does is scan through the single geometry returned from the subselect and remove any 3rd dimension coordinates. Now the resulting EXPLAIN ANALYZE for this query looks like this: postgis=# explain analyze select count(*) from geography where centroid (select force_2d(the_geom) from geography where id=69495); QUERY PLAN Aggregate (cost=7157.29..7157.30 rows=1 width=0) (actual time=343.004..343.005 rows=1 loops=1) InitPlan - Seq Scan on geography (cost=0.00..7149.00 rows=1 width=4559) (actual time=48.714..49.016 rows=1 loops=1) Filter: (id = 69495::numeric) - Index Scan using geography_geom_centroid_idx on geography (cost=0.00..8.28 rows=1 width=0) (actual time=49.367..235.296 rows=32880 loops=1) Index Cond: (centroid $0) Filter: (centroid $0) Total runtime: 343.084 ms (8 rows) So by adding in an extra function around the subselect result, we have speeded up the index lookup by several orders of magnitude, and the speedup appears to be coming from somewhere within the index scan?! I've spent a little bit of time playing with this and it seems even writing a function to return a copy of the input function is enough. Here is my test function below: Datum LWGEOM_mcatest(PG_FUNCTION_ARGS); PG_FUNCTION_INFO_V1(LWGEOM_mcatest); Datum LWGEOM_mcatest(PG_FUNCTION_ARGS) { PG_LWGEOM *pgl = (PG_LWGEOM *) PG_DETOAST_DATUM(PG_GETARG_DATUM(0)); void *mem; /* Copy somewhere else */ mem = palloc(VARSIZE(pgl)); memcpy(mem, pgl, VARSIZE(pgl)-VARHDRSZ); PG_RETURN_POINTER(mem); } CREATE OR REPLACE FUNCTION mcatest(geometry) RETURNS geometry AS '$libdir/lwpostgis','LWGEOM_mcatest' LANGUAGE 'C'; And then here is the resulting EXPLAIN ANALYZE: postgis=# explain analyze select count(*) from geography where centroid (select mcatest(the_geom) from geography where id=69495); QUERY PLAN Aggregate (cost=7157.29..7157.30 rows=1 width=0) (actual time=283.126..283.127 rows=1 loops=1) InitPlan - Seq Scan on geography (cost=0.00..7149.00 rows=1 width=4559) (actual time=48.712..49.040 rows=1 loops=1) Filter: (id = 69495::numeric) - Index Scan using geography_geom_centroid_idx on geography (cost=0.00..8.28 rows=1 width=0) (actual time=49.321..215.524 rows=32880 loops=1) Index Cond: (centroid $0) Filter: (centroid $0) Total runtime: 283.221 ms (8 rows) Unfortunately I can't seem to work out why the extra time is disappearing into the index scan when my extra mcatest() function is not present, especially as sprof doesn't seem to want to run at the moment :( I'm wondering if it's related to either excess TOAST/palloc/pfree somewhere in the code, but I'd definitely appreciate any pointers. All these tests were done using PostgreSQL 8.3.1 and the latest PostGIS SVN. Many thanks, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription
Re: [HACKERS] Strange issue with GiST index scan taking far too long
Simon Riggs wrote: Hmmm, perhaps implicit casting? Try this to see if it works better also select count(*) from geography where centroid (select the_geom::geometry from geography where id=69495); Hi Simon, Unfortunately that seems to take the slow runtime path too. I did initially think about casting being involved (since the underlying index storage type is actually box2d rather than geometry), however my mcatest() function is also declared as returning geometry too. Interesting enough, forcing a cast to box2d instead of geometry seems to take the faster path, i.e: postgis=# explain analyze select count(*) from geography where centroid (select the_geom::box2d from geography where id=69495); QUERY PLAN Aggregate (cost=7157.29..7157.30 rows=1 width=0) (actual time=376.033..376.034 rows=1 loops=1) InitPlan - Seq Scan on geography (cost=0.00..7149.00 rows=1 width=4559) (actual time=42.853..43.051 rows=1 loops=1) Filter: (id = 69495::numeric) - Index Scan using geography_geom_centroid_idx on geography (cost=0.00..8.28 rows=1 width=0) (actual time=43.218..286.535 rows=32880 loops=1) Index Cond: (centroid ($0)::geometry) Filter: (centroid ($0)::geometry) Total runtime: 376.117 ms (8 rows) ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Strange issue with GiST index scan taking far too long
Tom Lane wrote: Is the value you are fetching from the geography table large enough to be toasted? I'm thinking you might be looking at the cost of repeated de-toastings. Yeah, it's a fairly large geometry field so it will definitely be getting toasted. So is it a case of with the mcatest function in place, we're effectively caching the de-TOASTED value? BTW, that mcatest function is buggy --- it's not copying the last 4 bytes of the source value. I don't know enough about PostGIS data types to know what effect that would have. It would probably lose that last point in the coordinate sequence, but nothing major. I've removed the -VARHDRSZ part just to check and it doesn't make any difference. Many thanks, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rewriting Free Space Map
On Mon, 2008-03-17 at 13:23 -0400, Tom Lane wrote: I'm not wedded to forks, that's just the name that was used in the only previous example I've seen. Classic Mac had a resource fork and a data fork within each file. Don't think I like maps though, as (a) that prejudges what the alternate forks might be used for, and (b) the name fails to be inclusive of the data fork. Other suggestions anyone? I believe that in the world of NTFS the concept is called streams: http://support.microsoft.com/kb/105763. HTH, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 8.4 development plan
On Friday 08 February 2008 00:52:04 Gregory Stark wrote: Well not really. Our current model is that only stuff that's ready for widespread use goes into CVS. That means everything isn't open and shared at all. everything is mostly sitting on people's local hard drives where you can't use do anything with it, let alone contribute. The patches mailing list is basically our poor-man's distributed SCM today. It's awful since a) you never know if you're looking at the most recent version b) updating your tree from an old version to a newer version is painful c) people only release versions when they think they have something to say or a question; they don't know you want to try out their stuff until you complain about last month's silly bugs d) you never know what version of the tree the patch was against and of course e) if you make any changes they have all the same problems dealing with your changes to their patch. And it's hardly any more centralized than a distributed SCM system would be. One of the things I would like to see in the project is more modularisation during development . In particular, it may be useful to allow different maintainers to look after different parts of the backend, much in the way that different linux kernel developers are in charge of different subsystems. This strikes me as being advantageous in a couple of ways: i) It lowers the bar for entry into the project. Knowing the ins and outs of one subsystem is going to take a developer much less time than it does to learn about the entire backend. ii) Some of the larger patches we have seen during 8.3 would be broken into smaller chunks based upon the part of the backend they touch; so reviewing 3 or 4 smaller incremental patches across 3 or 4 people will take much less time than having to wait for someone like Alvaro or Tom to review and commit several hundred KB of code. This seems to fit more with the idea of a distributed SCM, although it wouldn't be entirely out of the question to set up permissions using CVS/SVN. ATB, Mark. -- Mark Cave-Ayland Sirius Corporation - The Open Source Experts http://www.siriusit.co.uk T: +44 870 608 0063 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[HACKERS] Minor bug in src/port/rint.c
Hi everyone, I believe that there is a small bug in src/port/rint.c when the input parameter has a fractional part of 0.5 which is demonstrated by the attached program. It appears that the PG version of rint() rounds in the wrong direction with respect to glibc. [EMAIL PROTECTED]:~$ ./test rint(-1.5): -2.00 pg_rint(-1.5): -1.00 rint(1.5): 2.00 pg_rint(1.5): 1.00 The fix is, of course, to add an equals into the if() comparisons on lines 21 and 26, so that when the fractional part is 0.5, it rounds in the opposite direction instead. I'm sure that this will have practically zero effect on the code, however it may be worth applying for correctness and consistency with other platform implementations. ATB, Mark. -- ILande - Open Source Consultancy http://www.ilande.co.uk /* * rint() test program * * Compile on gcc using: * gcc -o test test.c -lm */ #include math.h #include stdio.h double pg_rint(double x) { double f, n = 0.; f = modf(x, n); if (x 0.) { if (f .5) n += 1.; } else if (x 0.) { if (f -.5) n -= 1.; } return n; } int main() { printf(rint(-1.5): %f\n, rint(-1.5)); printf(pg_rint(-1.5): %f\n, pg_rint(-1.5)); printf(rint(1.5): %f\n, rint(1.5)); printf(pg_rint(1.5): %f\n, pg_rint(1.5)); } ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Minor bug in src/port/rint.c
On Sun, 2008-01-20 at 16:47 -0500, Tom Lane wrote: Your proposed fix wouldn't make it act the same as glibc, only move the differences around. I believe glibc's default behavior for the ambiguous cases is round to nearest even number. You propose replacing round towards zero, which is what our code currently does, with round away from zero, which really isn't likely to match any platform's behavior. (The behaviors specified by IEEE are to nearest even, towards zero, towards minus infinity, and towards plus infinity, with the first being the typical default.) Considering that probably every modern platform has rint(), I doubt it's worth spending time on our stopgap version to try to make it fully IEEE-compliant ... Hi Tom, Right, I think I understand more about this now. My confusion stemmed from reading the GNU documentation here: http://www.gnu.org/software/libc/manual/html_node/Rounding-Functions.html where in rint() section it says The default rounding mode is to round to the nearest integer. However, Wikipedia proved to be quite a fruitful resource, and agrees with what you stated which was round to even number. Interestingly, the article on rounding also points to this page http://support.microsoft.com/kb/196652 which has some example code to perform round to even number, or Banker's Rounding. The reason I was looking into this is because PostGIS will require this for an MSVC Win32 build. I haven't yet tried the implementations given in the above page, so I'm not sure whether they are any good or not... The big question is, of course, how much difference does this make? Does the current implementation exhibit different behaviour for certain date calculations between Win32 and glibc platforms, and if so does it matter? I guess at the end of the day, if it doesn't have any real effect then there is no need to worry about it. ATB, Mark. -- ILande - Open Source Consultancy http://www.ilande.co.uk ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Possible PostgreSQL 8.3beta4 bug with MD5 authentication in psql?
On Sat, 2007-12-08 at 17:09 -0500, Tom Lane wrote: So what I think we must do is split the function into two: PQconnectionNeedsPassword: true if server demanded a password and there was none to send (hence, can only be true for a failed connection) PQconnectionUsedPassword: true if server demanded a password and it was supplied from the connection function's argument list, *not* from environment or a password file. Offhand it looks like only dblink needs the second version --- all of our client-side code wants the first one. Barring objections I'll go code this up ... Yup, that looks good to me. My only criticism would be that the name PQconnectionUsedPassword could be a little misleading, in that based upon the name I would expect it to return true if a password was specified regardless of whether the method used was interactive, .pgpass or PGPASSWORD. The only other suggestion I can think of at the moment would be PQconnectionUsedConnectionPassword which seems to better sum up its purpose to me... Kind regards, Mark. -- ILande - Open Source Consultancy http://www.ilande.co.uk ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Possible PostgreSQL 8.3beta4 bug with MD5 authentication in psql?
On Sat, 2007-12-08 at 10:09 -0500, Tom Lane wrote: My vote goes to (3), if the work can be done quickly, or (1) if it can't. Ditto. I don't think it's a big problem to do, as long as we are agreed on the behavior we want. In particular, consider: 1. If libpq obtains a password internally (ie, from PGPASSWORD or ~/.pgpass), and it's wrong, do we want a user password prompt? 2. If we prompt the user for a password, and it's wrong, do we want to try again? Our historical behavior on both points was no, but at least the second one seems a bit questionable. I'd say that you definitely don't want a user password prompt if libpq's password is wrong, since I can see this would break a lot of scripts that weren't launched directly from the shell - the switch from non-interactive to interactive would cause the script to hang in the background rather than return immediately with an error (indeed it was this confusing symptom in the PostGIS installer that flagged the change in behaviour). As for the second point, I'm not too worried about how many times you are asked for the password - I personally am happy with the one attempt as it stands at the moment. My main concern is the switch from a non-interactive mode to an interactive mode. Kind regards, Mark. -- ILande - Open Source Consultancy http://www.ilande.co.uk ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] Possible PostgreSQL 8.3beta4 bug with MD5 authentication in psql?
Hi everyone, I think that I may have found a minor bug with PostgreSQL 8.3beta4 with respect to md5 authentication. I actually discovered this on Win32, but it appears that the behaviour is the same under Linux too. As part of the PostGIS install under Win32, I have a few scripts that check for the existence of a particular database by doing the following: psql -d adatabase -h localhost -c SELECT version(); By checking the psql exit code, it is fairly easy to see whether this failed, and if so display the contents of stdout for the user. The problem I have is that under PostgreSQL 8.3beta4, if the database doesn't exist then I get an extra password prompt which breaks the install scripts as they run in the background :( To recreate this is fairly easy: 1. Temporarily rename any .pgpass files so they aren't found by libpq 2. Stop the PostgreSQL 8.3 server 3. Change pg_hba.conf so that local connections are disabled, but connections to 127.0.0.1 are allowed with md5 authentication 4. Restart the PostgreSQL server 5. Open up a shell and do the following: [EMAIL PROTECTED]:~$ export PGPASSWORD=mypass [EMAIL PROTECTED]:~$ psql -h localhost -d postgres -c SELECT version(); version --- PostgreSQL 8.3beta2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5) (1 row) So far so good. But now try with a database that doesn't exist: [EMAIL PROTECTED]:~$ psql -h localhost -d doesntexist -c SELECT version(); Password: psql: FATAL: database doesntexist does not exist H. So even though PGPASSWORD is set (and the command works if the database exists within the cluster), if I specify a non-existent database then I still get prompted for a password. I've run the same test against PostgreSQL 8.2.5 and the test works in that I don't get prompted for a password the second time. So the behaviour has changed between versions, but I wanted to check that it wasn't a deliberate change before looking deeper. Many thanks, Mark. -- ILande - Open Source Consultancy http://www.ilande.co.uk ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Possible PostgreSQL 8.3beta4 bug with MD5 authentication in psql?
On Fri, 2007-12-07 at 11:03 -0500, Tom Lane wrote: Hmmm ... it seems the problem is that we've defined PQconnectionUsedPassword in such a way that it returns true (causing a prompt) regardless of whether the reason for the connection failure was a bad password or not. We might need to reconsider that API. Right. I think it depends on the interpretation of the PQconnectionUsedPassword function. If it should simply return whether or not the connection used a password or not (as it does now), then you could argue that it should be psql which should incorporate an additional check to determine whether the attempt was cancelled due to an invalid database name. On first glance, libpq appears to return just CONNECTION_BAD and an error message, so I'm not sure whether we can easily detect the difference between an incorrect password and an invalid database name :( Is there an additional status code in PGconn that can be used to determine the exact cause of connection failure? ATB, Mark. -- ILande - Open Source Consultancy http://www.ilande.co.uk ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] Locating sharedir in PostgreSQL on Windows
Hi everyone, I'm working on a piece of code for PostGIS to allow the loading of projection configuration files from the share/postgresql directory, but I can't find a way of getting this to work under Win32. AIUI the way to do this would be to use a combination of my_exec_path and get_share_path in order to find the directory, but MingW refuses to find the my_exec_path variable during linking. Unfortunately I suspect that this because my_exec_path is not declared as DLLIMPORT in backend/utils/init/globals.c :( I really need to find a solution to this that can work all the way back to PostgreSQL 8.0 - can anyone think of any better ideas? Many thanks, Mark. -- ILande - Open Source Consultancy http://www.ilande.co.uk ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Locating sharedir in PostgreSQL on Windows
On Mon, 2007-11-26 at 11:55 -0500, Tom Lane wrote: I'm not sure why Mark's having a problem accessing my_exec_path --- it *is* declared DLLIMPORT in miscadmin.h (which is where it counts, AIUI) clear back to 8.0. Bah, I think that is the source of the problem. Having grepped the source for my_exec_path, I found the reference in globals.c and hence the code worked on Linux (my main development environment) with a simple extern declaration which was what was confusing me. Adding the #include miscadmin.h solves this problem and I can now access the variable on Windows aswell, so I put this down to user error - thanks for the advice Tom. Many thanks, Mark. -- ILande - Open Source Consultancy http://www.ilande.co.uk ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Locating sharedir in PostgreSQL on Windows
On Mon, 2007-11-26 at 17:02 -0500, Tom Lane wrote: I believe that that is talking specifically about shared libraries (or DLLs in Windows-speak), and not about configuration or data files. In particular, nothing under libdir would be a candidate to go under sharedir, nor vice versa, since the former is supposed to hold architecture-dependent files and the latter architecture-independent files. Mark hasn't been very clear about whether he wants to store static data files or installation-changeable configuration info, so it's not clear to me whether Peter's objection to using sharedir is appropriate or not. But unless the files are architecture-sensitive (which they might be!), libdir doesn't sound right in either case. Okay, I'll try and expand on this a bit. In order to convert coordinates between different coordinate systems, PostGIS uses the external PROJ.4 library. Now in order to support a certain category of conversion, PROJ.4 requires access to a set of library grid reference files which are effectively compiled from source files into a set of data files as part of the build process. The path to this directory of files is then built into the DLL at compile time, although it can be overriden with an API call. Under Linux, this is fairly easy as the files are normally installed somewhere under /usr/share/proj, and hence the directory exists at both compile-time and run-time. Windows is trickier because drive letters and mappings can change - the default of C:\PROJ\NAD may or may not exist, or can change depending upon the current drive configuration. I can also see issues arising if the PostgreSQL installation is moved from the C:\ drive to another. Hence my idea was to create a directory under $sharedir such as $sharedir/postgresql/contrib/postgis/nad and install the files there. Then regardless of the location of the PostgreSQL installation or the current drive setup, I can use get_share_path() with the PROJ.4 API to set the new library path the first time the function is called, and everything will just work. I can see Peter's argument about not putting files directly in $sharedir, but I feel the usage suggested above falls under a similar use case to the tsearch2 data files (which is mostly where I looked for inspiration). Hopefully this will help make things a bit clearer - please let me know if any more information is needed. Many thanks, Mark. -- ILande - Open Source Consultancy http://www.ilande.co.uk ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] msvc and vista fun
On Sun, 2007-06-24 at 13:23 -0400, Andrew Dunstan wrote: (cut) On a somewhat related note, I have had spectacular lack of success in getting either MSVC or MinGW builds to work on Vista - so much so that I have currently abandoned my attempts on that platform and I resorted to resuscitating an old XP box for testing. Following some advice from Magnus, I added ACLs to the build root for both an admin and a non-admin user (cacls buildroot /E /T /G AdminUser:C and similarly for a non-admin user) . I can build as the admin user but when I come to run initdb it fails, complaining that it can't find the postgres executable. If I then switch to the non-admin user, it can run initdb just fine. However, that user can't build, because it gets a mysterious failure from mt.exe. MinGW is even worse - it says it can't run gcc because it can't run cc1.exe (IIRC), so it fails at the configure stage! All of this has cost me quite a few hours in the last week, with very little to show for it. Hi Andrew, This has come up quite recently on the MingW lists - see http://sourceforge.net/mailarchive/forum.php?thread_name=00fc01c7a0af% 240a037c30%240200a8c0%40AMD2500forum_name=mingw-users for a discussion of the problem and solution. Kind regards, Mark. -- ILande - Open Source Consultancy http://www.ilande.co.uk ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Hierarchical Queries--Status
On Thu, 2007-02-08 at 20:49 -0500, Bruce Momjian wrote: Who is working on this item? Jonah was trying to complete this for 8.3, but I believe that he has handed it onto Gregory Stark - I think http://archives.postgresql.org/pgsql-hackers/2007-01/msg01586.php is the latest update. Kind regards, Mark. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Function execution costs 'n all that
On Mon, 2007-01-15 at 15:05 -0500, Tom Lane wrote: Brian Hurt [EMAIL PROTECTED] writes: Non-developer here, but we use a lot of plpgsql functions at work. And the functions we use fall into two broad, ill-defined catagories- expensive functions and cheap functions. What I'd like as a user is some way to tell the planner this function is expensive- prefer plans which call this function less even if they're otherwise more expensive or this function is cheap, prefer plans that are otherwise less expensive even if they call this function more often. Precise cost estimates aren't that important, IMHO. Right, so a plain constant cost would be plenty for your situation. I suspect there's an 80/20 rule at work here --- the estimator-function side of this will take most of the effort to design/implement, but not get used nearly as much as the plain-constant form ... maybe we should just do the constant for starters and see how many people really want to write C-code estimators ... regards, tom lane Hi Tom et al, Having worked with stored procedures on large datasets for reporting, I would say that it would be useful to have a non-constant estimator for the number of rows, whereas a single CPU cost constant should be fine. Where I have struggled with this has been joining onto slightly more exotic queries when doing large scale data processing as part of a custom report or an application upgrade. Using PL/PGSQL I would find it useful to have access to the constants passed into a function to be used to help provide a row count estimate (typically useful for passing in table/column names), e.g. SELECT * FROM my_func('my_table1') AS t1, my_table2 AS t2 WHERE t1.id = t2.id; CREATE FUNCTION my_func(text) AS $$ ... $$ LANGUAGE 'plpgsql' COST 1.0 ROWS my_func_row_cost; In my cost function, I could then estimate the number of rows using something like below, where all constants are passed into the cost function as parameters, e.g.: CREATE FUNCTION my_func_row_cost(text) AS $$ DECLARE foo bigint; BEGIN EXECUTE INTO foo 'SELECT COUNT(*) FROM ' || quote_literal($1); RETURN foo; END; $$ LANGUAGE 'plpgsql'; In the case where a parameter was not a constant but a column name, then it would be reasonable in my mind to simply replace that parameter with NULL when passing to the row cost function, e.g. SELECT * FROM my_table1 WHERE my_table1.junk = (SELECT my_func(my_table1.name)); In this case, the text parameter passed to my_func_row_cost would be replaced by NULL to indicate that it was non-constant. Of course, even with constants passed upon input, it still may not be possible to calculate a number of rows that can be returned - it could be the case that the only parameter passed to cost function has been converted to NULL because it is a column name. Perhaps in this case it would be useful to specify returning NULL from my_func_row_cost means I can't return anything meaningful, so use the fallback values. Kind regards, Mark. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] WITH support
On Wed, 2007-01-03 at 09:45 +0100, Hubert FONGARNAND wrote: Why not looking at http://gppl.moonbone.ru/ evgen potemkin. has ever made a patch for WITH and CONNECT BY? I'm ready to test these features... (RECURSIVE) when they'll land in CVS... Hi Hubert, IIRC there were two issues - firstly the license for the patch was GPL as opposed to BSD used for PostgreSQL, and secondly I believe Tom and some of the other developers were not happy with some of the internal changes made by the patch. Fortunately it looks as if Jonah can commit some serious time to this, so your offer of helping to test when the code hits CVS is gratefully received :) Kind regards, Mark. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] WITH support
On Sat, 2006-12-30 at 00:49 -0500, Jonah H. Harris wrote: On 12/29/06, Bruce Momjian [EMAIL PROTECTED] wrote: No code yet, and I don't remember who said they were working on it. I'm still waiting to hear from Mark Cave-Ayland on whether he's going to pick it up or whether I'll just do it. One way or another, there should be some movement regarding design discussion within a week or two. Hi everyone, Sorry to come in late on this discussion - I've been playing catch-up with emails over the festive season... The situation regarding WITH support is that it's something which I am picking up in my spare time (out of work hours) and so unfortunately progress isn't as rapid as I would like. See here for my last meaningful patch and Tom's feedback: http://archives.postgresql.org/pgsql-patches/2006-09/msg00260.php. The idea was to first implement just the WITH clause itself as a first patch, and then extend to include the recursive features with a secondary patch. In short, if people don't mind waiting for my free cycles to come along then I will continue to chip away at it; otherwise if it's considered an essential for 8.3 with an April deadline then I will happily hand over to Jonah. Kind regards, Mark. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] 8.2beta1 crash possibly in libpq
On Sun, 2006-10-08 at 17:53 +0200, Magnus Hagander wrote: AFAICT the backtrace and server log is indicating that the crash is happening somewhere in libpq. If someone can help me figure out how to load the libpq symbols into MingW's gdb then I can get a better backtrace if required as I can reproduce this 100% of the time. For reference, the source for the application in question can be found at http://svn.refractions.net/postgis/tags/1.1.4/loader/pgsql2shp.c. If you figure out how to make gdb actually work on mingw, let us know - not many has ever managed to get it wokring, and I don't know of anybody who can make it work repeatedly. That said, libpq builds with Visual C++. Could you try building your pgsql2shp with Visual C++ as well, and then use the Visual C++ debugger (or windbg, really). They should give working backtraces. //Magnus Hi Magnus, Getting closer I think. I managed to compile a MSVC libpq but it agreed with the MingW backtrace in that it was jumping into the middle of nowhere :( I think I may be getting closer though: I've just done a comparison build with PostgreSQL 8.1 and noticed that there is an error message is being emitted regarding PGntuples (which is where the crash is occuring): PG 8.1: [EMAIL PROTECTED] ~/postgis/pg81/postgis-1.1.4/loader $ make gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 -c -o shpopen.o shpopen.c shpopen.c:176: warning: 'rcsid' defined but not used gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 -c -o dbfopen.o dbfopen.c dbfopen.c:206: warning: 'rcsid' defined but not used gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 -c -o getopt.o getopt.c gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 -c -o shp2pgsql.o shp2pgsql.c shp2pgsql.c: In function `utf8': shp2pgsql.c:1686: warning: passing arg 2 of `libiconv' from incompatible pointer type gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 shpopen.o dbfopen.o getopt.o shp2pgsql.o -liconv -o shp2pgsql.exe gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 -IC:/msys/1.0/home/mca/pg81/REL-81~1.4/include -c pgsql2shp.c gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 -c -o PQunescapeBytea.o PQunescapeBytea.c gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 shpopen.o dbfopen.o getopt.o PQunescapeBytea.o pgsql2shp.o -liconv C:/msys/1.0/home/mca/pg81/REL-81~1.4/lib/libpq.dll -o pgsql2shp.exe PG 8.2: [EMAIL PROTECTED] ~/postgis/pg82/postgis-1.1.4/loader $ make gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 -c -o shpopen.o shpopen.c shpopen.c:176: warning: 'rcsid' defined but not used gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 -c -o dbfopen.o dbfopen.c dbfopen.c:206: warning: 'rcsid' defined but not used gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 -c -o getopt.o getopt.c gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 -c -o shp2pgsql.o shp2pgsql.c shp2pgsql.c: In function `utf8': shp2pgsql.c:1686: warning: passing arg 2 of `libiconv' from incompatible pointer type gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 shpopen.o dbfopen.o getopt.o shp2pgsql.o -liconv -o shp2pgsql.exe gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 -IC:/msys/1.0/home/mca/pg82/REL-8~1.2BE/include -c pgsql2shp.c gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 -c -o PQunescapeBytea.o PQunescapeBytea.c gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 shpopen.o dbfopen.o getopt.o PQunescapeBytea.o pgsql2shp.o -liconv C:/msys/1.0/home/mca/pg82/REL-8~1.2BE/lib/libpq.dll -o pgsql2shp.exe Info: resolving _PQntuples by linking to __imp__PQntuples (auto-import) I think the key part is this line: Info: resolving _PQntuples by linking to __imp__PQntuples (auto-import). Could it be that the linker cannot find a reference to PQntuples and hence is jumping into random code? I have verified that PQntuples does exist within libpq.dll using the Microsoft Dependency Walker though. Kind regards, Mark. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 8.2beta1 crash possibly in libpq
Hi Magnus, I finally got to the bottom of this - it seems that the flags being passed to MingW's linker were incorrect, but instead of erroring out it decided to create a corrupt executable. Here is the command line that was being used to link the pgsql2shp.exe executable, along with the associated auto-import warning: gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 shpopen.o dbfopen.o getopt.o PQunescapeBytea.o pgsql2shp.o -liconv C:/msys/1.0/home/mca/pg82/REL-8~1.2BE/lib/libpq.dll -o pgsql2shp.exe Info: resolving _PQntuples by linking to __imp__PQntuples (auto-import) Note that libpq.dll is referenced directly with -l which I believe should be an invalid syntax. This produces a corrupt executable that crashes whenever PQntuples is accessed. On the other hand, a correct executable can be realised by linking like this: gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 shpopen.o dbfopen.o getopt.o PQunescapeBytea.o pgsql2shp.o -liconv -LC:/msys/1.0/home/mca/pg82/REL-8~1.2BE/lib -lpq -o pgsql2shp.exe Note there is no auto-import warning, and the use of -L and -l is how I would expect. In actual fact, the incorrect link line was being produced by an error in the configure.in script, so this won't be a scenario that most people will experience. The executables linked using the second method now work properly without crashing during regression. The big mystery is that the command line used to link the executables has been like that for several versions now, and I have absolutely no idea why it only triggered this failure when being linked against 8.2beta1 when it works perfectly on 8.1 and 8.0, and also why only PQntuples was affected. Kind regards, Mark. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[HACKERS] 8.2beta1 crash possibly in libpq
Hi everyone, I'm in the process of generating the Windows installer for the latest PostGIS 1.1.4 release and I'm getting a regression failure in one of libpq applications - the application in question is generating a segfault. Testing so far shows that the regression tests pass without segfaulting in the following scenarios: PostgreSQL 8.2beta1 / PostGIS 1.1.4 / Linux PostgreSQL 8.1 / PostGIS 1.1.4 / Win32 So it appears it is something to do with 8.2beta1 and Win32. I've compiled the application with debugging symbols enabled and get the following backtrace from gdb in MingW: (gdb) set args -f /tmp/pgis_reg_4060/dumper postgis_reg loadedshp (gdb) run Starting program: C:\msys\1.0\home\mca\postgis\pg82\postgis-1.1.4 \regress/../loader/pgsql2shp.exe -f /tmp/pgis_reg_4060/dumper postgis_reg loadedshp Initializing... Program received signal SIGSEGV, Segmentation fault. 0x63512c1c in ?? () (gdb) bt #0 0x63512c1c in ?? () #1 0x0040c69c in _fu8__PQntuples () at pgsql2shp.c:2502 #2 0x00408481 in main (ARGC=5, ARGV=0x3d2750) at pgsql2shp.c:243 (gdb) I also turned on the logging in the server and get the following in the server log: 2006-10-08 12:01:15 LOG: statement: BEGIN; 2006-10-08 12:01:15 LOG: statement: CREATE TABLE loadedshp (gid serial PRIMARY KEY); 2006-10-08 12:01:15 NOTICE: CREATE TABLE will create implicit sequence loadedshp_gid_seq for serial column loadedshp.gid 2006-10-08 12:01:15 NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index loadedshp_pkey for table loadedshp 2006-10-08 12:01:15 LOG: statement: SELECT AddGeometryColumn('','loadedshp','the_geom','-1','POINT',2); 2006-10-08 12:01:17 LOG: statement: INSERT INTO loadedshp (the_geom) VALUES ('010100F03F'); 2006-10-08 12:01:18 LOG: statement: INSERT INTO loadedshp (the_geom) VALUES ('0101002240F0BF'); 2006-10-08 12:01:18 LOG: statement: INSERT INTO loadedshp (the_geom) VALUES ('0101002240F0BF'); 2006-10-08 12:01:18 LOG: statement: END; 2006-10-08 12:01:21 LOG: statement: select asewkt(the_geom) from loadedshp; 2006-10-08 12:01:36 LOG: statement: DROP table loadedshp 2006-10-08 12:01:39 LOG: statement: BEGIN; 2006-10-08 12:01:39 LOG: statement: CREATE TABLE loadedshp (gid serial PRIMARY KEY); 2006-10-08 12:01:39 NOTICE: CREATE TABLE will create implicit sequence loadedshp_gid_seq for serial column loadedshp.gid 2006-10-08 12:01:39 NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index loadedshp_pkey for table loadedshp 2006-10-08 12:01:39 LOG: statement: SELECT AddGeometryColumn('','loadedshp','the_geom','-1','POINT',2); 2006-10-08 12:01:41 LOG: statement: COPY loadedshp (the_geom) FROM stdin; 2006-10-08 12:01:41 LOG: statement: END; 2006-10-08 12:01:43 LOG: statement: select asewkt(the_geom) from loadedshp; 2006-10-08 12:02:34 LOG: statement: SELECT postgis_version() 2006-10-08 12:02:34 LOG: statement: SELECT a.attname, a.atttypid, a.attlen, a.atttypmod FROM pg_attribute a, pg_class c WHERE a.attrelid = c.oid and a.attnum 0 AND a.atttypid != 0 AND c.relname = 'loadedshp' 2006-10-08 12:02:48 LOG: could not receive data from client: No connection could be made because the target machine actively refused it. 2006-10-08 12:02:48 LOG: unexpected EOF on client connection AFAICT the backtrace and server log is indicating that the crash is happening somewhere in libpq. If someone can help me figure out how to load the libpq symbols into MingW's gdb then I can get a better backtrace if required as I can reproduce this 100% of the time. For reference, the source for the application in question can be found at http://svn.refractions.net/postgis/tags/1.1.4/loader/pgsql2shp.c. Many thanks, Mark. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [PATCHES] WIP: Hierarchical Queries - stage 1
Hi Tom, Thanks for your initial thoughts on this. On Wed, 2006-09-20 at 20:13 -0400, Tom Lane wrote: (cut) You really can't get away with having the identical representation for CTEs and ordinary sub-selects in the range table. For instance, it looks like your patch will think that select ... from (select ...) as x, x, ... is legal when it certainly is not. I think you need either a new RTEKind or an additional flag in the RTE to show that it's a CTE rather than a plain subselect. I'm not entirely sure that you even want the CTEs in the rangetable at all --- that still needs some thought. For semantic reasons, I can see why you are questioning whether or not the CTE should be contained within the rangetable - there is an implicit hint that RTEs reflect entries within the FROM clause, but then I also see that the rewriter adds RTEs when substituting view definitions into queries. The comments in parsenodes.h also suggest that an RTE is a namespace/data source reference for a named entity within the query. The main problem I can see with keeping the CTEs outside the rangetable is that according to the source, jointree nodes must currently have RANGETBLREF nodes as leaf nodes; as I understand it, your suggestion of maintaining the CTEs separately would involve something along the lines of keeping a separate CTETable and creating some form of CTETBLREF node that could be referenced within the jointree. While arguably it may be semantically neater, it appears to involve quite a bit of extra work... could you explain in more detail as to why you feel that CTEs should remain outside the rangetable? This comes back to the question of whether the CTE per se should be an RTE at all. Maybe only the reference to it should be an RTE. The behavior when seeing a plain RangeVar in FROM would be to first search the side list of valid CTEs, and only on failure go looking for a real table. This is effectively what the patch does, albeit not particularly elegantly. I'll spend some time on making those changes a bit more refined. Kind regards, Mark. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?
On Mon, 2006-09-18 at 17:46 +0200, Matteo Beccati wrote: Tom Lane ha scritto: Matteo Beccati [EMAIL PROTECTED] writes: I cannot see anything bad by using something like that: if (histogram is large/representative enough) Well, the question is exactly what is large enough? I feel a bit uncomfortable about applying the idea to a histogram with only 10 entries (especially if we ignore two of 'em). With 100 or more, it sounds all right. What's the breakpoint? Yes, I think 100-200 could be a good breakpoint. I don't actually know what is the current usage of SET STATISTICS, I usually set it to 1000 for columns which need more precise selectivity. The breakpoint could be set even higher (500?) so there is space to increase statistics without enabling the histogram check, but I don't feel very comfortable though suggesting this kind of possibly undocumented side effect... Hi everyone, You may be interested to have a look at the statistics collector for the geometry type within PostGIS. In order to prevent very large or very small geometries from ruining the statistics histogram and generating incorrect query plans, we make the assumption that the column distribution is likely to be close to normal, and then remove any ANALYZE-collected geometries from the set that lie outside +/- 3.25 standard deviations from the mean before creating the final histogram (removes just under 1% of the data from each end of an assumed normal distribution). This works well and AFAIK we've only ever had one reported case of an incorrect query plan being generated using this method. Kind regards, Mark. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Hierarchical Queries--Status
On Sat, 2006-08-26 at 22:46 -0400, Jonah H. Harris wrote: On 8/26/06, Alvaro Herrera [EMAIL PROTECTED] wrote: Actually I was thinking in the design rather than the code ... Doh! We hadn't posted the design just yet. Let me write him and see where he's at and we'll throw something together for the list. [Note to Jonah: I've tried sending a similar version of this email to you a couple of times, but I'm not sure that it's getting through, hence the post to -hackers in the hope you may be able to pick it up there.] Hi everyone, I've had a chance to sit down for a day or so to see how to approach adding hierarchical queries to PostgreSQL, so I thought I'd post my initial thoughts on how to proceed. My aim is to refine the list below based upon feedback from hackers until it gets to the point where I can start work on it. Here's what I've got so far: 1) Add detection of the WITH clause to the parser. 2) Create a new type of RangeTblEntry to represent each common table expression specified within the WITH clause where the subquery field points to the nodes representing the common table expression. 3) Add planner support so that WITH clauses are mapped to a new type of node that utilises two tuplestores - an output tuplestore and a working tuplestore. The output tuple store will in effect be the contents of the table expression while the working tuplestore holds the results of the last iteration if recursive. Also implement some kind of WithState node which keeps track of the recursion state. Having spent some more time today looking at 1) and also at the SQL 2003 spec, it would seem that other databases offer the WITH clause within subqueries and also as part of a view. I'd be interested to hear feedback from other developers as to whether it would be possible to achieve this level of support within PostgreSQL. Many thanks, Mark. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Proposal for debugging of server-side stored procedures
-Original Message- From: Thomas Hallgren [mailto:[EMAIL PROTECTED] Sent: 11 June 2006 10:07 To: Mark Cave-Ayland Cc: pgsql-hackers@postgresql.org Subject: Re: Proposal for debugging of server-side stored procedures Hi Tom, (cut) Obviously I'm not a Perl nor Python hacker. I just find it hard to believe that languages that has matured over a decade doesn't have a remote debug option that can be used. Sockets, shared-memory, pipes, or whatever. You will be able to attach to it somehow given it's been launched with the correct flags. I think you're wrong in thinking that client/server based debugging is in any way unusual. Googling for perl+debug+remote tells me that client/server based Perl debugging is common. There are a number of solutions. The same is true for Python. It's not unusual, it's just that the standard libraries included with Perl/Python contain practically no support for remote debugging - while they both provide interfaces for you to write you own replacement, there doesn't seem to be a standard for remote debugging other than Xdebug. (cut) In my experience you have two use-cases. 1. You debug during development and have either have your own database instance to play around with or a shared sandbox database where the security is very low. 2. You debug a running instance on a live server and the sysadmin is well aware what you're doing. You will be given required privileges as needed. I would argue that the times when security becomes an issue when debugging are extremely rare and not worth spending lots of time and effort on. It is enough to prevent anyone but a super-user (or even a sysadmin) to start a remote debug session. True, although there are times for example in our web applications where we need to debug as a non super-user since the web user accessing the database obviously doesn't have super-user rights. In the case of DBGP this could be a case of allowing only the super-user to set the remote host/port while allowing any user to initiate a debug session. At least in the case the administrator would be aware of the implications of enabling debugging in this way. (cut) A promising option at the moment would be to implement the DBGP protocol for Perl/Python/Tcl as suggested by Lukas, mainly because it appears ActiveState have already written the modules to do this (see http://aspn.activestate.com/ASPN/Downloads/Komodo/RemoteDebugging). There you go! Perl, PHP, Python, and Tcl all taken care of. IDE and all! I've been having a look at the licensing and it appears that the server side libraries are licensed under the Artistic License (http://www.opensource.org/licenses/artistic-license.php). Looking at the wording of the license I can't see that there would be much of a problem including these with PostgreSQL to give out of the box DBGP support, but with the option of allowing the user to override the defaults per language in postgresql.conf - this would allow, say, for Andrew to run ptkdb as an alternative while normal users could just go with the defaults. Can anyone see any problems with this? If not, I'd be happy to approach the ActiveState/Xdebug people to see if this is feasible. The only remaining angle would be to implement the Xdebug protocol for plpgsql in C which would probably be the most time-consuming aspect of the project :) (cut) I'd use the Komodo IDE and implement the ability to start the PL using a GUC setting per my original suggestion (with super-user requirement). Trivial solution, minimum effort, and very useful. KISS principle. It would be great if we could agree on a GUC flag (or flags) that would control debugging for all PL's. At present, all PL implementors must use their own (module specific) flags. I'm not quite sure what you had in mind from your previous email - could you elaborate more on this? Kind regards, Mark. WebBased Ltd 17 Research Way Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 http://www.webbased.co.uk http://www.infomapper.com http://www.swtc.co.uk This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Proposal for debugging of server-side stored procedures
-Original Message- From: Thomas Hallgren [mailto:[EMAIL PROTECTED] Sent: 09 June 2006 16:25 To: Mark Cave-Ayland Cc: pgsql-hackers@postgresql.org Subject: Re: Proposal for debugging of server-side stored procedures Some thoughts from another Tom... Hi Tom, Thanks for the feedback :) All PL's are launched someway or another . It would probably be very simple to add the '-d' flag to the PL/Perl launcher and the '-m' flag to the PL/Python launcher and control it with module specific GUC settings. PL/Java already does this. The SQL to initialize debugging looks like this: SET pljava.vmoptions TO '- agentlib:jdwp=transport=dt_shmem,server=y,suspend=y'; This tells the VM to load in-process debugging libraries and specifies the kind of connection to be made. As soon as the first java function is called, the process is suspended and waits for a client debugger to attach. The amount of work needed to create similar solutions for perl, python, tcl, etc. is probably limited and fairly trivial. I think that Java is quite unusual in that the design of JPDA is inherently client/server based to the point where they have defined the platform to allow you interact with the JVM via a socket. Unfortunately the same can't be said for Perl/Python - as you suggest passing the parameters into the interpreter is a trivial exercise but because the debugging classes in both languages are console interactive, the only thing you could do is redirect the console output to a socket (see http://search.cpan.org/dist/perl/lib/perl5db.pl and the RemotePort option in the case of Perl). What I would like to see is some form of IDE that our developers can use, probably under Windows, that would enable them to step through and debug stored procedures on a remote server on another network. Using simple console redirection would involve parsing text output which strikes as being something that would break easily with new releases. And then of course, there is the problem of security whereby anyone could connect to the socket. For example, imagine a group of developers all debugging different functions simultaneously; if one of them connected to the wrong console socket then it could be confusing as the developer wanders why their code never stops at a breakpoint. And it would force you to write your own proprietary debugger backend for each language. That's a major thing to undertake. Have you considered the maintenance aspect of it? Not something that would make the author of a PL module scream with joy. It might be wise to also consider what debugger preferences a skilled perl/python/whatever language developer might have. A home brewed debugger in the form of a PostgreSQL add-on might not be a natural choice. Well my original thinking was that you could factor most of the code into PostgreSQL itself; the only thing that PL developers would have to is provide an API for things like step, set breakpoint, read variable, and eval. A promising option at the moment would be to implement the DBGP protocol for Perl/Python/Tcl as suggested by Lukas, mainly because it appears ActiveState have already written the modules to do this (see http://aspn.activestate.com/ASPN/Downloads/Komodo/RemoteDebugging). The only issue I can see with this is again related to security in that the debugger would not respect the ACLs within PostgreSQL which could potentially allow a user to break inside a function that wasn't his/her own. Kind regards, Mark. WebBased Ltd 17 Research Way Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 http://www.webbased.co.uk http://www.infomapper.com http://www.swtc.co.uk This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Proposal for debugging of server-side stored procedures
-Original Message- From: Andrew Dunstan [mailto:[EMAIL PROTECTED] Sent: 09 June 2006 17:01 To: Mark Cave-Ayland Cc: 'Tom Lane'; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Proposal for debugging of server-side stored procedures (cut) Debugging embedded perl has some significant challenges. You might find it interesting to see what can be done in the most famous embedded situation: mod_perl. See http://perl.apache.org/docs/1.0/guide/debug.html using ptkdb might be nice cheers andrew Hi Andrew, Thanks for the link. This brings up another issue - what if someone wishes to debug locally using their favourite tool rather than a PostgreSQL tool? Should this be allowed even if it may circumvent PostgreSQL's user permissions on functions within the database? Kind regards, Mark. WebBased Ltd 17 Research Way Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 http://www.webbased.co.uk http://www.infomapper.com http://www.swtc.co.uk This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Proposal for debugging of server-side stored procedures
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: 29 May 2006 18:05 To: Mark Cave-Ayland Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Proposal for debugging of server-side stored procedures (cut) As far as the debug protocol details go, it'd be worth looking at the GDB host protocol to see if it's usable; even if not directly usable (it's probably too low-level) it would give ideas about the functions you need to provide. It'd also be smart to understand how debugging is commonly done in Perl, Python, etc, with an eye to being able to piggyback on existing tools instead of inventing our own. regards, tom lane Hi Tom, Thanks for your input on this. I've been spending some time researching debuggers over the past week and AFAICT we'd be struggling to use the native debuggers that come with Perl/Python et al. The main reason for this is that the debuggers with Perl/Python are actually interactive environments, for example debugging in Perl is initiated with perl -d somefile.pl which then becomes a special interactive interpreter session. The same is also true with Python which is launched in a similar way, python -m pdb somefile.py. However, both Perl and Python offer the ability to hook into the interpreter to detect events. For example, Python offers a C API here [1] that allows you to supply a callback when particular events occur; this would allow us to execute a callback after the interpreter executes each line. Perl seems a little more messy in that I can't find a documented C API to hook into the interpreter, but it looks as if it may be possible to cook something up with writing a new DB package [2] which uses XS call a C callback. The other issue is that unlike Python, the debug capability must be specified when creating the instance of interpreter rather than being able to enable/disable debugging on the fly so it may mean that two instances of the perl interpreter need to be maintained in memory - a standard instance and a debugging instance. Plpgsql isn't really a concern as we can simply code up whichever model we eventually decide to use. The only bad news I can see is that it appears Tcl may need to have the source patched in order to add debug hooks into the interpreter [3]. Also, I'm wondering how to design the mechanism to be extendable in the future beyond debugging server-side functions, such as hooking into more of the executor mechanisms. For example, I could see a use case for allowing a profiler to use the debug API to attach to the postmaster to receive notifications whenever a new SQL query starts and ends; this would allow someone to write a rather neat app that could rank the most popular queries in terms of processing time really easily, but then maybe this could considered treading on the toes of the existing stats process... Thoughts? Mark. [1] http://docs.python.org/api/profiling.html [2] http://search.cpan.org/~nwclark/perl-5.8.8/pod/perldebguts.pod [3] http://www.xdobry.de/atkdebugger/index.html WebBased Ltd 17 Research Way Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 http://www.webbased.co.uk http://www.infomapper.com http://www.swtc.co.uk This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Proposal for debugging of server-side stored procedures
Hi everyone, Having browsed the TODO list, one of the items that I would be interested on working on is a debugger for stored procedures. Having searched on this topic in the archives, I'm still short of some answers that would allow me to come up with a complete proposal that I can use to start coding. The most important question to answer in my mind is how should the debugger communicate with the server? I can see 3 ways in which this could happen: 1) Use the existing FE/BE protocol to allow the user to control the debugging session using stored procedures/pseudo-tables, e.g. SELECT pg_debug_enable('myfunction'); INSERT INTO pg_debug_breakpoints (function, line) VALUES ('myfunction', 2); SELECT pg_debug_step('myfunction'); 2) Spawn a separate debugger process listening on another port to allow OOB communication with the server. This would involve the design and implementation of a debug FE/BE protocol, unless there is a standard that already exists. 3) Extend the existing FE/BE protocol to allow transmission of an OOB debug channel. My current thoughts are leaning towards 2 or 3, with the advantage of approach 3 being that once you can connect to the database using a client, the debugging functionality automatically becomes available without having to worry about additional security checks on the debugger port (like another version of pg_hba.conf just for the debugger) and firewalls blocking debugger connections. Thoughts? Mark. WebBased Ltd 17 Research Way Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 http://www.webbased.co.uk http://www.infomapper.com http://www.swtc.co.uk This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] WITH/WITH RECURSIVE implementation discussion
-Original Message- From: Jonah H. Harris [mailto:[EMAIL PROTECTED] Sent: 01 May 2006 14:56 To: Mark Cave-Ayland Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] WITH/WITH RECURSIVE implementation discussion On 5/1/06, Mark Cave-Ayland [EMAIL PROTECTED] wrote: The latest discussion I found related to this appears to be here: http://archives.postgresql.org/pgsql-hackers/2005-11/msg00564.php which indicates that Jonah is hoping to work on this for 8.2, but I don't see this item as being worked on in the TODO list - does that mean that development on this has halted? No, in between all my other work here at EnterpriseDB, I'm still working on it. I hope to get it submitted to -patches for review in the next two weeks. Hi Jonah, That's great news. While I wouldn't say that I have intimate knowledge of the PostgreSQL internals, I'd be happy to try and help you with this if time is proving to be a problem. Kind regards, Mark. WebBased Ltd 17 Research Way Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 http://www.webbased.co.uk http://www.infomapper.com http://www.swtc.co.uk This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] WITH/WITH RECURSIVE implementation discussion
Hi folks, Looking ahead at some of the projects I'll be working on in the future, I'm seeing that having an implementation of WITH/WITH RECURSIVE for working with tree structures is going to be a very useful feature and so would like to re-start the discussions on this to see whether this could be achieved for PG 8.2. The latest discussion I found related to this appears to be here: http://archives.postgresql.org/pgsql-hackers/2005-11/msg00564.php which indicates that Jonah is hoping to work on this for 8.2, but I don't see this item as being worked on in the TODO list - does that mean that development on this has halted? Kind regards, Mark. WebBased Ltd 17 Research Way Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 http://www.webbased.co.uk http://www.infomapper.com http://www.swtc.co.uk This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Any advice about function caching?
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: 07 November 2005 23:06 To: Mark Cave-Ayland (External) Cc: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Any advice about function caching? (cut) If you want per-query state, keep it in a data structure linked from the fcinfo-flinfo-fn_extra field (physically, store it in fn_mcxt, or fcinfo-flinfo-create a subcontext of that if you wish). If you need to get control at query shutdown to free non-palloc'd resources, RegisterExprContextCallback may help. (I think such callbacks are only called during *successful* query shutdown, though, so if you have external library state you need to clean up anyway, you'll need some other approach to keeping track of it ... maybe a permanent data structure instead of a per-query one.) src/backend/utils/fmgr/funcapi.c and src/backend/executor/functions.c might be useful examples. Hi Tom, Thanks for the advice about state - this is definitely pointing me towards looking at the existing code for aggregates and SRFs. Incidentally I've found that attaching my cleanup memory context to PortalContext with some elogs() shows that it appears to be called correctly just before the portal is destroyed - so whatever I finally come up with is likely to be a combination of the two methods. I will dig further into the function code and see how I manage. Many thanks, Mark. WebBased Ltd 17 Research Way Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 http://www.webbased.co.uk http://www.infomapper.com http://www.swtc.co.uk This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Any advice about function caching?
Hi everyone, I'm currently in the middle of trying to implement a cache for a PostGIS server-side function that uses an external library with an expensive startup cost, and was hoping to find some advice on the best way to implement it. The function currently takes 2 parameters - a customised C geometry datatype and a destination SRID which is the primary key to the library parameters in the spatial_ref_sys table. Currently for each invocation of the function it is necessary to lookup the parameters for both the source and destinations SRIDs in spatial_ref_sys, create a handle for each SRID in the external library, perform the calculation, and then return the result. Obviously this process is quite expensive when calling the function on large tables of data. I've currently implemented a basic cache for both the spatial_ref_sys lookups and the external library functions which gives a dramatic performance improvement, dropping my test query from over 2 minutes to 6s(!), but the part I'm experiencing difficulty with is working out when the start and end of a query occurs. My current code works by using MemoryContextCreate() to create a child context to MessageContext and using the Init()/Delete() functions to initialise and destroy a cache in the local backend. However, this doesn't really work particularly well when using cursors and prepared queries since it appears what I should be doing is using a cache per portal rather than a cache per backend. The other complication is that the external library handles cannot be free()d directly but instead a customised free function must be called. So my questions are: 1) What is the best way of keeping track of whether the cache has been initalised for a given portal? Looking in contrib/dblink.c it looks as if the best way to go would be to create a hash table per backend based upon the portal name in ActivePortal, attach the child context to PortalContext, and use the Delete() function to call the customised free function and remove the hash table entry. However, which memory context would be the best one to use in order to store the hash table entries? 2) Is there a better way of doing this? 3) Would anyone consider a patch to the source tree to make implementing something like this easier in future? Many thanks, Mark. WebBased Ltd 17 Research Way Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 http://www.webbased.co.uk http://www.infomapper.com http://www.swtc.co.uk This email and any attachments are confidential to the intended recipient and may also be privileged. If you are not the intended recipient please delete it from your system and notify the sender. You should not copy it or use it for any purpose nor disclose or distribute its contents to any other person. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS]
Date: Thu, 04 Aug 2005 11:12:58 -0400 From: Tom Lane [EMAIL PROTECTED] To: Matteo Beccati [EMAIL PROTECTED] Cc: pgsql-hackers@postgresql.org Subject: Re: Enhanced containment selectivity function Message-ID: [EMAIL PROTECTED] Matteo Beccati [EMAIL PROTECTED] writes: This also made me think: is there a reason why geometric selectivity functions return constant values rather than checking statistics for a better result? Because no one's ever bothered to work on them. You should talk to the PostGIS guys, however, because they *have* been working on real statistics and real estimation functions for geometric types. It'd be nice to get some of that work back-ported into the core database. http://www.postgis.org/ regards, tom lane Hi Tom/Matteo, As a PostGIS guy I can say that the latest geometric estimation functions for the PostGIS 1.0 series are very good indeed - Sandro has done an excellent job in working on a 2D histogram sample and it is surprising how accurate the estimates are even with very low statistics targets. Also, since Tom applied the ANALYZE patch for the 8.0 series, I've been working on a new page for the PostgreSQL documentation explaining how the ANALYZE process works, and gives an example using the complex type as to how you can collect your own statistics during ANALYZE and make use of them from your own selectivity functions. It's about three quarters complete, and still in plain text format, but I hope to submit it as a documentation patch for 8.1. My hope is that this will then encourage someone to add selectivity functions to ltree and tsearch2 (hint hint) ;) Kind regards, Mark. WebBased Ltd 17 Research Way Tamar Science Park Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 W: http://www.webbased.co.uk ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Implementing SQL/PSM for PG 8.2 - debugger
Hi guys, I lean with you and Tom. While running it over the same libpq protocol would be helpful in some ways, it would have a lot of drawbacks and would really change the function of libpq. I think a separate debugging protocol is in order. Just putting on my network hat for a moment... Would an approach be to come up with a generic encapsulation protocol, similar in principle to PPP, in which we could run any protocols we wanted? If we had something like a PGSQL Encapsulation Protocol (PGEP) used to transfer all data between a PostgreSQL client/server, then we can use this to tunnel libpq requests as they are at the moment through to the other side. However, it would also mean that people could add any other protocols as they see fit for debugging, statistics and all sorts of things that people have yet to think of. Obviously this would require a client/server interface change so it's not to be taken lightly, but I thought I'd mention it since people have mentioned the possibility of changes to the FE/BE protocol. Kind regards, Mark. WebBased Ltd 17 Research Way Tamar Science Park Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 W: http://www.webbased.co.uk ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Fixing r-tree semantics
Hi Tom, What needs more discussion is that it seems to me to make sense to extend the standard opclasses to handle the four Y-direction operators comparable to the X-direction operators that are already there, that is above, below, overabove, and overbelow. As part of PostGIS (http://postgis.refractions.net), I submitted a patch a while back to add additional Y-direction operators to the code which is a slightly modified version of rtree_gist (and yes, the PostGIS implementation will suffer from the same issues you've found with the existing R-tree implementations). The operators I went for were as follows: A | B - true if A's bounding box overlaps or is below B's bounding box A | B - true if B's bounding box overlaps or is above B's bounding box A | B - true if A's bounding box is strictly below B's bounding box A | B - true if A's bounding box is strictly above B's bounding box Since the rtree_gist implementation and operators were 2D, my thoughts were to use another op-class only if the indexing were upgraded to 3D. So with this in mind, I created the following new GiST strategies: #define RTOverBelowStrategyNumber 9 #define RTBelowStrategyNumber 10 #define RTAboveStrategyNumber 11 #define RTOverAboveStrategyNumber 12 This is basically what you are suggesting but with a | instead of a ^ in the operator name (my original choice was to try and use } to indicate the positive sense of the Y-axis but this was not accepted as a valid operator character which is why I changed to |). It would be harder for us to change these operators since they already exist, but then again it would be useful from a maintenance point of view to keep the strategy numbers and operators the same across both implementations. Of course strategy numbers are just used internally so these aren't such a big issue - it's more the choice of operators that would be useful to agree on. Kind regards, Mark. WebBased Ltd 17 Research Way Tamar Science Park Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 W: http://www.webbased.co.uk ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Fixing r-tree semantics
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: 24 June 2005 14:27 To: Mark Cave-Ayland (External) Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; oleg@sai.msu.su; pgsql-hackers@postgresql.org; 'PostGIS Development Discussion' Subject: Re: Fixing r-tree semantics (cut) Well, I was proposing more or less that but with ^ because of the precedent of the two existing box_above/box_below operator names. However, I'm quite happy to adopt your names, since that's probably a more widely used precedent. Sold, unless there are objections. (BTW, it does look a bit odd that the | moves around in your names. But I don't dislike it enough to not follow the precedent.) The thinking behind it was that the | represents the x-axis if you tilt you head right 90 degrees. In effect, it would allow you to 'read' the operator without having to go and look up what it does. It would be harder for us to change these operators since they already exist, but then again it would be useful from a maintenance point of view to keep the strategy numbers and operators the same across both implementations. Agreed, I'll use your strategy number assignments too. Alright no problems. Many thanks, Mark. WebBased Ltd 17 Research Way Tamar Science Park Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 W: http://www.webbased.co.uk ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: 04 June 2005 16:46 To: Mark Cave-Ayland (External) Cc: pgsql-hackers@postgresql.org Subject: Re: Quick-and-dirty compression for WAL backup blocks (cut) I've completed a test run for this (it's essentially MySQL's sql-bench done immediately after initdb). What I get is: CVS tip of 6/1: ending WAL offset = 0/A364A780 = 2741282688 bytes written CVS tip of 6/2: ending WAL offset = 0/8BB091DC = 2343604700 bytes written or about a 15% savings. This is with a checkpoint_segments setting of 30. One can presume that the savings would be larger at smaller checkpoint intervals and smaller at larger intervals, but I didn't try more than one set of test conditions. I'd say that's an improvement worth having, especially considering that it requires no net expenditure of CPU time. But the table is certainly still open to discuss more complicated approaches. Hi Tom, Thanks for the numbers. I've just been across to the OSDL STP website and it appears that the automatic nightly PostgreSQL CVS builds set up by Mark have been broken since the middle of April :( A brief look shows that compilation of the PostgreSQL sources is failing with lots of errors and warnings. I don't think I can justify the time at the moment to go and look at this myself, however I thought I'd post to -hackers as a heads up in case anyone else could pick this up. Kind regards, Mark. WebBased Ltd South West Technology Centre Tamar Science Park Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 W: http://www.webbased.co.uk ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Cost of XLogInsert CRC calculations
-Original Message- From: Tom Lane [mailto:[EMAIL PROTECTED] Sent: 31 May 2005 17:27 To: Greg Stark Cc: Mark Cave-Ayland (External); 'Manfred Koizar'; 'Bruce Momjian'; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Cost of XLogInsert CRC calculations (cut) The odds of an actual failure from case #2 are fortunately not high, since a backup block will necessarily span across at least one WAL page boundary and so we should be able to detect stale data by noting that the next page's header address is wrong. (If it's not wrong, then at least the first sector of the next page is up-to-date, so if there is any tearing the CRC should tell us.) Therefore I don't feel any need to try to work out a back-patchable solution for #2. But I do think we ought to change the WAL format going forward to compute just one CRC across a WAL record and all attached backup blocks. There was talk of allowing compression of backup blocks, and if we do that we could no longer feel any certainty that a page crossing would occur. I must admit I didn't realise that an XLog record consisted of a number of backup blocks which were also separately CRCd. I've been through the source, and while the XLog code is reasonably well commented, I couldn't find a README in the transam/ directory that explained the thinking behind the current implementation - is this something that was discussed on the mailing lists way back in the mists of time? I'm still a little nervous about dropping down to CRC32 from CRC64 and so was just wondering what the net saving would be using one CRC64 across the whole WAL record? For example, if an insert or an update uses 3 backup blocks then this one change alone would immediately reduce the CPU usage to one third of its original value? (something tells me that this is probably not the case as I imagine you would have picked this up a while back). In my view, having a longer CRC is like buying a holiday with insurance - you pay the extra cost knowing that should anything happen then you have something to fall back on. However, the hard part here is determining a reasonable balance betweem the cost and the risk. Kind regards, Mark. WebBased Ltd South West Technology Centre Tamar Science Park Plymouth PL6 8BT T: +44 (0)1752 797131 F: +44 (0)1752 791023 W: http://www.webbased.co.uk ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq