Re: [HACKERS] More flexible LDAP auth search filters?

2017-09-08 Thread Mark Cave-Ayland
On 08/09/17 16:33, Peter Eisentraut wrote:

> A couple of comments on this patch.  I have attached a "fixup" patch on
> top of your v4 that should address them.
> 
> - I think the bracketing of the LDAP URL synopsis is wrong.
> 
> - I have dropped the sentence that LDAP URL extensions are not
> supported.  That sentence was written mainly to point out that filters
> are not supported, which they are now.  There is nothing beyond filters
> and extensions left in LDAP URLs, AFAIK.
> 
> - We have previously used the ldapsearchattribute a bit strangely.  We
> use it as both the search filter and as the attribute to return from the
> search, but we don't actually care about the returned attribute (we only
> use the DN (which is not a real attribute)).  That hasn't been a real
> problem, but now if you specify a filter, as the code stands it will
> always request the "uid" attribute, which won't work if there is no such
> attribute.  I have found that there is an official way to request "no
> attribute", so I have fixed the code to do that.
> 
> - I was wondering whether we need a way to escape the % (%%?) but it
> doesn't seem worth bothering.
> 
> I have been looking around the Internet how this functionality compares
> to other LDAP authentication systems.
> 
> I didn't see the origin of the %u idea in this thread, but perhaps it
> came from Dovecot.  But there it's a general configuration file syntax,
> nothing specific to LDAP.  In nginx you use %(username), which again is
> general configuration file syntax.  I'm OK with using %u.
> 
> The original LDAP URL design was adapted from Apache
> (https://httpd.apache.org/docs/2.4/mod/mod_authnz_ldap.html#authldapurl).
>  They combine the attribute filter and the general filter if both are
> specified, but I think they got that a bit wrong.  The attribute field
> in the URL is actually not supposed to be a filter but a return field,
> which is also the confusion mentioned above.
> 
> The PAM module (https://linux.die.net/man/5/pam_ldap) also has a way to
> specify a search attribute and a general filter and combines them.
> 
> Neither of these allows substituting the user name into the search filter.

Great work! Having installed quite a few LDAP-based authentication
setups in the past, I can promise you that pam_ldap is the last tool I
would consider for the job, so please don't hold this up as the gold
standard(!).

My weapon of choice for LDAP deployments on POSIX-based systems is
Arthur De Jong's nss-pam-ldapd (https://arthurdejong.org/nss-pam-ldapd)
which is far more flexible than pam_ldap and fixes a large number of
bugs, including the tendency for pam_ldap to hang indefinitely if it can't
contact its LDAP server.

Take a look at nss-pam-ldapd's man page for nslcd.conf and in particular
pam_authz_search - this is exactly the type of filter I would end up
deploying onto servers. This happens a lot in large organisations,
where getting group memberships updated in the main directory can take
days/weeks whereas someone with root access to the server itself can
hard-code an authentication list of users and/or groups in an LDAP
filter in just a few minutes.
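
For illustration, the sort of nslcd.conf entry I mean looks roughly
like this (group and user names invented):

  pam_authz_search (&(uid=$username)(|(memberOf=cn=dbadmins,ou=groups,dc=example,dc=com)(uid=alice)(uid=bob)))

where $username is nslcd's substitution marker, so only members of the
hard-coded group or the two named users are authorised to log in on
that host, regardless of what the central directory says about group
membership.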


ATB,

Mark.




Re: [HACKERS] More flexible LDAP auth search filters?

2017-08-04 Thread Mark Cave-Ayland
On 01/08/17 23:17, Thomas Munro wrote:

> On Wed, Aug 2, 2017 at 5:36 AM, Peter Eisentraut
>  wrote:
>> On 7/16/17 19:09, Thomas Munro wrote:
>>> On Mon, Jul 17, 2017 at 10:26 AM, Thomas Munro
>>>  wrote:
 ldap-search-filters-v2.patch
>>>
>>> Gah, it would help if I could spell "occurrences" correctly.  Fixed in
>>> the attached.
>>
>> Please also add the corresponding support for specifying search filters
>> in LDAP URLs.  See RFC 4516 for the format and
>> https://linux.die.net/man/3/ldap_url_parse for the API.  You might just
>> need to grab LDAPURLDesc.lud_filter and use it.
> 
> Good idea.  Yes, it seems to be that simple.  Here's a version like
> that.  Here's an example of how it looks in pg_hba.conf:
> 
> host   all all  127.0.0.1/32ldap
> ldapurl="ldap://localhost/ou=people1,dc=my-domain,dc=com??sub?(cn=%25u)"
> 
> Maybe we could choose a better token than %u for user name, since it
> has to be escaped when included in a URL like that, but on the other
> hand there seems to be wide precedent for %u in other software.

Yeah, mostly I only ever see ldapurls used programmatically, i.e. the
configuration allows you to set the various fields separately and then
the software generates the URL with the correct encoding itself. But as
long as it's documented, that's not a reason to reject the patch; I
definitely see it as an improvement.

As I mentioned previously in the thread, the main barrier preventing
people from using LDAP is that the role cannot be generated from other
attributes in the directory. In a lot of real-life cases I see, that
would be enough to discount PostgreSQL's LDAP authentication completely.


ATB,

Mark.




Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-18 Thread Mark Cave-Ayland
On 17/07/17 18:08, Magnus Hagander wrote:

> On Mon, Jul 17, 2017 at 6:47 PM, Mark Cave-Ayland
> <mark.cave-ayl...@ilande.co.uk>
> wrote: 
> Great to hear from you! It has definitely been a while...
> 
> Indeed. You should spend more time on these lists :P

Well, I do get the emails; unfortunately it's finding the time to read
them all ;)

> > How would that actually work, though? Given the same user in ldap could
> > now potentially match multiple different users in PostgreSQL. Would you
> > then create two accounts for the user, one with the uid as name and one
> > with email as name? Wouldn't that actually cause more issues than it 
> solves?
> 
> Normally what happens for search+bind is that you execute the custom
> filter with substitutions in order to find the entry for your bind DN,
> but you also request the value of ldapsearchattribute (or equivalent) at
> the same time. Say for example you had an entry like this:
> 
> dn: uid=mha,dc=users,dc=hagander,dc=net
> uid: mha
> mail: mag...@hagander.net
> 
> Using the filter "(|(mail=%u)(uid=%u))" would locate the same bind DN
> "uid=mha,dc=users,dc=hagander,dc=net" with either one of your uid or
> email address.
> 
> If the bind is successful then the current user identity should be set
> to the value of the ldapsearchattribute retrieved from the bind DN
> entry, which with the default of "uid" would be "mha". Hence you end up
> with the same user role "mha" regardless of whether a uid or email
> address was entered.
> 
> 
> Right. This is the part that doesn't work for PostgreSQL. Because we
> have already specified the username -- it goes in the startup packet in
> order to pick the correct row from pg_hba.conf.

I don't think that's necessarily going to be an issue here because if
you're specifying a custom LDAP filter then there's a very good chance
that you're delegating access control to information held in the
directory anyway, e.g.

  (&(memberOf=cn=pgusers,dc=groups,dc=hagander,dc=net)(uid=%u))

  (&(uid=%u)(|(uid=mha)(uid=mark)(uid=thomas)))

In the mail example above, where more than one attribute is in use, I
think it's fair enough to simply say in the documentation that you must
set user to "all" in pg_hba.conf, since it is impossible to know which
attribute is being used to identify the user role until after
authentication.
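
To make this concrete, a hypothetical pg_hba.conf line using the filter
option and %u placeholder discussed in this thread (server, base DN and
network are invented) might be:

  host all all 10.0.0.0/8 ldap ldapserver=ldap.example.com ldapbasedn="dc=example,dc=com" ldapsearchfilter="(|(mail=%u)(uid=%u))"

with the role simply being whatever the user supplied at connection time.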

I should add that personally I don't recommend such setups where the
user can log in using more than one identifier, but there are clients
out there who absolutely will insist on it (think internal vs. external
users) so if the LDAP support is being updated then it's worth exploring
to see if these cases can be supported.

> I guess in theory we could treat it like Kerberos or another one of the
> systems where we get the username from an external entity. But then
> you'd still have to specify the mapping again in pg_ident.conf, and it
> would be a pretty strong break of backwards compatibility.

(goes and glances at the code)

Is there any reason why you couldn't just overwrite port->user_name based
upon ldapsearchattribute at the end of CheckLDAPAuth?

And if this were only enabled when a custom filter was specified, then
it shouldn't cause any backwards-compatibility issues, since it's a new
feature?

> In terms of matching multiple users, all LDAP authentication routines
> I've seen will fail if more than one DN matches the search filter, so
> this nicely handles the cases where either the custom filter is
> incorrect or more than one entry is accidentally matched in the
> directory.
> 
> So do we, in the current implementation. But it's a lot less likely to
> happen in the current implementation, since we do a single equals lookup.

Great, that's absolutely fine :)


ATB,

Mark.




Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-17 Thread Mark Cave-Ayland
On 17/07/17 13:09, Magnus Hagander wrote:

Hi Magnus,

Great to hear from you! It has definitely been a while...

> Generally you find that you will be given the option to set the
> attribute for the default search filter of the form
> "(attribute=username)" which defaults to uid for UNIX-type systems and
> sAMAccountName for AD. However there is always the ability to specify a
> custom filter where the user is substituted via e.g. %u to cover all the
> other use-cases.
> 
> Right, but that's something we do already today. It just defaults to
> uid, but it's easy to change. 

Yes, I think that bit is fine as long as the default can be overridden.
There's always a choice as to whether the defaults favour a POSIX-based
LDAP or AD environment but that happens with all installations so it's
not a big deal.

> As an example, I don't know if anyone would actually do this with
> PostgreSQL but I've been asked on multiple occasions to configure
> software so that users should be allowed to log in with either their
> email address or username which is easily done with a custom LDAP filter
> like "(|(mail=%u)(uid=%u))".
> 
> 
> How would that actually work, though? Given the same user in ldap could
> now potentially match multiple different users in PostgreSQL. Would you
> then create two accounts for the user, one with the uid as name and one
> with email as name? Wouldn't that actually cause more issues than it solves?

Normally what happens for search+bind is that you execute the custom
filter with substitutions in order to find the entry for your bind DN,
but you also request the value of ldapsearchattribute (or equivalent) at
the same time. Say for example you had an entry like this:

dn: uid=mha,dc=users,dc=hagander,dc=net
uid: mha
mail: mag...@hagander.net

Using the filter "(|(mail=%u)(uid=%u))" would locate the same bind DN
"uid=mha,dc=users,dc=hagander,dc=net" with either one of your uid or
email address.

If the bind is successful then the current user identity should be set
to the value of the ldapsearchattribute retrieved from the bind DN
entry, which with the default of "uid" would be "mha". Hence you end up
with the same user role "mha" regardless of whether a uid or email
address was entered.
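
In OpenLDAP terms the lookup performed by the server is roughly
equivalent to this (assuming the user typed "mha" at the prompt):

  ldapsearch -x -b "dc=users,dc=hagander,dc=net" "(|(mail=mha)(uid=mha))" uid

i.e. find the matching entry, note its DN and uid value, then attempt
the bind as that DN with the supplied password.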

In terms of matching multiple users, all LDAP authentication routines
I've seen will fail if more than one DN matches the search filter, so
this nicely handles the cases where either the custom filter is
incorrect or more than one entry is accidentally matched in the directory.


ATB,

Mark.




Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-16 Thread Mark Cave-Ayland
On 17/07/17 00:14, Stephen Frost wrote:

>> If it helps, we normally recommend that clients use ldaps for both AD
>> and UNIX environments, although this can be trickier from an
>> administrative perspective in AD environments because it can require
>> changes to the Windows firewall and certificate installation.
> 
> LDAPS is better than straight LDAP, of course, but it still doesn't
> address the issue that the password is sent to the server, which both
> SCRAM and Kerberos do and is why AD environments use Kerberos for
> authentication, and why everything in an AD environment also should use
> Kerberos.
> 
> Using Kerberos should also avoid the need to hack the Windows firewall
> or deal with certificate installation.  In an AD environment, it's
> actually pretty straight-forward to add a PG server too.  Further, in my
> experience at least, there's been other changes recommended by Microsoft
> that prevent using LDAP for auth because it's insecure.

Oh sure - I'm not questioning that Kerberos is a far better choice in
pure AD environments, it's just that I spend the majority of my time in
mixed-mode environments where Windows is very much in the minority.

In my experience LDAP is often implemented badly; for example, the
majority of software still uses simple binds (i.e. plain logins) rather
than SASL binds, which support a whole range of better authentication
methods (e.g. GSSAPI; even DIGEST-MD5 has been mandatory since LDAPv3
and is implemented by AD).
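
For comparison, with the OpenLDAP command-line tools a SASL (GSSAPI)
bind versus the simple bind most applications still perform looks
something like this (server and user names invented):

  ldapsearch -Y GSSAPI -H ldaps://ldap.example.com -b "dc=example,dc=com" "(uid=someuser)"
  ldapsearch -x -W -D "uid=someuser,ou=people,dc=example,dc=com" -H ldap://ldap.example.com -b "dc=example,dc=com" "(uid=someuser)"

The first never hands a reusable password to the directory; the second
does.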

And yes, while better authentication mechanisms do exist, I find that
all too often most software packages claim LDAP support rather than
Kerberos, and even then it is often limited to LDAP simple binds without
ldaps support.


ATB,

Mark.




Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-16 Thread Mark Cave-Ayland
On 16/07/17 23:26, Thomas Munro wrote:

> Thank you very much for this feedback and example, which I used in the
> documentation in the patch.  I see similar examples in the
> documentation for other things on the web.
> 
> I'll leave it up to Magnus and Stephen to duke it out over whether we
> want to encourage LDAP usage, extend documentation to warn about
> cleartext passwords with certain LDAP implementations or
> configurations, etc etc.  I'll add this patch to the commitfest and
> get some popcorn.

If it helps, we normally recommend that clients use ldaps for both AD
and UNIX environments, although this can be trickier from an
administrative perspective in AD environments because it can require
changes to the Windows firewall and certificate installation.

Whilst OpenLDAP will support ldap+starttls, you can end up with some
clients that have starttls disabled or misconfigured sending plaintext
passwords over the wire regardless, so it's generally easiest to
firewall ldap port 389 at the edge of the trusted VLAN so that only
ldaps port 636 connections make it out onto the untrusted network
hosting the local AD/OpenLDAP server.
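
As a sketch, the rules on the VLAN edge end up looking something like
this (iptables syntax assumed, interface name invented):

  iptables -A FORWARD -o eth1 -p tcp --dport 636 -j ACCEPT   # ldaps may leave the VLAN
  iptables -A FORWARD -o eth1 -p tcp --dport 389 -j REJECT   # plain ldap may not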


ATB,

Mark.




Re: [HACKERS] More flexible LDAP auth search filters?

2017-07-16 Thread Mark Cave-Ayland
On 16/07/17 00:08, Thomas Munro wrote:

> On Fri, Jul 14, 2017 at 11:04 PM, Magnus Hagander  wrote:
>> On Thu, Jul 13, 2017 at 9:31 AM, Thomas Munro
>>  wrote:
>>> A post on planet.postgresql.org today reminded me that a colleague had
>>> asked me to post this POC patch here for discussion.  It allows custom
>>> filters with ldapsearchprefix and ldapsearchsuffix.  Another approach
>>> might be to take a filter pattern with "%USERNAME%" or whatever in it.
>>> There's an existing precedent for the prefix and suffix approach, but
>>> on the other hand a pattern approach would allow filters where the
>>> username is inserted more than once.
>>
>>
>> Do we even need prefix/suffix? If we just make it "ldapsearchpattern", then
>> you could have something like:
>>
>> ldapsearchattribute="uid"
>> ldapsearchfilter="|(memberof=cn=Paris DBA Team)(memberof=cn=Tokyo DBA Team)"
>>
>> We could then always do substitution of the kind:
>> (&(attr=<user>)(<filter>))
>>
>> which would in this case give:
>> (&(uid=mha)(|(memberof=cn=Paris DBA Team)(memberof=cn=Tokyo DBA Team)))
>>
>>
>> Basically we'd always AND together the username lookup with the additional
>> filter.
> 
> Ok, so we have 3 ideas put forward:
> 
> 1.  Wrap username with ldapsearchprefix ldapsearchsuffix to build
> filter (as implemented by POC patch)
> 2.  Optionally AND ldapsearchfilter with the existing
> ldapsearchattribute-based filter (Magnus's proposal)
> 3.  Pattern-based ldapsearchfilter so that %USER% is replaced with
> username (my other suggestion)
> 
> The main argument for approach 1 is that it follows the style of the
> bind-only mode.
> 
> With idea 2, I wonder if there are some more general kinds of things
> that people might want to do that that wouldn't be possible because it
> has to include (attribute=user)... perhaps something involving a
> substring or other transformation functions (but I'm no LDAP expert,
> that may not make sense).
> 
> With idea 3 it would allow "(|(foo=%USER%)(bar=%USER%))", though I
> don't know if any such multiple-mention filters would ever make sense
> in a sane LDAP configuration.
> 
> Any other views from LDAP-users?

I've spent quite a bit of time integrating various bits of
non-PostgreSQL software to LDAP and in my experience option 3 tends to
be the standard.

Generally you find that you will be given the option to set the
attribute for the default search filter of the form
"(attribute=username)" which defaults to uid for UNIX-type systems and
sAMAccountName for AD. However there is always the ability to specify a
custom filter where the user is substituted via e.g. %u to cover all the
other use-cases.
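
Concretely, the two common defaults end up being filters like these
(using the %u placeholder notation from this thread):

  (uid=%u)              typical default for POSIX-style directories
  (sAMAccountName=%u)   typical default for Active Directory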

As an example, I don't know if anyone would actually do this with
PostgreSQL but I've been asked on multiple occasions to configure
software so that users should be allowed to log in with either their
email address or username which is easily done with a custom LDAP filter
like "(|(mail=%u)(uid=%u))".


ATB,

Mark.




Re: [HACKERS] Commitfest problems

2014-12-16 Thread Mark Cave-Ayland
On 15/12/14 19:08, Robert Haas wrote:

 On Sun, Dec 14, 2014 at 2:24 PM, Mark Cave-Ayland
 mark.cave-ayl...@ilande.co.uk wrote:
 However if it were posted as part of patchset with a subject of [PATCH]
 gram.y: add EXCLUDED pseudo-alias to bison grammar then this may pique
 my interest enough to review the changes to the grammar rules, with the
 barrier for entry reduced to understanding just the PostgreSQL-specific
 parts.
 
 Meh.  Such a patch couldn't possibly compile or work without modifying
 other parts of the tree.  And I'm emphatically opposed to splitting a
 commit that would have taken the tree from one working state to
 another working state into a series of patches that would have taken
 it from a working state through various non-working states and
 eventually another working state.  Furthermore, if you really want to
 review that specific part of the patch, just look for what changed in
 gram.y, perhaps by applying the patch and doing git diff
 src/backend/parser/gram.y.  This is really not hard.

Okay I agree that the suggested subject above was a little misleading,
so let me try and clarify this further.

If I were aiming to deliver this as an individual patch as part of a
patchset, my target for this particular patch would be to include both
the bison grammar change *and* the minimal underlying code/structure
changes needed so that it compiles but adds no new features.

So why is this useful? Because the parser in PostgreSQL is important and
people have sweated many hours to ensure its performance is the best it
can be. By having a checkpoint with just the basic parser changes,
it becomes really easy to bisect performance issues down to just this
one particular area, rather than the feature itself.

And as per my original message it is a much lower barrier of entry for a
potential reviewer who has previous bison experience (and is curious
about PostgreSQL) to review the basic rule changes than a complete new
feature.

 I certainly agree that there are cases where patch authors could and
 should put more work into decomposing large patches into bite-sized
 chunks.  But I don't think that's always possible, and I don't think
 that, for example, applying BRIN in N separate pieces would have been
 a step forward.  It's one feature, not many.

I completely agree with you here. Maybe this isn't exactly the same for
PostgreSQL but in general for a new QEMU feature I could expect a
patchset of around 12 patches to be posted. Of those 12 patches,
probably patches 1-9 are internal API changes, code refactoring and
preparation work, patch 10 is a larger patch containing the feature, and
patches 11-12 are for tidy-up and removing unused code sections.

Even with the best patch review process in the world, there will still
be patches that introduce bugs, and the bugs are pretty much guaranteed
to be caused by patches 1-9.

Imagine a subtle bug in an internal API change which exhibits itself not
in the feature added by the patchset itself but in another mostly
unrelated part of the system; maybe this could be caused by a
pass-by-value API being changed to a pass-by-reference API in patches
1-9 and this tickles a bug due to an unexpected lifecycle heap access
elsewhere causing random data to be returned. Being able to bisect this
issue down to a single specific patch suddenly becomes very useful indeed.
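
The mechanics of that hunt are then just a standard bisect, something
along these lines (the good commit being whatever you last knew to
work):

  git bisect start
  git bisect bad HEAD
  git bisect good <last-known-good-commit>
  git bisect run sh -c "make -s && make check"

and with a fine-grained patchset the commit it lands on tells you far
more than "add big feature X" would.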


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-16 Thread Mark Cave-Ayland
On 15/12/14 19:24, Andrew Dunstan wrote:

 On 12/15/2014 02:08 PM, Robert Haas wrote:
 On Sun, Dec 14, 2014 at 2:24 PM, Mark Cave-Ayland
 mark.cave-ayl...@ilande.co.uk wrote:
 However if it were posted as part of patchset with a subject of [PATCH]
 gram.y: add EXCLUDED pseudo-alias to bison grammar then this may pique
 my interest enough to review the changes to the grammar rules, with the
 barrier for entry reduced to understanding just the PostgreSQL-specific
 parts.
 Meh.  Such a patch couldn't possibly compile or work without modifying
 other parts of the tree.  And I'm emphatically opposed to splitting a
 commit that would have taken the tree from one working state to
 another working state into a series of patches that would have taken
 it from a working state through various non-working states and
 eventually another working state.  Furthermore, if you really want to
 review that specific part of the patch, just look for what changed in
 gram.y, perhaps by applying the patch and doing git diff
 src/backend/parser/gram.y.  This is really not hard.

 I certainly agree that there are cases where patch authors could and
 should put more work into decomposing large patches into bite-sized
 chunks.  But I don't think that's always possible, and I don't think
 that, for example, applying BRIN in N separate pieces would have been
 a step forward.  It's one feature, not many.

 
 +1
 
 I have in the past found the bunch of patches approach to be more than
 somewhat troublesome, especially when one gets "here is an update to
 patch nn of the series" and one has to try to compose a coherent set of
 patches in order to test them. A case in point I recall was the original
 Foreign Data Wrapper patchset.

In practice, people don't tend to post updates to individual patches in
that way. What happens is that after a week or so of reviews, people go
away and rework the patch and come back with a complete updated patchset
clearly marked as [PATCHv2] with the same subject line and an updated
cover letter describing the changes, so a complete coherent patchset is
always available.

Now as I mentioned previously, one of the disadvantages of splitting
patches in this way is that mailing list volume tends to grow quite
quickly - hence the use of [PATCH] filters and system identifiers in the
subject line to help mitigate this.

Whilst the volume of mail was a shock to begin with, having stuck with
it for a while I personally find that the benefits to development
outweigh the costs. Each individual project is different, but I believe
that there are good ideas here that can be used to improve workflow,
particularly when it comes to patch review.


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-16 Thread Mark Cave-Ayland
On 15/12/14 19:27, Robert Haas wrote:

 On Sun, Dec 14, 2014 at 4:53 PM, Mark Cave-Ayland
 mark.cave-ayl...@ilande.co.uk wrote:
 What I find frustrating is that I've come back from a workflow where
 I've been reviewing/testing patches within months of joining a project
 because the barrier for entry has been so low, and yet even with much
 longer experience of the PostgreSQL codebase I feel I can't do the same
 for patches submitted to the commitfest.

 And why is this? It's because while I know very specific areas of the
 code well, many patches span different areas of the source tree of which
 I have little and/or no experience, which when supplied as a single
 monolithic patch makes it impossible for me to give meaningful review to
 all but a tiny proportion of them.
 
 So, there are certainly some large patches that do that, and they
 typically require a very senior reviewer.  But there are lots of small
 patches too, touching little enough that you can learn enough to give
 them a decent review even if you've never looked at that before.  I
 find myself in that situation as a reviewer and committer pretty
 regularly; being a committer doesn't magically make you an expert on
 the entire code base.  You can (and I do) focus your effort on the
 things you know best, but you have to step outside your comfort zone
 sometimes, or you never learn anything new.

Right. Which is why I'm advocating the approach of splitting patches in
relevant chunks so that experts in those areas can review them in
parallel. And even better, if I then want to start digging into other
parts of the system as time and interest allow then I can choose to
begin by picking a subject line from a patchset and going through this
small individual patch in detail rather than a single large patch.

It has often been suggested that people learn best when studying a mix
of 80% things they already know and 20% things they don't. At least
with more granular patches people can find a comfortable ratio of
new/old material, versus a single large patch which could be 70-80%
completely new material and therefore much more difficult to pick up.

Another thought to ponder here is that since reviewing smaller patches
takes less time, for the same time cost a reviewer with experience in
one specific area can in theory review and provide feedback on another
2-3 patchsets, which should help relieve patch review starvation across
patchset submissions.


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-16 Thread Mark Cave-Ayland
On 16/12/14 04:57, Noah Misch wrote:

 But that doesn't mean we should be turning anyone away. We should not.
 
 +1.  Some of the best reviews I've seen are ones where the reviewer expressed
 doubts about the review's quality, so don't let such doubts keep you from
 participating.  Every defect you catch saves a committer time; a review that
 finds 3 of the 10 defects in a patch is still a big help.  Some patch
 submissions waste the community's time, but it's almost impossible to waste
 the community's time by posting a patch review.
 
 Confusion may have arisen due to statements that we need more expert
 reviewers, which is also true.  (When an expert writes a sufficiently-complex
 patch, it's important that a second expert examine the patch at some point.)
 If you're a novice reviewer, your reviews do help to solve that problem by
 reducing the workload on expert reviewers.

I should add that by reducing the barrier of entry for patch review,
there is definitely potential to find common errors before a patch gets
analysed in detail by a committer.

For example I may not know a lot about certain PostgreSQL subsystems but
with a decent C coding background I can pick out certain suspicious
things in patches, e.g. potential overflows, pointer arithmetic bugs,
spelling mistakes/incorrect comments etc. and flag them to the original
submitter to double-check.


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-16 Thread Mark Cave-Ayland
On 16/12/14 07:33, David Rowley wrote:

 On 16 December 2014 at 18:18, Josh Berkus j...@agliodbs.com wrote:
 
  Man. You're equating stuff that's not the same. You didn't get your way
  (and I'm tentatively on your side on that one) and take that to imply
  that we don't want more reviewers.
 
 During that thread a couple people said that novice reviewers added no
 value to the review process, and nobody argued with them then.  I've
 also been told this to my face at pgCon, and when I've tried organizing
 patch review events.  I got the message, which is why I stopped trying
 to get new reviewers.
 
 And frankly: if we're opposed to giving credit to patch reviewers, we're
 opposed to having them.
 
 
 
 I'd just like to add something which might be flying below the radar of
 more senior people. There are people out there (like me) working on
 PostgreSQL more for the challenge and perhaps the love of the product,
 who make absolutely zero money out of it. For these people getting
 credit where it's due is very important. I'm pretty happy with this at
 the moment and I can't imagine any situation where not crediting
 reviewers would be beneficial to anyone.

This is exactly where I am at the moment, having previously been more
involved with the development side of PostgreSQL in the past.

Personally having a credit as a patch reviewer isn't particularly
important to me, since mail archives are good enough these days that if
people do query my contributions towards projects then I can point them
towards any reasonable search engine.

The biggest constraint on my ability to contribute is *time*.

Imagine my situation as a reviewer: I am currently on the mailing
list for two well-known open source projects and I also have a day job
and a home life to contend with.

For the spare time that I have for review, one of these projects
requires me to download attachment(s), apply them to a git tree
(hopefully it still applies), run a complete make check regression
series, try and analyse a patch which will often reference parts to
which I have no understanding, and then write up a coherent email and
submit it to the mailing list. Realistically to do all this and provide
a review that is going to be of use to a committer is going to take a
minimum of 1-2 hours, and even then there's a good chance that I've
easily missed obvious bugs in the parts of the system I don't understand
well.

For the second project, I can skim through my inbox daily picking up
specific areas I work on/are interested in, hit reply to add a couple of
lines of inline comments to the patch and send feedback to the
author/list in just a few minutes.

The obvious question is, of course, which project gets the majority
share of my spare review time?


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-16 Thread Mark Cave-Ayland
On 16/12/14 10:49, Marko Tiikkaja wrote:

 On 12/16/14 11:26 AM, Mark Cave-Ayland wrote:
 On 15/12/14 19:27, Robert Haas wrote:
 So, there are certainly some large patches that do that, and they
 typically require a very senior reviewer.  But there are lots of small
 patches too, touching little enough that you can learn enough to give
 them a decent review even if you've never looked at that before.  I
 find myself in that situation as a reviewer and committer pretty
 regularly; being a committer doesn't magically make you an expert on
 the entire code base.  You can (and I do) focus your effort on the
 things you know best, but you have to step outside your comfort zone
 sometimes, or you never learn anything new.

 Right. Which is why I'm advocating the approach of splitting patches in
 relevant chunks so that experts in those areas can review them in
 parallel.
 
 I don't see how splitting patches up would help with that.  I often look
 at only the parts of patches that touch the things I've worked with
 before.  And in doing that, I've found that having the context there is
 absolutely crucial almost every single time, since I commonly ask myself
 Why do we need to do this to implement feature X?, and only looking at
 the rest of the complete patch (or patch set, however you want to think
 about it) reveals that.

I've already covered this earlier in the thread so I won't go into too
much detail, but the overall flow of the patchset is described by the
cover letter (please feel free to look at the example link I posted).

In terms of individual patches within a patchset, if the combination
of the cover letter and the individual commit message doesn't give you
enough context then the developer needs to fix this; either the patchset
has been split at a nonsensical place or one or other of the cover
letter and commit message needs to be revised.

 Of course, me looking at parts of patches, finding nothing questionable
 and not sending an email about my findings (or lack thereof) hardly
 counts as review, so somebody else still has to review the actual
 patch as a whole.  Nor do I get any credit for doing any of that, which
 might be a show-stopper for someone else.  But I think that's just
 because I'm not doing it correctly.  I don't see why someone couldn't
 comment on a patch saying "I've reviewed the grammar changes, and they
 look good to me".

Sure, there is always scope for doing that which is what splitting
patches and constant review encourages. In terms of the current
commitfest system, the process for review is clearly documented and as
I've pointed out in my response to David, extremely heavyweight in
comparison.


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-16 Thread Mark Cave-Ayland
On 16/12/14 13:37, Claudio Freire wrote:

 For the second project, I can skim through my inbox daily picking up
 specific areas I work on/are interested in, hit reply to add a couple of
 lines of inline comments to the patch and send feedback to the
 author/list in just a few minutes.
 
 Notice though that on the second project, the review is made in the
 air. You didn't build, nor ran tests, you're guessing how the code
 performs rather than seeing it perform, so the review is light
 compared to the first.

I think this is doing developers in general a little injustice here. The
general rule for submitting a patch to *any* project is: i) does it
apply to the current source tree and ii) does it build and pass the
regression tests?

Maybe there is a greater incidence of this happening in PostgreSQL
compared to other projects? But in any case the project has every right
to refuse further review if these 2 simple criteria are not met. Also
with a submission from git, you can near 100% guarantee that the author
has actually built and run the code before submission. I have seen
occasions where a patch has been submitted, and a committer will send a
quick email along the lines of "The patch looks good, but I've just
applied a patch that will conflict with this, so please rebase and
resubmit".

You mention performance, but again I've seen from this list that good
developers can generally pick up on a lot of potential regressions by
eye, e.g. lookup times going from O(N) to O(N^2), without having to
resort to building and testing the code. And by breaking patches into
small chunks, people can focus their time on reviewing the areas they
are good at.

Occasionally people do get a patch ready for commit but it
fails build/test on one particular platform. In that case, the committer
simply posts the build/test failure to the list and requests that the
patchset be fixed ready for re-test. This is a very rare occurrence though.

Also, you mentioned a "light review", but I wouldn't call it that. I
see it more as: with each iteration the magnifying glass used to look at
the code gets bigger and bigger.

A lot of initial comments for the second project are along the lines of
"this doesn't look right - won't this overflow on values > 2G?" or
"you're assuming here that A occurs before B, whereas if you run with
-foo this likely won't work". And this is the area which I believe will
have the greatest benefit for PostgreSQL - making it easier to catch and
rework the obvious flaws in the patch in a timely manner before it gets
to the committer.


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-16 Thread Mark Cave-Ayland
On 16/12/14 13:44, David Fetter wrote:

 On Tue, Dec 16, 2014 at 11:09:34AM +, Mark Cave-Ayland wrote:
 On 16/12/14 07:33, David Rowley wrote:

 On 16 December 2014 at 18:18, Josh Berkus j...@agliodbs.com wrote:

  Man. You're equating stuff that's not the same. You didn't get your 
 way
  (and I'm tentatively on your side on that one) and take that to imply
  that we don't want more reviewers.

 During that thread a couple people said that novice reviewers added no
 value to the review process, and nobody argued with them then.  I've
 also been told this to my face at pgCon, and when I've tried organizing
 patch review events.  I got the message, which is why I stopped trying
 to get new reviewers.

 And frankly: if we're opposed to giving credit to patch reviewers, we're
 opposed to having them.



 I'd just like to add something which might be flying below the radar of
 more senior people. There are people out there (like me) working on
 PostgreSQL more for the challenge and perhaps the love of the product,
 who make absolutely zero money out of it. For these people getting
 credit where it's due is very important. I'm pretty happy with this at
 the moment and I can't imagine any situation where not crediting
 reviewers would be beneficial to anyone.

 This is exactly where I am at the moment, having previously been more
 involved with the development side of PostgreSQL during the past.

 Personally having a credit as a patch reviewer isn't particularly
 important to me, since mail archives are good enough these days that if
 people do query my contributions towards projects then I can point them
 towards any reasonable search engine.

 The biggest constraint on my ability to contribute is *time*.

 Imagine the situation as a reviewer that I am currently on the mailing
 list for two well-known open source projects and I also have a day job
 and a home life to contend with.

 For the spare time that I have for review, one of these projects
 requires me to download attachment(s), apply them to a git tree
 (hopefully it still applies), run a complete make check regression
 series, try and analyse a patch which will often reference parts to
 which I have no understanding, and then write up a coherent email and
 submit it to the mailing list. Realistically to do all this and provide
 a review that is going to be of use to a committer is going to take a
 minimum of 1-2 hours, and even then there's a good chance that I've
 easily missed obvious bugs in the parts of the system I don't understand
 well.

 For the second project, I can skim through my inbox daily picking up
 specific areas I work on/are interested in, hit reply to add a couple of
 lines of inline comments to the patch and send feedback to the
 author/list in just a few minutes.
 
 With utmost respect, you've missed something really important in the
 second that the first has, and frankly isn't terribly onerous: does an
 actual system produce working code?  In the PostgreSQL case, you can
 stop as soon as you discover that the patch doesn't apply to master or
 that ./configure doesn't work, or that the code doesn't compile:
 elapsed time = 5 minutes.  Or you can keep moving until you have made
 progress for the time you've allotted.

As per my previous email, it's generally quite rare for a developer to
post non-working code to a list (in some cases someone will send a quick
reply pointing out that this needs to be rebased because it appears to
reference an old API). From what I see in PostgreSQL this mostly happens
because of bitrot between the time the patch was posted to the list and
the actual commitfest itself - so in one way the commitfest system
exacerbates this particular problem further.

 But the bigger issue, as others have pointed out, has never been a
 technical one.  It's motivating people whose time is already much in
 demand to spend some of it on reviewing.
 
 I wasn't discouraged by the preliminary patch review process or any
 feedback I got.  My absence lately has more to do with other demands
 on my time.

I am completely in agreement with you here. My approach is more along
the lines of: I know that access to long periods of time to review
patches is often impractical, so how can the review process be made
more time-efficient in order to allow you, me and other people in
similarly time-limited environments to participate more?


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-16 Thread Mark Cave-Ayland
On 16/12/14 15:42, Claudio Freire wrote:

 Also
 with a submission from git, you can near 100% guarantee that the author
 has actually built and run the code before submission.
 
 I don't see how. Forks don't have travis ci unless you add it, or am I 
 mistaken?

Well as I mentioned in my last email, practically all developers will
rebase and run "make check" on their patched tree before submitting to
the list. Hopefully there aren't hordes of developers deliberately
creating and submitting broken patches ;)

 You mention about performance, but again I've seen from this list that
 good developers can generally pick up on a lot of potential regressions
 by eye, e.g. lookup times go from O(N) to O(N^2) without having to
 resort to building and testing the code. And by breaking into small
 chunks people can focus their time on reviewing the areas they are good at.
 
 I meant performance as in running, not really performance. Sorry for
 the confusion.

Okay no worries.

 I meant, you're looking at the code and guessing how it runs, but you
 don't really know. It's easy to make assumptions that are false while
 reviewing, quickly disproved by actually running tests.
 
 A light review without ever building can fail to detect those issues.
 A heavier review patching up and building manually is too much manual
 work that could really be automated. The optimum is somewhere between
 those two extremes.

Correct. My analogy here was that people with varying abilities review
the patches at their experience level, and once low-hanging/obvious
design issues have been resolved then more experienced developers will
start to review the patch at a deeper level.

 Occasionally sometimes people do get a patch ready for commit but it
 fails build/test on one particular platform.
 
 Again, pre-review CI can take care of this.

Yes - see my next reply...

 In that case, the committer
 simply posts the build/test failure to the list and requests that the
 patchset be fixed ready for re-test. This is a very rare occurrence though.
 
 It shouldn't need to reach the repo, don't you think?

When I say repo, I mean the local repo of the tree maintainer. Currently
the tree maintainer pulls each merge request into his local tree,
performs a buildfarm test and only pushes the merge upstream once this
has passed. I guess the PostgreSQL equivalent of this would be having a
merge-pending branch on the buildfarm rather than just a post-commit
build of git master (which we see from reports to the list periodically
fails in this way) and only pushing a set of patches when the buildfarm
comes back green.

 Also you mentioned about light review but I wouldn't call it that.
 
 I've made my fair share of light reviews (not for pg though) and can
 recognize the description. It is a light review what you described.
 
 Surely not all reviews with inline comments are light reviews, but
 what you described was indeed a light review.
 
 A lot of initial comments for the second project are along the lines of
 "this doesn't look right - won't this overflow on values > 2G?" or
 "you're assuming here that A occurs before B, whereas if you run with
 -foo this likely won't work". And this is the area which I believe will
 have the greatest benefit for PostgreSQL - making it easier to catch and
 rework the obvious flaws in the patch in a timely manner before it gets
 to the committer.
 
 If you read carefully my reply, I'm not at all opposed to that. I'm
 just pointing out, that easier reviews need not result in better
 reviews.
 
 Better, easier reviews are those where the trivial reviews are
 automated, as is done in the project I linked.

Yes I can definitely agree with that. See below again:

 Formatting, whether it still applies, and whether it builds and checks
 pass, all those are automatable.

I should add that QEMU carries its own copy of checkpatch.pl in the
source tree, which submitters are requested to run on their patchset
before posting. This catches patches that don't meet the project style
guidelines - casing, braces, trailing whitespace, tab/space issues
etc. - before the patch even reaches the list.
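
The invocation is trivial - from the top of the tree, something like
(patch file name invented):

  scripts/checkpatch.pl 0001-subsystem-one-line-comment.patch

and anything it flags gets fixed before a human ever spends time on
the patch.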


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-15 Thread Mark Cave-Ayland
On 15/12/14 16:28, Andres Freund wrote:

 I don't believe this really is a question of the type of project. I
 think it's more that especially the kernel has had to deal with similar
 problems at a much larger scale. And the granular approach somewhat
 works for them.

Correct. My argument was that dividing patches into smaller, more
reviewable chunks lessens the barrier for reviewers since many people
can review the individual patches as appropriate to their area of
expertise.

The benefits of this are that the many parts of the patchset get
reviewed early by a number of people, which reduces the workload on the
core developers as they only need to focus on a small number of
individual patches. Hence patches get reworked and merged much more
quickly in this way.

This is in contrast to the commitfest system, where a single patch i)
is often held until the next commitfest (where bitrot often sets in)
and ii) requires the reviewer to have good knowledge of all the areas
covered by the patch in order to give a meaningful review, which
significantly reduces the pool of available reviewers.


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-14 Thread Mark Cave-Ayland
On 13/12/14 09:37, Craig Ringer wrote:

 Speaking as the originator of commitfests, they were *always* intended
 to be a temporary measure, a step on the way to something else like
 continuous integration.
 
 I'd really like to see the project revisit some of the underlying
 assumptions that're being made in this discussion:
 
 - Patches must be email attachments to a mailing list
 
 - Changes must be committed by applying a diff
 
 ... and take a look at some of the options a git-based workflow might
 offer, especially in combination with some of the tools out there that
 help track working branches, run CI, etc.
 
 Having grown used to push/pull workflows with CI integration I find the
 PostgreSQL patch workflow very frustrating, especially for larger
 patches. It's particularly annoying to see a patch series squashed into
 a monster patch whenever it changes hands or gets rebased, because it's
 being handed around as a great honking diff not a git working branch.
 
 Is it time to stop using git like CVS?

While not so active these days with PostgreSQL, I believe there is still
great scope for improving the workflow of reviewing and applying
patches. And while the commitfests were a good idea when they started,
it's obvious that in their current form they do not scale to meet the
current rate of development.

If I could name just one thing that I think would improve things it
would be submission of patches to the list in git format-patch format.
Why? Because it enables two things: 1) by definition patches are
small-chunked into individually reviewable pieces and 2) subject lines
allow list members to filter things that aren't relevant to them.
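
The mechanics on the submitter's side are nothing exotic - something
along the lines of (output directory and list address left as
placeholders):

  git format-patch --cover-letter -v2 -o outgoing/ origin/master
  git send-email --to=<list address> outgoing/*.patch

which produces exactly the kind of numbered, per-commit emails
described in the QEMU workflow below.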

Here's an example of how the workflow works for the QEMU project which
I'm involved in, which also has similar issues in terms of numbers of
developers and supported platforms:


1) Developer submits a patch to the -devel list with git-format-patch

- For subsystems with listed maintainers in the MAINTAINERS file, the
maintainer must be CCd on the patchset (this is so that even if
developers decide to filter PATCH emails to the list, they still get the
CC copy)

- Patchsets must be iterative and fully bisectable, with clear comments
supplied in the commit message

- Any patchset with > 1 patch MUST have a cover letter

- All patches have a Signed-off-by indicating that the developer has
accepted the terms of the project license

- Each subject line should be in the format "[PATCH] subsystem: one line
comment" to enable people to determine its relevance


2) Other developers start reviewing patches

There are several mini-workflows that tend to occur here so I'll try and
give some popular examples.

- If a patch is a build fix, one of the core committers will verify and
apply directly to the master repository

- If a maintainer is happy with the whole patchset, they reply to the
cover letter with a Reviewed-by and a line stating which subtree the
patchset has been applied to. Maintainers add a Signed-off-by to all
patches accepted via their subtree.

- A maintainer may indicate that the patch itself is fine, but the
commit message is unclear/incorrect and should be changed before being
resubmitted

- Any member of the mailing list may reply/review an individual patch by
hitting reply in their mail client. Patch comments are included
inline. If a reviewer is happy with an individual patch, they reply to
the patch and add a Reviewed-by line; anyone can provide a review,
even if they are not a maintainer

- For complex patches, a maintainer may request that the patchset be
split into further patchsets. In general patchsets of > 20-30 patches
tend to be frowned upon, and will often immediately result in a reply
saying "please break this down into smaller chunks"

- A maintainer may reply and say that while the patchset in its final
form needs more work, some clean-ups introduced by the patch are ready
to be applied, e.g. "please resubmit patches 1-4, 7 and 8 as a separate
patchset", which can then be applied directly to a subtree


3) Patchset tidy-up and submission

After the initial review, new versions of the patchset are generally
reposted to the list within a few days.

- The patch version is included in the header with the same subject
line, e.g. [PATCHv2] Add new feature

- The cover letter contains a mini-changelog giving the changes between
v1 and v2 e.g.

v2:
  Fix spelling mistake pointed out by Peter
  Change lock ordering requested by Andreas

- Reviewed-by tags for individual patches sent to the list are
appended to the commit message for that patch before resubmission


Once a maintainer has accepted a patch into their subtree, they send a
pull request to the core maintainers to apply their trees to the main
repository. I appreciate that currently merge commits aren't used in the
PostgreSQL git repository so a pull request effectively becomes "rebase
this patchset onto master and push".

So why does this help patch review? From my experience I can 

Re: [HACKERS] Commitfest problems

2014-12-14 Thread Mark Cave-Ayland
On 14/12/14 15:51, Craig Ringer wrote:

 On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote:
 Compare this to say, for example, huge patches such as RLS.
 
 I specifically objected to that being flattened into a single monster
 patch when I saw that'd been done. If you look at my part in the work on
 the row security patch, while I was ultimately unsuccessful in getting
 the patch mergeable I spent quite a bit of time splitting it up into a
 logical patch-series for sane review and development. I am quite annoyed
 that it was simply flattened back into an unreviewable, hard-to-follow
 blob and committed in that form.
 
 It's not like development on a patch series is difficult. You commit
 small fixes and changes, then you 'git rebase -i' and reorder them to
 apply to the appropriate changesets. Or you can do a 'rebase -i' and in
 'e'dit mode make amendments to individual commits. Or you can commit
 'fixup!'s that get auto-squashed.
 
 This is part of my grumbling about the use of git like it's still CVS.

I just want to add that I'm always grateful for the time that you and
all the committers put into PostgreSQL, and my example of RLS was really
to signify "big feature X" rather than to pick on a particular aspect of
that patch. I apologise if you thought I was in any way criticising the
work that you, or anyone else, put into this feature.

The argument I wanted to make is that if someone starts seeing a problem
with a current build of PostgreSQL and git bisects it down to a
particular commit, then smaller, iterative commits are extremely helpful
in this regard.

When searching for a needle in a haystack, there is a very big
difference between an "add big feature X" haystack and a "change struct
alignment for Y" haystack, the latter of which is implicit in having
smaller, iterative patchsets.


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-14 Thread Mark Cave-Ayland
On 14/12/14 15:57, Craig Ringer wrote:

 On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote:
 
 If I could name just one thing that I think would improve things it
 would be submission of patches to the list in git format-patch format.
 Why? Because it enables two things: 1) by definition patches are
 small-chunked into individually reviewable pieces and 2) subject lines
 allow list members to filter things that aren't relevant to them.
 
 Personally I do just that. Even though the official docs say context
 diff is preferred, I think most people now use context diff in practice,
 from 'git diff'.
 
 I'm surprised most people don't use git format-patch .
 
 Note that we can't just git am a patch from git format-patch at the
 moment, because policy is that the committer should be the Author field.
 The project would have to define a transition point where we started
 using the git Author field for the author and the separate Committer
 field, like git-am does by default. I'm not sure that'll ever actually
 happen, or that it's even desirable.

Sure, I appreciate that. The part I particularly wanted to emphasise
here was the use of a system identifier as part of the subject line,
e.g. say if there was a hypothetical patchset which sped up PostgreSQL
by 20% then you could see subjects like this:

[PATCH 1/x] lmgr: reduce lock strength for LWLock
[PATCH 2/x] wal: change lock ordering for WAL writes
[PATCH 3/x] optimiser: exit early if lock unobtainable

While these are obviously unrealistic, what I hope here is that people
with interest in these areas would take note of individual patches even
if they aren't interested in the entire patchset.

So maybe since Andreas did some recent locking work, he spots "lmgr" in
the subject and then makes a note to review that part of the patch.
Similarly Heikki could spot "wal" and make a note to look at that, whilst
Tom would zero in on the optimiser changes.

The aim here is that no one person needs to sit and initially review a
complete patch before returning feedback to the developer.

 - Patchsets must be iterative and fully bisectable, with clear comments
 supplied in the commit message
 
 Fully support this one.
 
 Also in the smallest reasonable size divisions.

I should add here that the QEMU folk do tend to go to great lengths to
preserve bisectability; often intermediate compatibility APIs are
introduced early in the patchset and then removed at the very end when
the final feature is implemented.

 - If a maintainer is happy with the whole patchset, they reply to the
 cover letter with a Reviewed-by and a line stating which subtree the
 patchset has been applied to. Maintainers add a Signed-off-by to all
 patches accepted via their subtree.
 
 Sounds a lot like the kernel workflow (though in practice it seems like
 the kernel folks tend bypass all this mailing list guff and just use
 direct pulls/pushes for lots of things).

Yes indeed - mainly from the people on the KVM module side. Again I'm
not saying that this workflow is correct for PostgreSQL, I was trying to
use this as an example to explain *why* the workflow is done in this manner
and how it helps developers such as myself.

 - Any member of the mailing list may reply/review an individual patch by
 hitting reply in their mail client. Patch comments are included
 inline. If a reviewer is happy with an individual patch, they reply to
 the patch and add a Reviewed-by line; anyone can provide a review,
 even if they are not a maintainer
 
 This is fun with many mail clients that like to linewrap text bodies
 and don't permit inline replies to attached text.

Wow really? I can't say I've seen this in practice for a long time but I
defer to your experience here.

 6) Smaller patches are applied more often

 By breaking large patches down into smaller chunks, even if the main
 feature itself isn't ready to be applied to the main repository then a
 lot of the preceding patches laying down the groundwork can.
 
 I think PostgreSQL does OK on smaller patches - at least sometimes.
 There can be a great deal of bikeshedding and endless arguments,
 back-and-forth about utility, in-tree users, etc, but no formal process
 is ever going to change that. The process of getting an uncontroversial
 patch into Pg is generally painless, if often rather slow unless a
 committer sees it and commits it immediately upon it being posted.

I agree that trivial patches do tend to get picked up reasonably
quickly. It strikes me that it may prevent patches slipping through the
list if someone were officially nominated as a "trivial patch monitor",
i.e. someone for developers to ping if their patch has been ignored,
but it sounds like this is not such an issue in practice.

 The benefits here are that people with large out-of-tree patches 
 
 I'm not sure we have all that many people in that category - though I'm
 a part of a team that most certainly does fit the bill.
 
 Even the small patches for BDR have been challenging and often
 contentious

Re: [HACKERS] Commitfest problems

2014-12-14 Thread Mark Cave-Ayland
On 14/12/14 17:05, Tom Lane wrote:

 Craig Ringer cr...@2ndquadrant.com writes:
 On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote:
 Compare this to say, for example, huge patches such as RLS.
 
 I specifically objected to that being flattened into a single monster
 patch when I saw that'd been done. If you look at my part in the work on
 the row security patch, while I was ultimately unsuccessful in getting
 the patch mergeable I spent quite a bit of time splitting it up into a
 logical patch-series for sane review and development. I am quite annoyed
 that it was simply flattened back into an unreviewable, hard-to-follow
 blob and committed in that form.
 
 TBH, I'm not really on board with this line of argument.  I don't find
 broken-down patches to be particularly useful for review purposes.  An
 example I was just fooling with this week is the GROUPING SETS patch,
 which was broken into three sections for no good reason at all.  (The
 fourth and fifth subpatches, being alternative solutions to one problem,
 are in a different category of course.)  Too often, decisions made in
 one subpatch don't make any sense until you see the larger picture.

The way in which this is normally handled is via the cover letter which
is going to be nicely captured in the mail archives; typically a cover
letter for a patchset consists of several paragraphs along the lines of
"patches 1-3 do some re-org work, patch 4 reworks the Y API for new
feature X implemented in patches 9-12".

As an example take a look at the cover letter for this patch submitted
over the past few days:
http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg01990.html. It
seems to me that this would give the level of detail which you are
looking for.

I agree that definitely in some cases patches could be broken into too
fine pieces, but then I think it's perfectly okay for committers to
turn around and request the series be re-submitted in more sensible
chunks if they feel things have gone too far the other way.

 Also, speaking of the larger picture: the current Postgres revision
 history amounts to 37578 commits (as of sometime yesterday) --- and that's
 just in the HEAD branch.  If we'd made an effort to break feature patches
 into bite-size chunks like you're recommending here, we'd probably have
 easily half a million commits in the mainline history.  That would not be
 convenient to work with, and I really doubt that it would be more useful
 for git bisect purposes, and I'll bet a large amount of money that most
 of them would not have had commit messages composed with any care at all.

Having worked on a few kernel patches myself, I've found git to be
surprisingly effective on huge repositories once the cache is warmed up,
so I don't feel this would be an issue with a PostgreSQL git repository,
which will be many orders of magnitude smaller.

In terms of commit messages, I don't know if you missed that part of my
original response to Craig but it's considered normal for a maintainer
to reject a patch because of an incorrect/irrelevant commit message.

So if I submitted a patch that fixed a particular bug but you as a
committer weren't happy with the commit message, then I'm perfectly okay
with you asking me to resubmit with updated/corrected commit message wording.


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-14 Thread Mark Cave-Ayland
On 14/12/14 17:30, Andrew Dunstan wrote:

 On 12/14/2014 12:05 PM, Tom Lane wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
 On 12/14/2014 10:35 PM, Mark Cave-Ayland wrote:
 Compare this to say, for example, huge patches such as RLS.
 I specifically objected to that being flattened into a single monster
 patch when I saw that'd been done. If you look at my part in the work on
 the row security patch, while I was ultimately unsuccessful in getting
 the patch mergeable I spent quite a bit of time splitting it up into a
 logical patch-series for sane review and development. I am quite annoyed
 that it was simply flattened back into an unreviewable, hard-to-follow
 blob and committed in that form.
 TBH, I'm not really on board with this line of argument.  I don't find
 broken-down patches to be particularly useful for review purposes.  An
 example I was just fooling with this week is the GROUPING SETS patch,
 which was broken into three sections for no good reason at all.  (The
 fourth and fifth subpatches, being alternative solutions to one problem,
 are in a different category of course.)  Too often, decisions made in
 one subpatch don't make any sense until you see the larger picture.

 Also, speaking of the larger picture: the current Postgres revision
 history amounts to 37578 commits (as of sometime yesterday) --- and
 that's
 just in the HEAD branch.  If we'd made an effort to break feature patches
 into bite-size chunks like you're recommending here, we'd probably have
 easily half a million commits in the mainline history.  That would not be
 convenient to work with, and I really doubt that it would be more useful
 for git bisect purposes, and I'll bet a large amount of money that most
 of them would not have had commit messages composed with any care at all.
 
 I have tried to stay away from this thread, but ...
 
 I'm also quite dubious about this suggested workflow, partly for the
 reasons Tom gives, and partly because it would constrain the way I work.
 I tend to commit with little notes to myself in the commit logs, notes
 that are never intended to become part of the public project history. I
 should be quite sad to lose that.

Interestingly enough, I tend to work in a very similar way to this. When
submitting patches upstream, I tend to rebase on a new branch and then
squash/rework as required.

One thing I do tend to find is that once I start rebasing the patches
for upstream, I find that many of my personal notes can be rewritten as
part of the patch comment (I would like to think that comments useful to
myself will be useful to other developers some day). Once the rebase is
complete, often I find that I have no non-public comments left so that
problem becomes almost non-existent.

 As for using git bisect, usually when I do this each iteration is quite
 expensive. Multiplying the number of commits by a factor between 10 and
 100, which is what I think this would involve, would just make git
 bisect have to do between 3 and 7 more iterations, ISTM. That's not a win.

I find that this isn't too bad in practice. If you think about
monolithic patches during a commitfest, you can imagine that most of
them will touch one of the core .h files which will require most things
to be rebuilt once applied during bisection.

With smaller iterative patchsets, each patch tends to touch only a
handful of files, and so "make install" generally only needs to rebuild
and link a very small number of files, which is reasonably quick. Since
all of the changes to global header files are contained in 1-2 patches,
the number of complete rebuilds isn't as high as you might expect.

 On the larger issue, let me just note that I don't believe we have what
 is fundamentally a technological problem, and while technological
 changes can of course sometimes make things easier, they can also blind
 us to the more basic problems we are facing.

Absolutely. I firmly believe that the right tools for the job are
available, it's really trying to find a workflow solution that works
well for everyone involved in the project.


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-14 Thread Mark Cave-Ayland
On 14/12/14 18:24, Peter Geoghegan wrote:

 On Sun, Dec 14, 2014 at 9:05 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 TBH, I'm not really on board with this line of argument.  I don't find
 broken-down patches to be particularly useful for review purposes.  An
 example I was just fooling with this week is the GROUPING SETS patch,
 which was broken into three sections for no good reason at all.  (The
 fourth and fifth subpatches, being alternative solutions to one problem,
 are in a different category of course.)  Too often, decisions made in
 one subpatch don't make any sense until you see the larger picture.
 
 It sounds like they didn't use the technique effectively, then. I
 think it will be useful to reviewers that I've broken out the
 mechanism through which the ON CONFLICT UPDATE patch accepts the
 EXCLUDED.* pseudo-alias into a separate commit (plus docs in a
 separate commit, as well as tests) - it's a non-trivial additional
 piece of code that it wouldn't be outrageous to omit in a release, and
 isn't particularly anticipated by earlier cumulative commits. Maybe
 people don't have a good enough sense of what sort of subdivision is
 appropriate yet. I think that better style will emerge over time.

For me this is a great example of how to get more developers involved in
patch review. Imagine that I'm an experienced developer with little
previous exposure to PostgreSQL, but with a really strong flex/bison
background.

If someone posts a patch to the list as a single "grouping sets" patch,
then I'm going to look at that, think "I have no way of understanding
this", and do nothing with it.

However if it were posted as part of a patchset with a subject of "[PATCH]
gram.y: add EXCLUDED pseudo-alias to bison grammar" then this might pique
my interest enough to review the changes to the grammar rules, with the
barrier to entry reduced to understanding just the PostgreSQL-specific
parts.

What should happen over time is that developers like this would earn the
trust of the committers, so committers can spend more time reviewing the
remaining parts of the patchset. And of course the project has now
engaged a new developer in the community who might otherwise not have
participated.

 Of course, if you still prefer a monolithic commit, it's pretty easy
 to produce one from a patch series. It's not easy to go in the other
 direction, though.

Agreed.


ATB,

Mark.





Re: [HACKERS] Commitfest problems

2014-12-14 Thread Mark Cave-Ayland
On 14/12/14 20:07, Craig Ringer wrote:

 On 12/15/2014 02:46 AM, Mark Cave-Ayland wrote:
 Interestingly enough, I tend to work in a very similar way to this. When
 submitting patches upstream, I tend to rebase on a new branch and then
 squash/rework as required.
 
 Same here, and I find it works really well ... when I do it properly.
 
 I usually have a private development branch that's full of fixup! 
 commits, WIPs, awful commit messages, etc.
 
 Then I have the tree that's been rebased, squashed, and tided up.
 
 I periodically tidy my latest work and replace the old clean tree with
 it, then start my new development branch on top of the new clean tree.
 
 This gives me a somewhat nice looking, well ordered patch series to work
 on top of, while retaining the flexibility to do lots of quick fixes etc.
 
 Admittedly, sometimes the development tree gets a bit large and it takes
 a while before I give it a proper cleanup. My current project being very
 much in that category. Still - it works well in general.

That describes my workflow fairly well too.

 I find that this isn't too bad in practice. If you think about
 monolithic patches during a commitfest, you can imagine that most of
 them will touch one of the core .h files which will require most things
 to be rebuilt once applied during bisection.
 
 It's not build time, it's test-run time. Especially if you can't
 automate the test, or it isn't practical to.
 
 That's a legitimate concern - but I don't think we'd see a blowout of
 patch counts to quite the same degree.

At the end of the day, each project finds its own level as to how
complex individual patches should be and what should comprise a
patchset. Over the past couple of years the QEMU process has evolved
into its current form as maintainers and developers have figured out
what works best for them, and I don't see that PostgreSQL would be any
different in this respect - it takes time to adapt to a new workflow and
get it right.

Having worked on various parts for PostGIS for around 10 years, I've had
my head stuck into various parts of the GiST code and have got to know a
few parts of it really well.

What I find frustrating is that I've come back from a workflow where
I've been reviewing/testing patches within months of joining a project
because the barrier for entry has been so low, and yet even with much
longer experience of the PostgreSQL codebase I feel I can't do the same
for patches submitted to the commitfest.

And why is this? It's because while I know certain specific areas of the
code well, many patches span areas of the source tree in which I have
little or no experience, and when they are supplied as a single
monolithic patch that makes it impossible for me to give a meaningful
review to all but a tiny proportion of them.

I believe that this is one of the main reasons why people struggle to
find patch reviewers, with the consequence being that the majority of
work falls back onto the CF manager and the committers which is why we
have the current situation.


ATB,

Mark.





Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-07-02 Thread Mark Cave-Ayland

On 02/07/14 08:36, Andres Freund wrote:


On 2014-07-01 20:21:37 -0400, Tom Lane wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2014-07-01 23:21:07 +0100, Mark Cave-Ayland wrote:

Also if you're struggling for Sun buildfarm animals, recent versions of QEMU
will quite happily install and run later versions of 32-bit Solaris over
serial, and 2.0 even manages to give you a cgthree framebuffer for the full
experience.



Well. I have to admit I'm really not interested in investing that much
time in something I've no stake in. If postgres developers have to put
emulated machines to develop features something imo went seriously
wrong. That's more effort than at least I'm willing to spend.


Perhaps more to the point, I have no faith at all that an emulator will
mimic multiprocessor timing behavior to the level of detail needed to
tell whether memory-barrier-related logic works.  See the VAX discussion
just a couple days ago.


Well, it would allow us to see wether fixed stuff actually compiles and
runs - that's not nothing. The biggest problem with fixing stuff like
armv5, sparc8, whatever is that it requires adding stuff to our inline
assembly. It's easy to accidentally make it not compile, but
comparatively harder to make the behaviour even worse than before.


The point I wanted to make was that SPARCv8 is still certified for certain 
military/aerospace applications. While I don't use it myself, some people 
are still using it enough to want to contribute QEMU patches.


In terms of QEMU, the main reason for mentioning it was that if the 
consensus were to keep SPARCv8 support then this could be one possible 
way to provide basic buildfarm support (although as Tom rightly points 
out, there would need to be testing to build up confidence that the 
emulator does the right thing in a multiprocessor environment).



ATB,

Mark.





Re: [HACKERS] Spinlocks and compiler/memory barriers

2014-07-01 Thread Mark Cave-Ayland

On 01/07/14 11:04, Andres Freund wrote:


Since we have a Sun Studio machine in the buildfarm, we shouldn't give
up on SPARC completely, but maybe we should only add the cases for
sparcv8+ and above?  That at least has some chance of getting tested.


That we have code for sparcv7 is ridiculous btw. Sparcv8 (not 8+/9) is
from 1992. v7/sun4c had been dropped from Solaris 8 if I see that correctly.

I agree that v8 (in contrast to v8+ aka v9 in 32bit mode) support has to
go as well. I'd personally vote for backpatching a note to
installation.sgml saying that it's probably not working, and not do
anything else there. That means we also should replace the ldstub by cas
in the gcc inline assembly - but we have buildfarm members for that,
so it's not too bad.


Being involved in QEMU SPARC development, I can tell you that patches 
are still actively being received for SPARCv8. The last set of CPU 
patches were related to fixing bugs in the LEON3, a 32-bit SPARC CPU 
which is available in hardened versions certified for extreme 
environments such as military and space. I'd hate to find out that they 
switched to another database because they couldn't upgrade PostgreSQL on 
the ISS ;)


Also if you're struggling for Sun buildfarm animals, recent versions of 
QEMU will quite happily install and run later versions of 32-bit Solaris 
over serial, and 2.0 even manages to give you a cgthree framebuffer for 
the full experience.



ATB,

Mark.





[HACKERS] Interrupting long external library calls

2012-05-16 Thread Mark Cave-Ayland

Hi all,

One of the issues we've been looking at with PostGIS is how to interrupt 
long-running processing tasks in external libraries, particularly GEOS. 
After a few tests here, it seems that even the existing SIGALRM handler 
doesn't get called if statement_timeout is reached when in an external 
library, e.g. with PostgreSQL 9.0/PostGIS 2.0 trunk/GEOS:


pg90@kentang:~$ ./rel/bin/postgres --single -D data postgis

PostgreSQL stand-alone backend 9.0.7
backend create database foo;
backend drop database foo;
backend show statement_timeout;
 1: statement_timeout   (typeid = 25, len = -1, typmod = -1, 
byval = f)


 1: statement_timeout = 500ms (typeid = 25, len = -1, typmod 
= -1, byval = f)


backend set statement_timeout = '10ms';
backend create database foo;
ERROR:  canceling statement due to statement timeout
STATEMENT:  create database foo;

(note the following command takes about 3-4s to run but doesn't get 
cancelled)


backend select st_segmentize('LINESTRING(0 0, 10 0)', 1e-12);
ERROR:  invalid memory alloc request size 1073741824

Is there a way in which we can interrupt long-running external library 
calls that are unable to call the PostgreSQL CHECK_FOR_INTERRUPTS() 
macro directly? If it is possible for standard alarms to work in this 
way then something along the lines of an alarm within PostgreSQL itself 
that manually invokes CHECK_FOR_INTERRUPTS() every 1s while the query is 
running could potentially solve the issue for us.
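
For reference, CHECK_FOR_INTERRUPTS() is only honoured at points where 
backend or extension code calls it explicitly. A minimal sketch of the kind 
of chunked loop where that does work (GeomItem and expensive_library_call() 
are placeholders for illustration, not real PostGIS code):

#include "postgres.h"
#include "miscadmin.h"		/* CHECK_FOR_INTERRUPTS() */

typedef struct GeomItem GeomItem;			/* placeholder type */
extern void expensive_library_call(GeomItem *item);	/* placeholder */

static void
process_geometries(GeomItem **items, int nitems)
{
	int		i;

	for (i = 0; i < nitems; i++)
	{
		/* statement_timeout and query cancel are serviced here ... */
		CHECK_FOR_INTERRUPTS();

		/* ... but not while we are stuck inside the library call itself */
		expensive_library_call(items[i]);
	}
}

The problem described above is exactly the case where no such per-chunk 
boundary exists on our side of the library call.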



Many thanks,

Mark.



Re: [HACKERS] Interrupting long external library calls

2012-05-16 Thread Mark Cave-Ayland

On 16/05/12 11:39, Heikki Linnakangas wrote:


One of the issues we've been looking at with PostGIS is how to interrupt
long-running processing tasks in external libraries, particularly GEOS.
After a few tests here, it seems that even the existing SIGALRM handler
doesn't get called if statement_timeout is reached when in an external
library, e.g. with PostgreSQL 9.0/PostGIS 2.0 trunk/GEOS:


If you interrupt an external library call, it might leave memory in an
inconsistent state, or some other such thing. It's not generally safe to
interrupt arbitrary 3rd party code.

However, if you're absolutely positively sure that the library function
can tolerate that, you can set ImmediateInterruptOK = true before
calling it. See e.g. PGSemaphoreLock() on how that's done before starting
to sleep on a semaphore.


Hi Heikki,

Yes that appears to work fine on the surface - just testing now to 
determine what state everything is left in when queries are aborted 
prematurely.
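
For the archives, here is a minimal sketch of the approach Heikki describes, 
assuming a hypothetical wrapper around a single long-running GEOS call 
(long_running_geos_call() is a placeholder) and assuming the library really 
can tolerate being interrupted mid-call:

#include "postgres.h"
#include "miscadmin.h"	/* ImmediateInterruptOK, CHECK_FOR_INTERRUPTS() */

extern void long_running_geos_call(void *geom);		/* placeholder */

static void
interruptible_geos_call(void *geom)
{
	/* Allow the SIGINT/SIGALRM handlers to abort the query immediately. */
	ImmediateInterruptOK = true;
	CHECK_FOR_INTERRUPTS();		/* service anything already pending */

	long_running_geos_call(geom);

	/* Back to the normal, deferred interrupt behaviour. */
	ImmediateInterruptOK = false;
}

As Heikki warns, this is only safe if aborting mid-call cannot leave the 
library's memory or other state inconsistent.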



Many thanks,

Mark.



Re: [HACKERS] [PATCH] PostgreSQL fails to build with 32bit MinGW-w64

2011-12-14 Thread Mark Cave-Ayland

On 14/12/11 13:59, Andrew Dunstan wrote:


Hmm. Yeah, if I remove -O0 and instead set CFLAGS to -ffloat-store the
error goes away.

So, would we want to use that just for this file, or for the whole build?


Well the latest documentation for gcc gives 2 options: -ffloat-store and 
-fexcess-precision=style which are documented at 
http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Optimize-Options.html#Optimize-Options.


For PostGIS we only applied the flags for specific files, because of 
severe performance warnings in older versions of the gcc documentation 
such as this: http://www.delorie.com/gnu/docs/gcc/gcc_10.html. I have no 
idea whether these warnings still hold true or not for more modern 
compiler versions.


ISTM that the best solution would be to determine whether or not 
-fexcess-precision=standard does the right thing (and also determine the 
performance hit) or take a look at the excess precision notes in the 
older documentation to see if parts of the code can be rearranged to 
eliminate the effect.
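
As a rough illustration of that last option: the manual, per-expression 
equivalent of -ffloat-store is to force an intermediate result through 
memory so that it is rounded to 64-bit precision (the helper name below is 
purely illustrative):

/*
 * Illustrative only: storing through a volatile double discards any
 * excess x87 precision, which is what -ffloat-store does for every
 * intermediate value.
 */
static double
force_double_rounding(double x)
{
	volatile double tmp = x;	/* spill to memory, rounding to 64 bits */

	return tmp;
}

Doing this only around the few problematic expressions avoids paying the 
performance cost across the whole build.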



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] [PATCH] PostgreSQL fails to build with 32bit MinGW-w64

2011-12-12 Thread Mark Cave-Ayland

On -10/01/37 20:59, Andrew Dunstan wrote:


This is apparently an optimization bug in the compiler. If I turn
optimization off (CFLAGS=-O0) it goes away. Ick.

So at the moment I'm a bit blocked. I can't really file a bug because
the
compiler can't currently be used to build postgres, I don't have time to
construct a self-contained test case, and I don't want to commit
changes to
enable the compiler until the issue is solved.

If we're talking about adding support for a previously-unsupported
configuration, it seems to me that it would be fine to commit a patch
that made everything work, but for the compiler bug. We could refrain
from stating that we officially support that configuration until the
compiler bug is fixed, or even document the existence of the bug. We
can't be responsible for other people's broken code, but I don't
necessarily see that as a reason not to commit a prerequisite patch.
Otherwise, as you say, there's a chicken-and-egg problem, and who does
that help?




Yeah, fair enough. I'll work on that.


Definitely do this (and then file a bug report with the project). I've 
worked with both Kai and NightStrike from the MingW-W64 project to fix 
previous bugs, and as long as they can build the offending source 
themselves then they are very helpful and quick to respond.



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] improved parallel make support

2010-11-11 Thread Mark Cave-Ayland

Dave Page wrote:


Thanks - installed.


As a matter of policy, I do not want to drop support for a FOSS build tool
chain on Windows if at all avoidable.


Nor I, however I only have limited time to dedicate to that goal.


One thing to think about is that since PostGIS uses MingW/PGXS on 
Windows, we use MingW builds in order to generate the Makefiles we need 
(there is no native MSVC build for Windows). Not being able to do this 
would cause us great inconvenience :(



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] improved parallel make support

2010-11-11 Thread Mark Cave-Ayland

Dave Page wrote:


On Thu, Nov 11, 2010 at 4:51 PM, Andrew Dunstan and...@dunslane.net wrote:

Interesting. Doesn't EDB's PostgresPlus package include PostGIS, and isn't
its Windows version built with MSVC?


Yes - it's a PITA as we have to have a dummy build of the server in
mingw/msys to compile PostGIS and Slony. We're probably going to be
looking at that in the not-to-distant future as we want 64bit builds
of both and will be using VC++.


Just for the record, a lot of work was done in the 1.4 release series to 
make MSVC builds possible, and indeed several people have reported success:


http://postgis.refractions.net/pipermail/postgis-devel/2009-March/005102.html
http://postgis.refractions.net/pipermail/postgis-devel/2010-September/010299.html

The two main outstanding issues as I see it are:

1) The GTK-based GUI for shp2pgsql (although if someone wanted to 
sponsor work to convert to wxWidgets to bring us in line with pgAdmin, 
that would be strongly considered).


2) Maintenance of the MSVC build system. So far we have had some 
complaints about not using MSVC, but then no-one has stepped up to 
maintain the build system for it. Forcing all existing developers to 
suddenly start maintaining the Windows build is a total non-starter.


My hope is that one day CMake will enable us to come up with a universal 
solution, but we're some way from that yet.



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] knngist - 0.8

2010-10-18 Thread Mark Cave-Ayland

Paul Ramsey wrote:


So what kind of data structure would you like for a typmod?


I'm a primitive enough beast that just having 64-bits would make me
happy. As a general matter though, a bytea?

P


For my vote, I'd prefer either the Oid of a custom type or an array of 
Oid, Datum pairs - i.e. something we can extend in the future if required.


ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] knngist - 0.8

2010-10-18 Thread Mark Cave-Ayland

David Fetter wrote:


For my vote, I'd prefer either the Oid of a custom type or an array
of Oid, Datum pairs - i.e. something we can extend in the future if
required.


This sounds a lot like a foreign key to another table.  Are you not
proposing doing that because of performance considerations?

Cheers,
David.


Well, in PostGIS a typmod contains 3 pieces of information:

1) the SRID
2) the dimension
3) the geometry type

The SRID is technically already a foreign key into another table, with the 
dimension and geometry type as the other information. At the moment, we bit-pack the 
dimension and geometry type into the SRID and use that as the typmod but 
this only leaves 21 bits IIRC for the SRID. The additional complication 
is that SRIDs at the higher end of the range are allowed for anyone to 
use, and so people may have their own customised spheroids defined in 
this region of the table.
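
To make the bit-packing concrete, here is an illustrative sketch of how 
SRID, dimension and geometry type can be squeezed into a single int32 
typmod; the field widths below are assumptions for the example, not the 
actual PostGIS encoding:

#include <stdint.h>

#define TYPMOD_SRID_BITS	21	/* leaves 21 bits for the SRID */
#define TYPMOD_DIM_BITS		3
#define TYPMOD_TYPE_BITS	7

static int32_t
typmod_pack(int32_t srid, int32_t ndims, int32_t geomtype)
{
	return (srid & ((1 << TYPMOD_SRID_BITS) - 1))
		 | ((ndims & ((1 << TYPMOD_DIM_BITS) - 1)) << TYPMOD_SRID_BITS)
		 | ((geomtype & ((1 << TYPMOD_TYPE_BITS) - 1))
				<< (TYPMOD_SRID_BITS + TYPMOD_DIM_BITS));
}

static int32_t
typmod_get_srid(int32_t typmod)
{
	return typmod & ((1 << TYPMOD_SRID_BITS) - 1);
}

Whatever the exact layout, the 21-bit SRID field is what caps the usable 
range of spatial_ref_sys entries.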


If we had a foreign key into another table, we'd need to ensure that no 
one could tamper with it, as otherwise all chaos would break loose, e.g. 
breaking the geometry type constraint on a column. Heck, we even have 
people deleting the geometry_columns table sometimes because they are 
not aware of what it does. By storing this information in the PG catalog 
then this can't happen, plus the information is available easily in 
Form_pg_attribute without having to implement our own cache, with its 
own related problems such as how/when to invalidate etc.


There is also a chance that we'd want to include additional information 
in the future related to geometry validity, for example, which would 
mean further reducing the range allowed within the spatial_ref_sys table 
in its existing form.



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] PostGIS vs. PGXS in 9.0beta3

2010-07-28 Thread Mark Cave-Ayland

Andrew Dunstan wrote:

The real problem has nothing to do with any of the analysis, as you say. 
It is this: they have an override file for PGXS and it uses 
$(mkinstalldirs) which we got rid of about a year ago. So apparently 
they haven't been testing much against any of our alphas or betas or 
they would have seen this long ago. The correct fix is to do the 
following in the PostGIS source root:


   sed -i -e 's/mkinstalldirs/MKDIR_P/' postgis/Makefile.pgxs

cheers

andrew


Hmmm that's totally wrong - the override in Makefile.pgxs should only 
ever be loaded for PostgreSQL 8.3 and 8.4, and not PostgreSQL 9.0 since 
it already has the correct installation paths.


What I suspect is that you're actually getting bitten by this:

http://markmail.org/message/k7iolbazhrqhijfk#query:pg_config%20jun%202007+page:1+mid:rqk6ux2e7npqbrzf+state:results

Or, in other words, configure is picking up the wrong pg_config. Since 
the path fix in the thread was not backported to < 8.3, the presence of 
another pg_config for PostgreSQL < 8.3 in PATH will break things :(



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] PostGIS vs. PGXS in 9.0beta3

2010-07-28 Thread Mark Cave-Ayland

Andrew Dunstan wrote:


No, the configure test is wrong. Here's what's in configure.ac:


   dnl Temporary hack until minimum PostgreSQL version is 8.5:
   dnl If PostgreSQL < 8.5 is detected, trigger the inclusion of the
   dnl new versioned PGXS targets
   PGXSOVERRIDE=0
   if test ! $PGSQL_MINOR_VERSION -ge 5; then
   PGXSOVERRIDE=1
   fi


Of course, we don't have any such thing as 8.5.


Ah wait - I see. The fix is already in the 1.5 branch, it just hasn't 
hit a release yet :(


http://trac.osgeo.org/postgis/changeset/5421/branches/1.5/configure.ac

Looks like we need to push a new release in time for 9.0...


ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] Why are these modules built without respecting my LDFLAGS?

2010-06-28 Thread Mark Cave-Ayland

Tom Lane wrote:


Should we try to make that a bit more consistent, and if so how?
The shenanigans in Makefile.shlib would get a lot simpler if we said
that shlib links always include LDFLAGS *plus* LDFLAGS_SL, but I would
think that that would carry substantial risk of breakage.  Surely there
are cases where linker switches are appropriate for making executables
but not shlibs.  Perhaps we should set up three variables instead of
two, viz
LDFLAGS = switches for linking both executables and shlibs
LDFLAGS_EX = extra switches for linking executables only
LDFLAGS_SL = extra switches for linking shlibs only
Then we could get rid of that untrustworthy hack for extracting -L
switches ...


While we're on the subject... this reminds me of another issue that's 
come up a few times on the PostGIS mailing lists.


AFAICT pg_config doesn't have a method for generating LDFLAGS for libpq 
client applications, only backend libraries with pg_config --libs. 
Normally we just go for -lpq but that doesn't always seem to work on 
platforms where you need to explicitly give all libpq dependencies 
during link time, e.g. 
http://postgis.refractions.net/pipermail/postgis-users/2010-April/026349.html.


Would LDFLAGS_EX in this case be what we need? If so, could it be 
exposed via a pg_config --libpq option or similar?



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] [CFReview] Red-Black Tree

2010-02-05 Thread Mark Cave-Ayland

Teodor Sigaev wrote:


I would like to see point #2 of the following email addressed before
commit.  As things stand, it is not clear (at least to me) whether
this is a win.


Reimplementation of ginInsertRecordBA reduces difference of HEAD and 
HEAD+rbtree in regular case.

Test suite is taken from http://www.sai.msu.su/~megera/wiki/2009-04-03:

SEQ: SELECT array_to_string(ARRAY(select '' || a || '.' || b from
generate_series(1,50) b), ' ')::tsvector AS i INTO foo FROM
generate_series(1,10) a;
RND: SELECT array_to_string(ARRAY(select '' || random() from
generate_series(1,50) b), ' ')::tsvector AS i INTO foo FROM
generate_series(1,10) a;

Times in seconds:
          HEAD    0.9     0.11
SEQ       130     113     111
RND       11.4    12.6    11.5

The idea was to change the order of insertion - now the insertion order 
decreases the number of rebalancings.


Oleg's tests (http://www.sai.msu.su/~megera/wiki/rbtree_test) were made 
with v0.10, which differs from 0.11 only by comments around 
ginInsertRecordBA().


Here is a quick comparison of the current 0.11 patch against my 
original 0.7 patch when building Oleg's simple dataset. (Note: due to time 
constraints, this is just a single run to get a feel for the performance.)



0.7 patch
=

rbtest=# CREATE INDEX idin_rbtree_idx ON links2 USING gin (idin);
CREATE INDEX
Time: 1910741.352 ms

rbtest=# CREATE INDEX idout_rbtree_idx ON links2 USING gin (idout);
CREATE INDEX
Time: 1647609.300 ms


0.11 patch
==

rbtest=# CREATE INDEX idin_rbtree_idx ON links2 USING gin (idin);
CREATE INDEX
Time: 1864013.526 ms

rbtest=# CREATE INDEX idout_rbtree_idx ON links2 USING gin (idout);
CREATE INDEX
Time: 1661200.454 ms


HTH,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] [CFReview] Red-Black Tree

2010-02-04 Thread Mark Cave-Ayland

Robert Haas wrote:


Maybe we are now getting to the heart of the confusion.  Mark wrote in
his email: Unfortunately I was not really able to reproduce the RND
(teodor's) dataset, nor the random array test as the SQL used to test
the implementation was not present on the page above.  The SQL for
the fixed-length tests is posted, but the SQL for the variable length
test is not - so Mark was just guessing on that one.

Or am I just totally confused?

...Robert


No, that's correct. In the "Repeat test with 100,000 identical records 
varying array length (len)" section, it's fairly easy to substitute in 
the varying values of len where len = 3, 30 and 50. As documented in my 
review email I had a guess at generating the contents of RND (teodor's) 
column with this query:


select ARRAY(select generate_series(1, (random() * 100)::int)) as arand 
into arrrand from generate_series(1,10) b;


However, unlike the other figures this is quite a bit different from 
Oleg/Teodor's results, which makes me think this is the wrong query (3.5s 
vs 9s). Obviously Robert's concern here is that it is this column that 
shows one of the largest performance decreases compared to HEAD.


I've also finished benchmarking the index creation scripts yesterday on 
Oleg's test dataset from 
http://www.sai.msu.su/~megera/postgres/files/links2.sql.gz. With 
maintenance_work_mem set to 256Mb, the times I got with the rbtree patch 
applied were:



rbtest=# CREATE INDEX idin_rbtree_idx ON links2 USING gin (idin);
CREATE INDEX
Time: 1910741.352 ms

rbtest=# CREATE INDEX idout_rbtree_idx ON links2 USING gin (idout);
CREATE INDEX
Time: 1647609.300 ms


Without the patch applied, I ended up having to shut down my laptop after 
around 90 mins before the first index had even been created. So there is 
a definite order-of-magnitude speed increase with this patch applied.



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] CommitFest Status Summary - 2010-02-03

2010-02-04 Thread Mark Cave-Ayland

Robert Haas wrote:


Here's an overview of where we stand with the remaining 14 patches,
according to my best understanding of the situation.



* rbtree - I have done a lot of work reviewing this, and Mark
Cave-Ayland has done some work on it, too.  But there are some
unanswered performance questions that need to be addressed before
commit.  This is another one that could really use some more eyes on
it.



* knngist - The third remaining big patch.  Mark Cave-Ayland
volunteered to review this one, too, but so far no review has been
posted on -hackers.  I know that the PostGIS folks would really like
to have this, but time is growing short.


Yes. I'm working on the knngist patch now, although sadly the work 
got side-tracked onto the rbtree patch since it is a prerequisite for the 
knngist patch.


Now that the rbtree patch is in reasonably good shape, I intend to focus 
the rest of my time working on the knngist patch exclusively.



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



Re: [HACKERS] development setup and libdir

2010-01-31 Thread Mark Cave-Ayland

Ivan Sergio Borgonovo wrote:


Of course I can write a script that can workaround this.
It seems that the only thing missing is that pgxs 8.3 used to
prefix .so with lib and then rename them at install time, but pgxs
8.4 build them directly without prefix.
I'm just speculating this is the issue and it is the only one I
still have to solve... but... what's going to happen next release?
Wouldn't it be better if make install could install stuff where I
ask so I could put modules in different places *even* for the same
installation of postgres?


FWIW the soon-to-be-released PostGIS 1.5 has an "out of place" 
regression testing feature that allows PostGIS to be built using PGXS 
and regression tested without putting anything in the PostgreSQL 
installation directory itself.


It's a little bit of a hack, but take a look at the PGXS makefile and 
the regression makefile to see how it all works:


http://trac.osgeo.org/postgis/browser/trunk/postgis/Makefile.in
http://trac.osgeo.org/postgis/browser/trunk/regress/Makefile.in


HTH,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs



[HACKERS] [CFReview] Red-Black Tree

2010-01-29 Thread Mark Cave-Ayland
)) as a50 into arr50 from 
generate_series(1,10) b;


test=# create index arr50_idx on arr50 using gin (a50);
CREATE INDEX
Time: 3306.074 ms


iv) len=random

select ARRAY(select generate_series(1, (random() * 100)::int)) as arand 
into arrrand from generate_series(1,10) b;


test=# create index arrrand_idx on arrrand using gin (arand);
CREATE INDEX
Time: 4725.556 ms


After patch:

i) len=3

select ARRAY(select generate_series(1,3)) as a3 into arr3 from 
generate_series(1,10) b;


test=# create index arr3_idx on arr3 using gin (a3);
CREATE INDEX
Time: 299.090 ms


ii) len=30

select ARRAY(select generate_series(1,30)) as a30 into arr30 from 
generate_series(1,10) b;


test=# create index arr30_idx on arr30 using gin (a30);
CREATE INDEX
Time: 2828.424 ms


iii) len=50

select ARRAY(select generate_series(1,50)) as a50 into arr50 from 
generate_series(1,10) b;


test=# create index arr50_idx on arr50 using gin (a50);
CREATE INDEX
Time: 3984.456 ms


iv) len=random

select ARRAY(select generate_series(1, (random() * 100)::int)) as arand 
into arrrand from generate_series(1,10) b;


test=# create index arrrand_idx on arrrand using gin (arand);
CREATE INDEX
Time: 3514.972 ms


Summary
===

I believe Robert has done a good deal of the groundwork required to get 
this patch ready for inclusion. With the current version, I was able to 
see a measurable performance improvement in some test cases with no 
significant performance regression. It would have been nice to be able 
to reproduce the whole set of test cases but this was not possible from 
the information specified.


With perhaps some minor tweaks to some of the names and a rework of the 
else clause in ginInsertEntry(), I feel this patch is reasonably close 
to commit.


I shall now continue my review of the associated KNNGiST patch.


ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs

diff --git a/src/backend/access/gin/ginbulk.c b/src/backend/access/gin/ginbulk.c
index 954884d..e043713 100644
--- a/src/backend/access/gin/ginbulk.c
+++ b/src/backend/access/gin/ginbulk.c
@@ -22,15 +22,60 @@
 #define DEF_NENTRY 2048
 #define DEF_NPTR   4
 
+static void*
+ginAppendData(void *old, void *new, void *arg)
+{
+	EntryAccumulator	*eo = (EntryAccumulator *) old,
+						*en = (EntryAccumulator *) new;
+
+	BuildAccumulator	*accum = (BuildAccumulator *) arg;
+
+	if (eo->number >= eo->length)
+	{
+		accum->allocatedMemory -= GetMemoryChunkSpace(eo->list);
+		eo->length *= 2;
+		eo->list = (ItemPointerData *) repalloc(eo->list,
+									sizeof(ItemPointerData) * eo->length);
+		accum->allocatedMemory += GetMemoryChunkSpace(eo->list);
+	}
+
+	/* If item pointers are not ordered, they will need to be sorted. */
+	if (eo->shouldSort == FALSE)
+	{
+		int			res;
+
+		res = compareItemPointers(eo->list + eo->number - 1, en->list);
+		Assert(res != 0);
+
+		if (res > 0)
+			eo->shouldSort = TRUE;
+	}
+
+	eo->list[eo->number] = en->list[0];
+	eo->number++;
+
+	return old;
+}
+
+static int
+cmpEntryAccumulator(const void *a, const void *b, void *arg)
+{
+	EntryAccumulator	*ea = (EntryAccumulator *) a;
+	EntryAccumulator	*eb = (EntryAccumulator *) b;
+	BuildAccumulator	*accum = (BuildAccumulator *) arg;
+
+	return compareAttEntries(accum->ginstate, ea->attnum, ea->value,
+							 eb->attnum, eb->value);
+}
+
 void
 ginInitBA(BuildAccumulator *accum)
 {
-	accum->maxdepth = 1;
-	accum->stackpos = 0;
-	accum->entries = NULL;
-	accum->stack = NULL;
 	accum->allocatedMemory = 0;
 	accum->entryallocator = NULL;
+	accum->tree = rb_create(cmpEntryAccumulator, ginAppendData, NULL, accum);
+	accum->iterator = NULL;
+	accum->tmpList = NULL;
 }
 
 static EntryAccumulator *
@@ -48,36 +93,6 @@ EAAllocate(BuildAccumulator *accum)
 }
 
 /*
- * Stores heap item pointer. For robust, it checks that
- * item pointer are ordered
- */
-static void
-ginInsertData(BuildAccumulator *accum, EntryAccumulator *entry, ItemPointer heapptr)
-{
-	if (entry->number >= entry->length)
-	{
-		accum->allocatedMemory -= GetMemoryChunkSpace(entry->list);
-		entry->length *= 2;
-		entry->list = (ItemPointerData *) repalloc(entry->list,
-									sizeof(ItemPointerData) * entry->length);
-		accum->allocatedMemory += GetMemoryChunkSpace(entry->list);
-	}
-

Re: [HACKERS] Add subdirectory support for DATA/DOCS with PGXS

2010-01-04 Thread Mark Cave-Ayland

Tom Lane wrote:


Why do DOCS still go into doc/contrib?  Shouldn't that become
doc/$MODULEDIR for consistency?


Hmmm it looks as if the code was correct but I missed the comment at the 
top of the file. Sorry for the confusion - revised version attached.



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs
diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index a83dad3..76b585f 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -19,11 +19,14 @@
 #
 #   MODULES -- list of shared objects to be build from source file with
 # same stem (do not include suffix in this list)
-#   DATA -- random files to install into $PREFIX/share/contrib
-#   DATA_built -- random files to install into $PREFIX/share/contrib,
+#   MODULEDIR -- subdirectory under contrib into which DATA and DOCS are
+# installed (if not set, DATA and DOCS files are installed directly
+# into the contrib/ directory)
+#   DATA -- random files to install into $PREFIX/share/$MODULEDIR
+#   DATA_built -- random files to install into $PREFIX/share/$MODULEDIR,
 # which need to be built first
 #   DATA_TSEARCH -- random files to install into $PREFIX/share/tsearch_data
-#   DOCS -- random files to install under $PREFIX/doc/contrib
+#   DOCS -- random files to install under $PREFIX/doc/$MODULEDIR
 #   SCRIPTS -- script files (not binaries) to install into $PREFIX/bin
 #   SCRIPTS_built -- script files (not binaries) to install into $PREFIX/bin,
 # which need to be built first
@@ -86,12 +89,19 @@ include $(top_srcdir)/src/Makefile.shlib
 all: all-lib
 endif # MODULE_big
 
+ifndef MODULEDIR
+datamoduledir = contrib
+docmoduledir = contrib
+else
+datamoduledir = $(MODULEDIR)
+docmoduledir = $(MODULEDIR)
+endif
 
 install: all installdirs
 ifneq (,$(DATA)$(DATA_built))
 	@for file in $(addprefix $(srcdir)/, $(DATA)) $(DATA_built); do \
-	  echo $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib'; \
-	  $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib'; \
+	  echo $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'; \
+	  $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'; \
 	done
 endif # DATA
 ifneq (,$(DATA_TSEARCH))
@@ -109,8 +119,8 @@ endif # MODULES
 ifdef DOCS
 ifdef docdir
 	@for file in $(addprefix $(srcdir)/, $(DOCS)); do \
-	  echo $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib'; \
-	  $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib'; \
+	  echo $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/$(docmoduledir)'; \
+	  $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/$(docmoduledir)'; \
 	done
 endif # docdir
 endif # DOCS
@@ -137,7 +147,7 @@ endif # MODULE_big
 
 installdirs:
 ifneq (,$(DATA)$(DATA_built))
-	$(MKDIR_P) '$(DESTDIR)$(datadir)/contrib'
+	$(MKDIR_P) '$(DESTDIR)$(datadir)/$(datamoduledir)'
 endif
 ifneq (,$(DATA_TSEARCH))
 	$(MKDIR_P) '$(DESTDIR)$(datadir)/tsearch_data'
@@ -147,7 +157,7 @@ ifneq (,$(MODULES))
 endif
 ifdef DOCS
 ifdef docdir
-	$(MKDIR_P) '$(DESTDIR)$(docdir)/contrib'
+	$(MKDIR_P) '$(DESTDIR)$(docdir)/$(docmoduledir)'
 endif # docdir
 endif # DOCS
 ifneq (,$(PROGRAM)$(SCRIPTS)$(SCRIPTS_built))
@@ -161,7 +171,7 @@ endif # MODULE_big
 
 uninstall:
 ifneq (,$(DATA)$(DATA_built))
-	rm -f $(addprefix '$(DESTDIR)$(datadir)'/contrib/, $(notdir $(DATA) $(DATA_built)))
+	rm -f $(addprefix '$(DESTDIR)$(datadir)'/$(datamoduledir)/, $(notdir $(DATA) $(DATA_built)))
 endif
 ifneq (,$(DATA_TSEARCH))
 	rm -f $(addprefix '$(DESTDIR)$(datadir)'/tsearch_data/, $(notdir $(DATA_TSEARCH)))
@@ -170,7 +180,7 @@ ifdef MODULES
 	rm -f $(addprefix '$(DESTDIR)$(pkglibdir)'/, $(addsuffix $(DLSUFFIX), $(MODULES)))
 endif
 ifdef DOCS
-	rm -f $(addprefix '$(DESTDIR)$(docdir)'/contrib/, $(DOCS))
+	rm -f $(addprefix '$(DESTDIR)$(docdir)'/$(docmoduledir)/, $(DOCS))
 endif
 ifdef PROGRAM
 	rm -f '$(DESTDIR)$(bindir)/$(PROGRAM)$(X)'



Re: [HACKERS] Add subdirectory support for DATA/DOCS with PGXS

2010-01-02 Thread Mark Cave-Ayland

Tom Lane wrote:


If you can set it up in such a way that the default behavior doesn't
change, this would be workable.  I don't think we want people to
suddenly find their stuff installing in the wrong place.

It probably wouldn't be that hard, something along the lines of
ifndef MODULEDIR
MODULEDIR=contrib
endif
ought to do it no?


Yeah, that was pretty much along the lines of what I was thinking. 
Please find the revised v2 patch attached for comment. The one thing I 
have done is to separate out the moduledir variable into datamoduledir and 
docmoduledir, so there is a little bit of wiggle room if someone needs to 
install DATA items and DOCS items in different locations.


I did have a brief look at seeing whether it would be possible to use 
this instead of DATA_TSEARCH, however this won't work because the DATA 
and DATA_TSEARCH targets need their files installed in two separate 
locations.



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs
diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index a83dad3..feb7fc2 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -19,8 +19,11 @@
 #
 #   MODULES -- list of shared objects to be build from source file with
 # same stem (do not include suffix in this list)
-#   DATA -- random files to install into $PREFIX/share/contrib
-#   DATA_built -- random files to install into $PREFIX/share/contrib,
+#   MODULEDIR -- subdirectory under contrib into which DATA and DOCS are
+# installed (if not set, DATA and DOCS files are installed directly
+# into the contrib/ directory)
+#   DATA -- random files to install into $PREFIX/share/$MODULEDIR
+#   DATA_built -- random files to install into $PREFIX/share/$MODULEDIR,
 # which need to be built first
 #   DATA_TSEARCH -- random files to install into $PREFIX/share/tsearch_data
 #   DOCS -- random files to install under $PREFIX/doc/contrib
@@ -86,12 +89,20 @@ include $(top_srcdir)/src/Makefile.shlib
 all: all-lib
 endif # MODULE_big
 
+ifndef MODULEDIR
+datamoduledir = contrib
+docmoduledir = contrib
+else
+datamoduledir = $(MODULEDIR)
+docmoduledir = $(MODULEDIR)
+endif
+
 
 install: all installdirs
 ifneq (,$(DATA)$(DATA_built))
 	@for file in $(addprefix $(srcdir)/, $(DATA)) $(DATA_built); do \
-	  echo $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib'; \
-	  $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib'; \
+	  echo $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'; \
+	  $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'; \
 	done
 endif # DATA
 ifneq (,$(DATA_TSEARCH))
@@ -109,8 +120,8 @@ endif # MODULES
 ifdef DOCS
 ifdef docdir
 	@for file in $(addprefix $(srcdir)/, $(DOCS)); do \
-	  echo $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib'; \
-	  $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib'; \
+	  echo $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/$(docmoduledir)'; \
+	  $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/$(docmoduledir)'; \
 	done
 endif # docdir
 endif # DOCS
@@ -137,7 +148,7 @@ endif # MODULE_big
 
 installdirs:
 ifneq (,$(DATA)$(DATA_built))
-	$(MKDIR_P) '$(DESTDIR)$(datadir)/contrib'
+	$(MKDIR_P) '$(DESTDIR)$(datadir)/$(datamoduledir)'
 endif
 ifneq (,$(DATA_TSEARCH))
 	$(MKDIR_P) '$(DESTDIR)$(datadir)/tsearch_data'
@@ -147,7 +158,7 @@ ifneq (,$(MODULES))
 endif
 ifdef DOCS
 ifdef docdir
-	$(MKDIR_P) '$(DESTDIR)$(docdir)/contrib'
+	$(MKDIR_P) '$(DESTDIR)$(docdir)/$(docmoduledir)'
 endif # docdir
 endif # DOCS
 ifneq (,$(PROGRAM)$(SCRIPTS)$(SCRIPTS_built))
@@ -161,7 +172,7 @@ endif # MODULE_big
 
 uninstall:
 ifneq (,$(DATA)$(DATA_built))
-	rm -f $(addprefix '$(DESTDIR)$(datadir)'/contrib/, $(notdir $(DATA) $(DATA_built)))
+	rm -f $(addprefix '$(DESTDIR)$(datadir)'/$(datamoduledir)/, $(notdir $(DATA) $(DATA_built)))
 endif
 ifneq (,$(DATA_TSEARCH))
 	rm -f $(addprefix '$(DESTDIR)$(datadir)'/tsearch_data/, $(notdir $(DATA_TSEARCH)))
@@ -170,7 +181,7 @@ ifdef MODULES
 	rm -f $(addprefix '$(DESTDIR)$(pkglibdir)'/, $(addsuffix $(DLSUFFIX), $(MODULES)))
 endif
 ifdef DOCS
-	rm -f $(addprefix '$(DESTDIR)$(docdir)'/contrib/, $(DOCS))
+	rm -f $(addprefix '$(DESTDIR)$(docdir)'/$(docmoduledir)/, $(DOCS))
 endif
 ifdef PROGRAM
 	rm -f '$(DESTDIR)$(bindir)/$(PROGRAM)$(X)'

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add subdirectory support for DATA/DOCS with PGXS

2009-12-30 Thread Mark Cave-Ayland

Alvaro Herrera wrote:


The attached patch is a prototype which allows the user to specify a
new MODULEDIR variable in a module makefile which, if specified,
will install DATA and DOCS items in contrib/$(MODULEDIR) rather than
just contrib. If MODULEDIR is left unspecified, the files will
simply be stored directly in contrib/ as before.


As a proof of its usefulness, you could remove DATA_TSEARCH and replace
it with usage of MODULEDIR, right?


Not in its current form because PGXS always places files underneath a 
contrib/ subdirectory within datadir. However, if people are happier 
with this approach then it shouldn't be too hard to alter things so that 
my PGXS Makefile would look like this:



MODULE_big=postgis-1.5
MODULEDIR=contrib/$(MODULE_big)


Once in this form it should then be possible to use this code to replace 
the DATA_TSEARCH variable that is currently in place.



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Add subdirectory support for DATA/DOCS with PGXS

2009-12-29 Thread Mark Cave-Ayland

Hi all,

Since moving PostGIS over to PGXS for the 1.4 release, we've been 
looking at how we can support multiple versions of PostGIS being 
installed in different databases within the same cluster.


We are able to version the .so file produced by PGXS without too much 
difficulty, however PGXS in its current form does not have the ability 
to install different versions of the DATA or DOCS files within the same 
installation.


The attached patch is a prototype which allows the user to specify a new 
MODULEDIR variable in a module makefile which, if specified, will 
install DATA and DOCS items in contrib/$(MODULEDIR) rather than just 
contrib. If MODULEDIR is left unspecified, the files will simply be 
stored directly in contrib/ as before.


In my current development setup here, the head of the new PostGIS PGXS 
Makefile now looks like this:



MODULE_big=postgis-1.5
MODULEDIR=$(MODULE_big)
...
...


With this patch in place, a PGXS make install will correctly 
install the DATA and DOCS files in versioned directories and therefore 
allow multiple installations within the same database cluster.


The coding within the Makefile isn't too difficult in its current form, 
but I'd be interested to get some initial feedback as to whether the 
introduction of a new MODULEDIR variable is the best way to add this new 
piece of functionality.



Many thanks,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs
diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index a83dad3..4e17e8a 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -19,8 +19,11 @@
 #
 #   MODULES -- list of shared objects to be build from source file with
 # same stem (do not include suffix in this list)
-#   DATA -- random files to install into $PREFIX/share/contrib
-#   DATA_built -- random files to install into $PREFIX/share/contrib,
+#   MODULEDIR -- subdirectory under contrib into which DATA and DOCS are
+# installed (if not set, DATA and DOCS files are installed directly
+# under the contrib/ directory)
+#   DATA -- random files to install into $PREFIX/share/contrib/$MODULEDIR
+#   DATA_built -- random files to install into $PREFIX/share/contrib/$MODULEDIR,
 # which need to be built first
 #   DATA_TSEARCH -- random files to install into $PREFIX/share/tsearch_data
 #   DOCS -- random files to install under $PREFIX/doc/contrib
@@ -86,12 +89,16 @@ include $(top_srcdir)/src/Makefile.shlib
 all: all-lib
 endif # MODULE_big
 
+ifdef MODULEDIR
+moduledir = /$(MODULEDIR)
+endif
+
 
 install: all installdirs
 ifneq (,$(DATA)$(DATA_built))
 	@for file in $(addprefix $(srcdir)/, $(DATA)) $(DATA_built); do \
-	  echo $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib'; \
-	  $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib'; \
+	  echo $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib$(moduledir)'; \
+	  $(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/contrib$(moduledir)'; \
 	done
 endif # DATA
 ifneq (,$(DATA_TSEARCH))
@@ -109,8 +116,8 @@ endif # MODULES
 ifdef DOCS
 ifdef docdir
 	@for file in $(addprefix $(srcdir)/, $(DOCS)); do \
-	  echo $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib'; \
-	  $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib'; \
+	  echo $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib$(moduledir)'; \
+	  $(INSTALL_DATA) $$file '$(DESTDIR)$(docdir)/contrib$(moduledir)'; \
 	done
 endif # docdir
 endif # DOCS
@@ -137,7 +144,7 @@ endif # MODULE_big
 
 installdirs:
 ifneq (,$(DATA)$(DATA_built))
-	$(MKDIR_P) '$(DESTDIR)$(datadir)/contrib'
+	$(MKDIR_P) '$(DESTDIR)$(datadir)/contrib$(moduledir)'
 endif
 ifneq (,$(DATA_TSEARCH))
 	$(MKDIR_P) '$(DESTDIR)$(datadir)/tsearch_data'
@@ -147,7 +154,7 @@ ifneq (,$(MODULES))
 endif
 ifdef DOCS
 ifdef docdir
-	$(MKDIR_P) '$(DESTDIR)$(docdir)/contrib'
+	$(MKDIR_P) '$(DESTDIR)$(docdir)/contrib$(moduledir)'
 endif # docdir
 endif # DOCS
 ifneq (,$(PROGRAM)$(SCRIPTS)$(SCRIPTS_built))
@@ -161,7 +168,7 @@ endif # MODULE_big
 
 uninstall:
 ifneq (,$(DATA)$(DATA_built))
-	rm -f $(addprefix '$(DESTDIR)$(datadir)'/contrib/, $(notdir $(DATA) $(DATA_built)))
+	rm -f $(addprefix '$(DESTDIR)$(datadir)'/contrib$(moduledir)/, $(notdir $(DATA) $(DATA_built)))
 endif
 ifneq (,$(DATA_TSEARCH))
 	rm -f $(addprefix '$(DESTDIR)$(datadir)'/tsearch_data/, $(notdir $(DATA_TSEARCH)))
@@ -170,7 +177,7 @@ ifdef MODULES
 	rm -f $(addprefix '$(DESTDIR)$(pkglibdir)'/, $(addsuffix $(DLSUFFIX), $(MODULES)))
 endif
 ifdef DOCS
-	rm -f $(addprefix '$(DESTDIR)$(docdir)'/contrib/, $(DOCS))
+	rm -f $(addprefix '$(DESTDIR)$(docdir)'/contrib$(moduledir)/, $(DOCS))
 endif
 ifdef PROGRAM
 	rm -f '$(DESTDIR)$(bindir)/$(PROGRAM)$(X)'

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another try at reducing repeated detoast work for PostGIS

2009-08-22 Thread Mark Cave-Ayland

Tom Lane wrote:


Huh.  As far as I can see this example should traverse the same code
path.  I was about to ask for the dataset, but I think you might have
already sent it to me once --- does this look familiar?

$ tar tvfj geography.tar.bz2
-rw-r--r-- shade/shade 6444737 2008-06-06 13:33 geography.dbf
-rw-r--r-- shade/shade 37179008 2008-06-06 13:33 geography.shp
-rw-r--r-- shade/shade   263140 2008-06-06 13:33 geography.shx

If so, what do I do with it exactly --- the file extensions convey
nothing to my mind at the moment ...


Okay. I've gone back and had a look at the original queries, and it 
seems the reason for the inflated times is that my setup for the 
original database was based on PostGIS 1.3, which has a RECHECK applied 
to the && operator.


If I temporarily remove the RECHECK from PostGIS 1.3 then the times drop 
substantially:



postgis13=# explain analyze select count(*) from geography where type='Z' and centroid && (select the_geom from geography where id=69495);

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6892.28..6892.29 rows=1 width=0) (actual time=394.183..394.184 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Seq Scan on geography  (cost=0.00..6884.00 rows=1 width=4432) (actual time=18.192..41.855 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Index Scan using idx_geography_centroid_z on geography  (cost=0.00..8.28 rows=1 width=0) (actual time=46.711..345.940 rows=29687 loops=1)
         Index Cond: (centroid && $0)
         Filter: ((type)::text = 'Z'::text)
 Total runtime: 394.265 ms
(8 rows)


Incidentally, the recently-released PostGIS 1.4 has RECHECK disabled by 
default, and so it can be seen that the times are reasonably similar:



postgis14=# explain analyze select count(*) from geography where type='Z' and centroid && (select the_geom from geography where id=69495);

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6892.28..6892.29 rows=1 width=0) (actual time=396.314..396.315 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Seq Scan on geography  (cost=0.00..6884.00 rows=1 width=4439) (actual time=14.198..37.340 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Index Scan using idx_geography_centroid_z on geography  (cost=0.00..8.28 rows=1 width=0) (actual time=42.169..344.337 rows=29687 loops=1)
         Index Cond: (centroid && $0)
         Filter: ((type)::text = 'Z'::text)
 Total runtime: 396.375 ms
(8 rows)


If I re-apply your patch to PostgreSQL 8.4 using PostGIS 1.4 (ignoring 
RECHECK) then the results now look like this:



postgis14=# explain analyze select count(*) from geography where type='Z' and centroid && (select the_geom from geography where id=69495);

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6892.28..6892.29 rows=1 width=0) (actual time=271.360..271.362 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Seq Scan on geography  (cost=0.00..6884.00 rows=1 width=4439) (actual time=17.534..32.009 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Index Scan using idx_geography_centroid_z on geography  (cost=0.00..8.28 rows=1 width=0) (actual time=32.393..165.057 rows=29687 loops=1)
         Index Cond: (centroid && $0)
         Filter: ((type)::text = 'Z'::text)
 Total runtime: 271.428 ms
(8 rows)

postgis14=# explain analyze select count(*) from geography where type='Z' and centroid && (select force_2d(the_geom) from geography where id=69495);

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6892.28..6892.29 rows=1 width=0) (actual time=272.091..272.092 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Seq Scan on geography  (cost=0.00..6884.00 rows=1 width=4439) (actual time=18.644..42.680 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Index Scan using idx_geography_centroid_z on geography  (cost=0.00..8.28 rows=1 width=0) (actual time=43.079..172.788 rows=29687 loops=1)
         Index Cond: (centroid && $0)
         Filter: ((type)::text = 'Z'::text)
 Total runtime: 272.185 ms
(8 rows)


So in conclusion, I think that patch looks good and that the extra time 
I was seeing was due to RECHECK being applied to the && operator, and 
not the time being spent within the index scan itself.



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs

--
Sent via pgsql-hackers mailing list (pgsql-hackers

Re: [HACKERS] Another try at reducing repeated detoast work for PostGIS

2009-08-18 Thread Mark Cave-Ayland

Tom Lane wrote:


There was recently another go-round on the postgis-devel list about
the same problem Mark Cave-Ayland complained about last year:
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00384.php
Basically, what is happening is a nestloop join where the inner
indexscan gets a comparison argument from the outer table, and that
argument is toasted.  Every single call of an index support function
detoasts the argument again :-(, leading to upwards of 90% of the
runtime being spent in detoasting.

I made a proposal to fix it
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00709.php
but that failed due to excessive ambition --- the cost/benefit/risk
tradeoffs just weren't good enough.

Thinking about it again, it seems to me that a much narrower patch
could solve the specific forms of the problem that the PostGIS folk
are seeing.  Instead of trying to have a general-purpose method of
preventing repeat de-toasting, we could just prevent it for inner
indexscans by having ExecIndexEvalRuntimeKeys() detoast anything it's
passing to the index AM.  The attached patch accomplishes this with
a net addition of about three statements.  (It looks bigger, because
I had to move a hunk of code to have the datatype info available when
needed.)  Paul Ramsey reports that this fixes the problem for him:
http://postgis.refractions.net/pipermail/postgis-devel/2009-August/006659.html

The only downside I can see offhand is that it will detoast short-header
values that might not actually need to be detoasted.  But we don't have
any very good way to know whether a datatype's index support functions
use PG_DETOAST_DATUM or PG_DETOAST_DATUM_PACKED.  In the former case we
do need to detoast short-header values.  The extra overhead in this case
amounts to only one palloc and a fairly short memcpy, which should be
pretty negligible in comparison to the other setup costs of an
indexscan, so I'm not very concerned about it.

Comments?  Better ideas?

regards, tom lane



Hi Tom,

Thanks for the patch. Fortunately enough I was able to find the dataset 
from the original report above, and so I've tested your patch against 
PostgreSQL 8.4. Unfortunately in the original test case, it doesn't seem 
to give the same performance improvement for me that Paul was seeing :(



postgis13=# \d geography
                                Table "public.geography"
   Column   |          Type          |                        Modifiers
------------+------------------------+----------------------------------------------------------
 gid        | integer                | not null default nextval('geography_gid_seq'::regclass)
 type       | character varying(1)   |
 id         | numeric(20,0)          |
 name       | character varying(32)  |
 population | numeric(20,0)          |
 abbreviati | character varying(2)   |
 po_name    | character varying(100) |
 id_geo_sta | numeric(20,0)          |
 the_geom   | geometry               |
 centroid   | geometry               |
Indexes:
    "geography_pkey" PRIMARY KEY, btree (gid)
    "idx_geography_centroid_z" gist (centroid)
Check constraints:
    "enforce_dims_the_geom" CHECK (ndims(the_geom) = 2)
    "enforce_geotype_the_geom" CHECK (geometrytype(the_geom) = 'MULTIPOLYGON'::text OR the_geom IS NULL)
    "enforce_srid_the_geom" CHECK (srid(the_geom) = (-1))


PostgreSQL 8.4 vanilla:

postgis13=# explain analyze select count(*) from geography where type='Z' and centroid && (select the_geom from geography where id=69495);

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6892.28..6892.29 rows=1 width=0) (actual time=1770.333..1770.334 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Seq Scan on geography  (cost=0.00..6884.00 rows=1 width=4432) (actual time=19.529..43.893 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Index Scan using idx_geography_centroid_z on geography  (cost=0.00..8.28 rows=1 width=0) (actual time=48.897..1730.004 rows=29687 loops=1)
         Index Cond: (centroid && $0)
         Filter: ((type)::text = 'Z'::text)
 Total runtime: 1770.417 ms
(8 rows)


PostgreSQL 8.4 with your patch applied:

postgis13=# explain analyze select count(*) from geography where type='Z' and centroid && (select the_geom from geography where id=69495);

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=6892.28..6892.29 rows=1 width=0) (actual time=1499.414..1499.416 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Seq Scan on geography  (cost=0.00..6884.00 rows=1 width=4432) (actual time=18.901..42.648 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Index Scan using idx_geography_centroid_z on geography  (cost=0.00..8.28 rows=1 width=0) (actual time=43.133..1456.944 rows=29687 loops

Re: [HACKERS] Proper entry of polygon type data

2009-03-25 Thread Mark Cave-Ayland

Peter Willis wrote:


I am aware of PostGIS and already use it. My question was regarding
the entry format of PostgreSQL polygon data. There is a void
in the PostgreSQL documentation regarding this.

Incidentally, PostGIS uses PostgreSQL polygon, point, and path
data types.


Errr... no it doesn't. PostGIS uses its own internal types to represent 
all the different geometries, although it does provide casts to and from 
the existing PostgreSQL geometric types as an aid for people wishing to migrate.



Using PostGIS for simple, non-geographic polygon rules is a
bit like using a tank to kill a mosquito.


Interesting metaphor... ;)


ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] mingw check hung

2009-01-28 Thread Mark Cave-Ayland

Magnus Hagander wrote:


Per discussion I looked at just reverting that part, but that won't
work. If we do that, the call to SetEnvironmentVariable() will not be
run, which certainly isn't right..

The problem has to be in win32env.c. I originally thought we
accidentally called the putenv function twice in this case, but that
code seems properly #ifdef:ed to MSVC.

I'm not sure I trust the crash point at all - is this compiled with
debug info enabled? It seems like a *very* strange line to crash on...

I can't spot the error right off :-( Can you try to see if it's the
putenv() or the unsetenv() that gets broken? (by making sure just one of
them get replaced)

//Magnus


Hi guys,

Don't know if this is relevant at all, but it reminds me of a problem I 
had with environment variables in PostGIS with MingW. It was something 
along these lines: environment variables such as PGPORT and PGHOST set in 
a MingW program using putenv() weren't visible to an MSVC-compiled libpq, 
but were visible to a MingW-compiled libpq. It's fairly easy to knock up a 
quick test program in C to verify this.
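
A minimal sketch of that sort of test program (illustrative only - it
assumes the libpq development headers are installed, and is not the exact
program used at the time):

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main(void)
{
	PGconn *conn;

	/* Set the connection parameters via the environment only */
	putenv("PGHOST=localhost");
	putenv("PGPORT=5432");

	/* Empty conninfo string, so libpq must fall back to the environment */
	conn = PQconnectdb("");

	if (PQstatus(conn) == CONNECTION_OK)
		printf("connected using the environment variables\n");
	else
		printf("connection failed: %s", PQerrorMessage(conn));

	PQfinish(conn);
	return 0;
}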


I eventually gave up and just built a connection string instead - for 
reference the final patch is here 
http://postgis.refractions.net/pipermail/postgis-commits/2008-January/000199.html. 
I appreciate it may not be 100% relevant, but I thought I'd flag it up 
as possibly being a fault with the MingW putenv implementation.



HTH,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [postgis-devel] CLUSTER in 8.3 on GIST indexes break

2008-12-05 Thread Mark Cave-Ayland

Hi Kevin,

Yeah I see exactly the same problem on 8.3.5 too, although it seems 
random - what seems to happen is that sometimes the contents of the 
temporary table disappears. I've attached a test script which causes the 
error *some* of the time, although it tends to occur more often just 
after the server has been restarted.


I've been invoking the attached script against a PostgreSQL 
8.3.5/PostGIS 1.3.4 installation, and when the bug hits I see the 
following psql output against a freshly restarted server:



postgis13=# \i /tmp/postgis-strange.sql
SELECT
CREATE INDEX
ANALYZE
 count
---
 1
(1 row)

CLUSTER
ANALYZE
 count
---
 1
(1 row)

postgis13=# \i /tmp/postgis-strange.sql
psql:/tmp/postgis-strange.sql:2: ERROR:  relation "tmp" already exists
psql:/tmp/postgis-strange.sql:3: ERROR:  relation "tmp_geom_idx" already exists

ANALYZE
 count
---
 1
(1 row)

CLUSTER
ANALYZE
 count
---
 0
(1 row)

postgis13=# \i /tmp/postgis-strange.sql
psql:/tmp/postgis-strange.sql:2: ERROR:  relation "tmp" already exists
psql:/tmp/postgis-strange.sql:3: ERROR:  relation "tmp_geom_idx" already exists

ANALYZE
 count
---
 0
(1 row)

CLUSTER
ANALYZE
 count
---
 0
(1 row)


So in other words, the contents of the temporary table have just 
disappeared :(



ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063
-- Count script
create temp table tmp as select st_makepoint(random(), random()) as the_geom 
from generate_series(1, 1);
create index tmp_geom_idx on tmp using gist (the_geom);
analyze tmp;
select count(*) from tmp;
cluster tmp using tmp_geom_idx;
analyze tmp;
select count(*) from tmp;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [postgis-devel] CLUSTER in 8.3 on GIST indexes break

2008-12-05 Thread Mark Cave-Ayland

Gregory Stark wrote:


Uhm. That rather sucks. I was able to reproduce it too.

It seems to happen after I pause for a bit, and not when I run the script in
fast succession.


Thanks for the verification Greg. I'm wondering if the GiST part is a 
red herring, and in fact it is related to some bizarre interaction 
between CLUSTER/VACUUM/autovacuum?



ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch status for reducing de-TOAST overhead?

2008-10-20 Thread Mark Cave-Ayland
Hi everyone,

I'm just following up on the patch created here by Tom to aid with
repeated de-TOASTing attempts:
http://archives.postgresql.org/pgsql-hackers/2008-06/msg01096.php

Given the performance report from Jeff
(http://archives.postgresql.org/pgsql-hackers/2008-08/msg01178.php), is
there still scope for getting something like this into 8.4 - or is this
patch still far shy of anything ready for a commit fest? I have a feeling
that working on this is something currently beyond my own capability :(


ATB,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Missing results from scroll cursor in PostgreSQL 8.3.3?

2008-09-25 Thread Mark Cave-Ayland

Hi there,

Following up a report on the PostGIS bugtracker (also submitted to 
pgsql-bugs here: 
http://archives.postgresql.org/pgsql-bugs/2008-09/msg00086.php), I'm 
wondering if there is a bug in the way that GiST indexes interact with 
scroll cursors.


I've managed to reproduce the bug using btree_gist rather than PostGIS 
and have attached a test case for review. The key point is that if a 
GiST index is used to pull results from a scroll cursor then FETCH 
ABSOLUTE X fails to return any rows. I'm wondering if it could be 
related to the fact that GiST indexes are not ordered? Perhaps the only 
thing that is wrong is that a suitable ERROR message is missing?



ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063
--
-- Initial data
--

CREATE TABLE ctest (
id int8,
name varchar
);

INSERT INTO ctest (id, name) SELECT id, 'Test' || id FROM generate_series(1, 
1000) AS id;

CREATE INDEX ctest_id_idx ON ctest(id);


--
-- Return absolute cursor records using sequential scan & index
--

BEGIN;

SET enable_seqscan = 't';

DECLARE CUR SCROLL CURSOR FOR SELECT * FROM ctest WHERE id = 990;
FETCH ABSOLUTE -1 IN CUR;
FETCH ABSOLUTE 3 IN CUR;
CLOSE CUR;

SET enable_seqscan = 'f';

DECLARE CUR SCROLL CURSOR FOR SELECT * FROM ctest WHERE id = 990;
FETCH ABSOLUTE -1 IN CUR;
FETCH ABSOLUTE 3 IN CUR;
CLOSE CUR;

COMMIT;


--
-- Rebuild with btree_gist
--

DROP INDEX ctest_id_idx;
CREATE INDEX ctest_id_gist_idx ON ctest USING gist(id gist_int8_ops);


--
-- Now try again... but this time no results are returned using GiST index scan?
--

BEGIN;

SET enable_seqscan = 't';

DECLARE CUR SCROLL CURSOR FOR SELECT * FROM ctest WHERE id = 990::bigint;
FETCH ABSOLUTE -1 IN CUR;
FETCH ABSOLUTE 3 IN CUR;
CLOSE CUR;

SET enable_seqscan = 'f';

DECLARE CUR SCROLL CURSOR FOR SELECT * FROM ctest WHERE id = 990::bigint;
FETCH ABSOLUTE -1 IN CUR;
FETCH ABSOLUTE 3 IN CUR;
CLOSE CUR;

COMMIT;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch: reducing overhead for repeat de-TOASTing

2008-07-03 Thread Mark Cave-Ayland

Tom Lane wrote:

 OK, I've reproduced the test case locally.  I believe that when you
 say worse, you mean worse than 8.3, right?  And you did tell me
 offlist that you were testing with --enable-cassert.  CVS HEAD has
 very substantially greater cassert overhead because of the
 randomize_memory addition --- oprofile output for this test looks like

 samples  %image name   symbol name
 1239580  78.7721  postgres randomize_mem
 1435449.1218  libc-2.7.so  memcpy
 48039 3.0528  libc-2.7.so  memset
 13838 0.8794  postgres LWLockAcquire
 12176 0.7738  postgres index_getnext
 11697 0.7433  postgres LWLockRelease
 10406 0.6613  postgres hash_search_with_hash_value
 4739  0.3012  postgres toast_fetch_datum
 4099  0.2605  postgres _bt_checkkeys
 3905  0.2482  postgres AllocSetAlloc
 3751  0.2384  postgres PinBuffer
 3545  0.2253  postgres UnpinBuffer

 I'm inclined to think that we'd better turn that off by default,
 since it's not looking like it's catching anything new.

Yes, I suspect that's probably it. I applied the patch straight to CVS 
tip as I wasn't aware of any changes that would affect the unpatched 
result, but I was obviously wrong ;)


(cut)

 On the whole I'm still feeling pretty discouraged about this patch ...

At the very least we have some more information on how an eventual 
solution should work, and a test case to help analyse the effectiveness 
of any potential solution.



ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch: reducing overhead for repeat de-TOASTing

2008-07-03 Thread Mark Cave-Ayland

Gregory Stark wrote:

 Well at least it caught the bug that Mark was performance testing with a
 --enable-cassert build :/

True ;)  I appreciated that there would be some overhead, but I didn't 
think it would be that much. This was mainly since I seem to remember 
there was talk a while back of enabling some assertions in production 
builds.



ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch: reducing overhead for repeat de-TOASTing

2008-07-01 Thread Mark Cave-Ayland

Tom Lane wrote:

Attached is a worked-out patch for the approach proposed here:
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php
namely, that cache management for de-TOASTed datums is handled
by TupleTableSlots.

To avoid premature detoasting of values that we might never need, the
patch introduces a concept of an indirect TOAST pointer, which has
the same 0x80 or 0x01 header as an external TOAST pointer, but can
be told apart by having a different length byte.  Within that we have
* pointer to original toasted field within the Slot's tuple
* pointer to the owning Slot
* pointer to decompressed copy, or NULL if not decompressed yet
Some fairly straightforward extensions to the TupleTableSlot code,
heaptuple.c, and tuptoaster.c make it all go.

My original thoughts had included turning FREE_IF_COPY() into a no-op,
but on investigation that seems impractical.  One case that still
depends on that pfree is where we have palloc'd a 4-byte-header copy
of a short-header datum to support code that needs properly aligned
datum content.  The solution adopted in the patch is to arrange for
pfree() applied to a cacheable detoasted object to be a no-op, whereas
it still works normally for non-cached detoasted objects.  We do this
by inserting a dummy chunk header that points to a dummy memory context
whose pfree support method does nothing.  I think this part of the patch
would be required for any toast caching method, not just this one.

What I like about this patch is that it's a fairly small-footprint
change, it doesn't add much overhead, and it covers caching of
decompression for in-line-compressed datums as well as the out-of-line
case.

One thing I really *don't* like about it is that it requires everyplace
that copies Datums to know about indirect pointers: in general, the copy
must be a copy of the original toasted Datum, not of the indirect
pointer, else we have indirect pointers that can outlive their owning
TupleTableSlot (or at least outlive its current tuple cycle).  There
only seem to be about half a dozen such places in the current code,
but still it seems a rather fragile coding rule.

After playing with it for a little bit, I'm not convinced that it buys
enough performance win to be worth applying --- the restriction of cache
lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive.
(For example, sorts that involve toasted sort keys continue to suck,
because the tuples being sorted aren't in Slots.)  It would probably
fix the specific case that the PostGIS hackers were complaining of,
but I think we need something more.

Still, I wanted to get it into the archives because the idea of indirect
toast pointers might be useful for something else.


Hi Tom,

Thanks very much for supplying the WIP patch. In the interest of testing 
and feedback, I've just downloaded PostgreSQL CVS head and applied your 
patch, compiled up a slightly modified version of PostGIS (without 
RECHECKs on the GiST opclass) and loaded in the same dataset.


Unfortunately I have to report back that with your WIP patch applied, 
timings seem to have become several orders of magnitude *worse*:



[EMAIL PROTECTED]:~$ psql -d postgis
psql (8.4devel)
Type "help" for help.

postgis=# explain analyze select count(*) from geography where centroid && (select the_geom from geography where id=69495);

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7100.28..7100.29 rows=1 width=0) (actual time=18238.932..18238.934 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7092.00 rows=1 width=4387) (actual time=27.472..69.223 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Index Scan using geography_centroid_idx on geography  (cost=0.00..8.27 rows=1 width=0) (actual time=118.371..18196.041 rows=32880 loops=1)
         Index Cond: (centroid && $0)
 Total runtime: 18239.918 ms
(7 rows)


In fact, even sequential scans seem to have gone up by several orders of 
magnitude:



postgis=# set enable_indexscan = 'f';
SET
postgis=# set enable_bitmapscan = 'f';
SET
postgis=# explain analyze select count(*) from geography where centroid && (select the_geom from geography where id=69495);

                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=14184.01..14184.01 rows=1 width=0) (actual time=9117.022..9117.024 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7092.00 rows=1 width=4387) (actual time=23.780..54.362 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Seq Scan on geography  (cost=0.00..7092.00 rows=1 width=0) (actual time=55.016..9073.084 rows=32880 loops=1)
         Filter: (centroid && $0)
 Total runtime: 9117.174 ms
(7 rows)


ATB,

Mark.

--
Mark Cave

Re: [HACKERS] WIP patch: reducing overhead for repeat de-TOASTing

2008-07-01 Thread Mark Cave-Ayland

Guillaume Smet wrote:


From the discussion we had a few months ago, I don't think the "no
RECHECK" trick works with CVS tip anymore.

See my post on the "Remove lossy-operator RECHECK flag?" thread:
http://archives.postgresql.org/pgsql-hackers/2008-04/msg00847.php
and follow-ups.

That said, perhaps it's not the only problem you have but I thought it
was worth mentioning.


Yeah sorry, that might not have been as clear as I hoped. What I meant 
was that I removed the explicit RECHECK clause from the GiST operator 
class definition - since as you correctly mention, CVS tip throws an 
error if this is present.



ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange issue with GiST index scan taking far too long

2008-06-10 Thread Mark Cave-Ayland

Tom Lane wrote:


Well, yeah, because the first thing it does is pg_detoast_datum.
Just as a cross-check, try changing it to copy the value without
forcibly detoasting --- I'll bet it's still slow then.


Yeah, that appears to be exactly the case. After some grepping through 
various header files I came up with this alternative:



PG_FUNCTION_INFO_V1(LWGEOM_mcatest);
Datum LWGEOM_mcatest(PG_FUNCTION_ARGS)
{
	struct varlena *pgl = PG_GETARG_RAW_VARLENA_P(0);
	void *mem;

	/* Copy somewhere else */
	mem = palloc(VARSIZE_EXTERNAL(pgl));
	memcpy(mem, pgl, VARSIZE_EXTERNAL(pgl));

	PG_RETURN_POINTER(mem);
}


...and of course, this now takes the slower runtime of 2.5s.


ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange issue with GiST index scan taking far too long

2008-06-10 Thread Mark Cave-Ayland

Tom Lane wrote:


So you are saying it is de-toasted 32880 times, in this case? If not,
where are the repeated de-toastings happening?


Inside the index support functions.  I'm thinking we could fix this by
forcibly detoasting values passed as index scan keys, but it's not quite
clear where's the best place to do that.


Ouch. This is rapidly getting out of my sphere of knowledge, but I'd 
guess you'd want to do this either just before you start the index scan, 
or cache the results within the AM after the first deTOASTing.


In terms of PostGIS, we tend to do a lot of index queries against large 
geometries so we see cases like this frequently - so optimising them 
would be good.


I did think of another idea though: at the moment all members of the 
GiST opclass for geometry objects are declared using the geometry type 
(which contains the entire geometry), whereas individual entries are 
stored within the index as box2d objects representing just their 
bounding box.


Would it make sense to rework the GiST routines so that instead of 
accepting "geometry op geometry", they accept "box2d op box2d"? Then 
surely if we add a CAST from geometry to box2d, the geometry would 
get converted to its bounding box (which would not require deTOASTing) 
before being used as an index scan key.



ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Strange issue with GiST index scan taking far too long

2008-06-09 Thread Mark Cave-Ayland

Hi there,

I'm currently looking at a bug report in PostGIS where we are getting 
extremely long index scan times querying an index in one case, but the 
same scan can take much less time if the input geometry is calculated as 
the result of another function.


First of all, I include the EXPLAIN ANALYZE of the basic query which 
looks like this:



postgis=# explain analyze select count(*) from geography where centroid && (select the_geom from geography where id=69495);

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7157.29..7157.30 rows=1 width=0) (actual time=2691.783..2691.784 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7149.00 rows=1 width=4559) (actual time=60.987..61.373 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Index Scan using geography_geom_centroid_idx on geography  (cost=0.00..8.28 rows=1 width=0) (actual time=79.241..2645.722 rows=32880 loops=1)
         Index Cond: (centroid && $0)
         Filter: (centroid && $0)
 Total runtime: 2692.288 ms
(8 rows)


The only real thing to know about the query is that the id field within 
the geography table is a primary key, and hence only a single geometry 
is being returned from within the subselect. Note that most of the time 
is disappearing into the index scan.


Where things start getting really strange is when we add an extra 
function called force_2d() into the mix. All this function does is scan 
through the single geometry returned from the subselect and remove any 
3rd dimension coordinates. Now the resulting EXPLAIN ANALYZE for this 
query looks like this:



postgis=# explain analyze select count(*) from geography where centroid && (select force_2d(the_geom) from geography where id=69495);

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7157.29..7157.30 rows=1 width=0) (actual time=343.004..343.005 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7149.00 rows=1 width=4559) (actual time=48.714..49.016 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Index Scan using geography_geom_centroid_idx on geography  (cost=0.00..8.28 rows=1 width=0) (actual time=49.367..235.296 rows=32880 loops=1)
         Index Cond: (centroid && $0)
         Filter: (centroid && $0)
 Total runtime: 343.084 ms
(8 rows)


So by adding an extra function around the subselect result, we have 
sped up the index lookup by several orders of magnitude, and the 
speedup appears to be coming from somewhere within the index scan?! I've 
spent a little bit of time playing with this, and it seems even writing a 
function that simply returns a copy of its input geometry is enough. Here 
is my test function below:



Datum LWGEOM_mcatest(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(LWGEOM_mcatest);
Datum LWGEOM_mcatest(PG_FUNCTION_ARGS)
{
	PG_LWGEOM *pgl = (PG_LWGEOM *) PG_DETOAST_DATUM(PG_GETARG_DATUM(0));
	void *mem;

	/* Copy somewhere else */
	mem = palloc(VARSIZE(pgl));
	memcpy(mem, pgl, VARSIZE(pgl)-VARHDRSZ);

	PG_RETURN_POINTER(mem);
}


CREATE OR REPLACE FUNCTION mcatest(geometry)
RETURNS geometry
AS '$libdir/lwpostgis','LWGEOM_mcatest'
LANGUAGE 'C';


And then here is the resulting EXPLAIN ANALYZE:

postgis=# explain analyze select count(*) from geography where centroid && (select mcatest(the_geom) from geography where id=69495);

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7157.29..7157.30 rows=1 width=0) (actual time=283.126..283.127 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7149.00 rows=1 width=4559) (actual time=48.712..49.040 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Index Scan using geography_geom_centroid_idx on geography  (cost=0.00..8.28 rows=1 width=0) (actual time=49.321..215.524 rows=32880 loops=1)
         Index Cond: (centroid && $0)
         Filter: (centroid && $0)
 Total runtime: 283.221 ms
(8 rows)


Unfortunately I can't seem to work out why the extra time is 
disappearing into the index scan when my extra mcatest() function is not 
present, especially as sprof doesn't seem to want to run at the moment 
:(  I'm wondering if it's related to excess TOAST/palloc/pfree activity 
somewhere in the code, but I'd definitely appreciate any pointers.


All these tests were done using PostgreSQL 8.3.1 and the latest PostGIS SVN.


Many thanks,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription

Re: [HACKERS] Strange issue with GiST index scan taking far too long

2008-06-09 Thread Mark Cave-Ayland

Simon Riggs wrote:


Hmmm, perhaps implicit casting?

Try this to see if it works better also

select count(*) from geography where centroid &&
 (select the_geom::geometry from geography where id=69495);



Hi Simon,

Unfortunately that seems to take the slow runtime path too. I did 
initially think about casting being involved (since the underlying index 
storage type is actually box2d rather than geometry); however, my 
mcatest() function is also declared as returning geometry.


Interestingly enough, forcing a cast to box2d instead of geometry seems to 
take the faster path, i.e.:



postgis=# explain analyze select count(*) from geography where centroid && (select the_geom::box2d from geography where id=69495);

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7157.29..7157.30 rows=1 width=0) (actual time=376.033..376.034 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7149.00 rows=1 width=4559) (actual time=42.853..43.051 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Index Scan using geography_geom_centroid_idx on geography  (cost=0.00..8.28 rows=1 width=0) (actual time=43.218..286.535 rows=32880 loops=1)
         Index Cond: (centroid && ($0)::geometry)
         Filter: (centroid && ($0)::geometry)
 Total runtime: 376.117 ms
(8 rows)


ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange issue with GiST index scan taking far too long

2008-06-09 Thread Mark Cave-Ayland

Tom Lane wrote:


Is the value you are fetching from the geography table large enough to
be toasted?  I'm thinking you might be looking at the cost of repeated
de-toastings.


Yeah, it's a fairly large geometry field so it will definitely be 
getting toasted. So is it the case that, with the mcatest function in 
place, we're effectively caching the de-TOASTed value?



BTW, that mcatest function is buggy --- it's not copying the last
4 bytes of the source value.  I don't know enough about PostGIS
data types to know what effect that would have.


It would probably lose that last point in the coordinate sequence, but 
nothing major. I've removed the -VARHDRSZ part just to check and it 
doesn't make any difference.
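
For reference, with the -VARHDRSZ removed the copy in the test function
simply becomes (same variables as in the function quoted elsewhere in this
thread):

	mem = palloc(VARSIZE(pgl));
	memcpy(mem, pgl, VARSIZE(pgl));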



Many thanks,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rewriting Free Space Map

2008-03-17 Thread Mark Cave-Ayland

On Mon, 2008-03-17 at 13:23 -0400, Tom Lane wrote:

 I'm not wedded to forks, that's just the name that was used in the
 only previous example I've seen.  Classic Mac had a resource fork
 and a data fork within each file.

 Don't think I like maps though, as (a) that prejudges what the
 alternate forks might be used for, and (b) the name fails to be
 inclusive of the data fork.  Other suggestions anyone?

I believe that in the world of NTFS the concept is called "streams":
http://support.microsoft.com/kb/105763.


HTH,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 8.4 development plan

2008-02-08 Thread Mark Cave-Ayland
On Friday 08 February 2008 00:52:04 Gregory Stark wrote:

 Well not really. Our current model is that only stuff that's ready for
 widespread use goes into CVS. That means everything isn't open and shared
 at all. everything is mostly sitting on people's local hard drives where
 you can't use do anything with it, let alone contribute.

 The patches mailing list is basically our poor-man's distributed SCM today.
 It's awful since a) you never know if you're looking at the most recent
 version b) updating your tree from an old version to a newer version is
 painful c) people only release versions when they think they have something
 to say or a question; they don't know you want to try out their stuff until
 you complain about last month's silly bugs d) you never know what version
 of the tree the patch was against and of course e) if you make any changes
 they have all the same problems dealing with your changes to their patch.

 And it's hardly any more centralized than a distributed SCM system would
 be.

One of the things I would like to see in the project is more modularisation 
during development. In particular, it may be useful to allow different 
maintainers to look after different parts of the backend, much in the way 
that different Linux kernel developers are in charge of different subsystems.

This strikes me as being advantageous in a couple of ways:

i) It lowers the bar for entry into the project. Knowing the ins and outs of 
one subsystem is going to take a developer much less time than it does to 
learn about the entire backend.

ii) Some of the larger patches we have seen during 8.3 would have been broken 
into smaller chunks based upon the part of the backend they touch; reviewing 3 
or 4 smaller incremental patches across 3 or 4 people will take much less 
time than having to wait for someone like Alvaro or Tom to review and commit 
several hundred KB of code.

This seems to fit more with the idea of a distributed SCM, although it 
wouldn't be entirely out of the question to set up permissions using CVS/SVN.


ATB,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Minor bug in src/port/rint.c

2008-01-20 Thread Mark Cave-Ayland
Hi everyone,

I believe that there is a small bug in src/port/rint.c when the input
parameter has a fractional part of 0.5 which is demonstrated by the
attached program. It appears that the PG version of rint() rounds in the
wrong direction with respect to glibc.

[EMAIL PROTECTED]:~$ ./test
rint(-1.5): -2.00
pg_rint(-1.5): -1.00
rint(1.5): 2.00
pg_rint(1.5): 1.00

The fix is, of course, to add an equals into the if() comparisons on
lines 21 and 26, so that when the fractional part is 0.5, it rounds in
the opposite direction instead.

I'm sure that this will have practically zero effect on the code,
however it may be worth applying for correctness and consistency with
other platform implementations.


ATB,

Mark.

-- 
ILande - Open Source Consultancy
http://www.ilande.co.uk

/* 
 * rint() test program
 *
 * Compile on gcc using:
 *   gcc -o test test.c -lm
 */

#include <math.h>
#include <stdio.h>

double
pg_rint(double x)
{
	double		f,
				n = 0.;

	f = modf(x, &n);

	if (x > 0.)
	{
		if (f > .5)
			n += 1.;
	}
	else if (x < 0.)
	{
		if (f < -.5)
			n -= 1.;
	}
	return n;
}


int main()
{
	printf("rint(-1.5): %f\n", rint(-1.5));
	printf("pg_rint(-1.5): %f\n", pg_rint(-1.5));
	printf("rint(1.5): %f\n", rint(1.5));
	printf("pg_rint(1.5): %f\n", pg_rint(1.5));
}

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Minor bug in src/port/rint.c

2008-01-20 Thread Mark Cave-Ayland
On Sun, 2008-01-20 at 16:47 -0500, Tom Lane wrote:

 Your proposed fix wouldn't make it act the same as glibc, only move the
 differences around.  I believe glibc's default behavior for the
 ambiguous cases is "round to nearest even number".  You propose
 replacing "round towards zero", which is what our code currently does,
 with "round away from zero", which really isn't likely to match any
 platform's behavior.  (The behaviors specified by IEEE are "to nearest
 even", "towards zero", "towards minus infinity", and "towards plus
 infinity", with the first being the typical default.)
 
 Considering that probably every modern platform has rint(), I doubt
 it's worth spending time on our stopgap version to try to make it
 fully IEEE-compliant ...


Hi Tom,

Right, I think I understand more about this now. My confusion stemmed
from reading the GNU documentation here:
http://www.gnu.org/software/libc/manual/html_node/Rounding-Functions.html
where in the rint() section it says "The default rounding mode is to round
to the nearest integer." However, Wikipedia proved to be quite a fruitful
resource, and agrees with what you stated, which was round to the nearest
even number.

Interestingly, the article on rounding also points to this page
http://support.microsoft.com/kb/196652 which has some example code to
perform round to even number, or Banker's Rounding. The reason I was
looking into this is because PostGIS will require this for an MSVC Win32
build. I haven't yet tried the implementations given in the above page,
so I'm not sure whether they are any good or not...
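
For reference, a minimal sketch of round-to-even (Banker's Rounding) in C -
purely illustrative, and not the code from the page above:

#include <math.h>
#include <stdio.h>

/* Round halfway cases to the nearest even integer (banker's rounding). */
static double
round_half_even(double x)
{
	double	f = floor(x);
	double	diff = x - f;

	if (diff > 0.5)
		return f + 1.0;
	if (diff < 0.5)
		return f;
	/* exactly halfway: choose the even neighbour */
	return (fmod(f, 2.0) == 0.0) ? f : f + 1.0;
}

int main(void)
{
	printf("round_half_even(1.5)  = %g\n", round_half_even(1.5));
	printf("round_half_even(2.5)  = %g\n", round_half_even(2.5));
	printf("round_half_even(-1.5) = %g\n", round_half_even(-1.5));
	return 0;
}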

The big question is, of course, how much difference does this make? Does
the current implementation exhibit different behaviour for certain date
calculations between Win32 and glibc platforms, and if so does it
matter? I guess at the end of the day, if it doesn't have any real
effect then there is no need to worry about it.


ATB,

Mark.

-- 
ILande - Open Source Consultancy
http://www.ilande.co.uk



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Possible PostgreSQL 8.3beta4 bug with MD5 authentication in psql?

2007-12-09 Thread Mark Cave-Ayland
On Sat, 2007-12-08 at 17:09 -0500, Tom Lane wrote:

 So what I think we must do is split the function into two:
 
 PQconnectionNeedsPassword: true if server demanded a password and there
 was none to send (hence, can only be true for a failed connection)
 
 PQconnectionUsedPassword: true if server demanded a password and it
 was supplied from the connection function's argument list, *not*
 from environment or a password file.
 
 Offhand it looks like only dblink needs the second version ---
 all of our client-side code wants the first one.
 
 Barring objections I'll go code this up ...

Yup, that looks good to me. My only criticism would be that the name
PQconnectionUsedPassword could be a little misleading, in that based
upon the name I would expect it to return true if a password was
specified regardless of whether the method used was interactive, .pgpass
or PGPASSWORD.

The only other suggestion I can think of at the moment would be
PQconnectionUsedConnectionPassword which seems to better sum up its
purpose to me...
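
For what it's worth, here is a minimal sketch of how I would expect a
client to use the first of those functions, assuming it ends up with the
obvious PGconn-based signature (illustrative only):

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
	/* Hypothetical conninfo string, just for illustration */
	PGconn *conn = PQconnectdb("host=localhost dbname=postgres");

	if (PQstatus(conn) == CONNECTION_BAD)
	{
		if (PQconnectionNeedsPassword(conn))
			fprintf(stderr, "server demanded a password and none was available\n");
		else
			fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
	}

	PQfinish(conn);
	return 0;
}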


Kind regards,

Mark.

-- 
ILande - Open Source Consultancy
http://www.ilande.co.uk



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Possible PostgreSQL 8.3beta4 bug with MD5 authentication in psql?

2007-12-08 Thread Mark Cave-Ayland
On Sat, 2007-12-08 at 10:09 -0500, Tom Lane wrote:

  My vote goes to (3), if the work can be done quickly, or (1) if it
  can't.

Ditto.

 I don't think it's a big problem to do, as long as we are agreed on
 the behavior we want.  In particular, consider:
 
 1. If libpq obtains a password internally (ie, from PGPASSWORD or
 ~/.pgpass), and it's wrong, do we want a user password prompt?
 
 2. If we prompt the user for a password, and it's wrong, do we
 want to try again?
 
 Our historical behavior on both points was no, but at least
 the second one seems a bit questionable.

I'd say that you definitely don't want a user password prompt if libpq's
password is wrong, since I can see this would break a lot of scripts
that weren't launched directly from the shell - the switch from
non-interactive to interactive would cause the script to hang in the
background rather than return immediately with an error (indeed it was
this confusing symptom in the PostGIS installer that flagged the change
in behaviour).

As for the second point, I'm not too worried about how many times you
are asked for the password - I personally am happy with the one attempt
as it stands at the moment. My main concern is the switch from a
non-interactive mode to an interactive mode.


Kind regards,

Mark.

-- 
ILande - Open Source Consultancy
http://www.ilande.co.uk



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Possible PostgreSQL 8.3beta4 bug with MD5 authentication in psql?

2007-12-07 Thread Mark Cave-Ayland
Hi everyone,

I think that I may have found a minor bug with PostgreSQL 8.3beta4 with
respect to md5 authentication. I actually discovered this on Win32, but
it appears that the behaviour is the same under Linux too.

As part of the PostGIS install under Win32, I have a few scripts that
check for the existence of a particular database by doing the following:

psql -d adatabase -h localhost -c "SELECT version();"

By checking the psql exit code, it is fairly easy to see whether this
failed, and if so display the contents of stdout for the user. The
problem I have is that under PostgreSQL 8.3beta4, if the database
doesn't exist then I get an extra password prompt which breaks the
install scripts as they run in the background :(

To recreate this is fairly easy:

1. Temporarily rename any .pgpass files so they aren't found by libpq
2. Stop the PostgreSQL 8.3 server
3. Change pg_hba.conf so that local connections are disabled, but
connections to 127.0.0.1 are allowed with md5 authentication
4. Restart the PostgreSQL server
5. Open up a shell and do the following:

[EMAIL PROTECTED]:~$ export PGPASSWORD=mypass
[EMAIL PROTECTED]:~$ psql -h localhost -d postgres -c "SELECT version();"
                                version
-----------------------------------------------------------------------
 PostgreSQL 8.3beta2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
(1 row)


So far so good. But now try with a database that doesn't exist:

[EMAIL PROTECTED]:~$ psql -h localhost -d doesntexist -c "SELECT version();"
Password:
psql: FATAL:  database "doesntexist" does not exist


Hmmm. So even though PGPASSWORD is set (and the command works if the
database exists within the cluster), if I specify a non-existent
database then I still get prompted for a password.

I've run the same test against PostgreSQL 8.2.5 and the test works in
that I don't get prompted for a password the second time. So the
behaviour has changed between versions, but I wanted to check that it
wasn't a deliberate change before looking deeper.


Many thanks,

Mark.

-- 
ILande - Open Source Consultancy
http://www.ilande.co.uk



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Possible PostgreSQL 8.3beta4 bug with MD5 authentication in psql?

2007-12-07 Thread Mark Cave-Ayland
On Fri, 2007-12-07 at 11:03 -0500, Tom Lane wrote:

 Hmmm ... it seems the problem is that we've defined
 PQconnectionUsedPassword in such a way that it returns true (causing a
 prompt) regardless of whether the reason for the connection failure was
 a bad password or not.  We might need to reconsider that API.

Right. I think it depends on the interpretation of the
PQconnectionUsedPassword function. If it should simply return whether or
not the connection used a password (as it does now), then you could argue
that it should be psql that incorporates an additional check to determine
whether the attempt was cancelled due to an invalid database name.

On first glance, libpq appears to return just CONNECTION_BAD and an
error message, so I'm not sure whether we can easily detect the
difference between an incorrect password and an invalid database name :(
Is there an additional status code in PGconn that can be used to
determine the exact cause of connection failure?
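
For reference, here is a minimal sketch (not psql's actual code) of the
decision point being discussed, using only the libpq calls mentioned in
this thread:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn *conn = PQconnectdb("host=localhost dbname=doesntexist");

    if (PQstatus(conn) == CONNECTION_BAD)
    {
        if (PQconnectionUsedPassword(conn))
        {
            /*
             * A password was sent (e.g. from PGPASSWORD or .pgpass), but
             * this alone does not prove the password was wrong - the
             * failure could equally be a nonexistent database - so
             * prompting again at this point is questionable.
             */
            fprintf(stderr, "failed after a password was used:\n%s",
                    PQerrorMessage(conn));
        }
        else
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
    }
    PQfinish(conn);
    return 0;
}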


ATB,

Mark.

-- 
ILande - Open Source Consultancy
http://www.ilande.co.uk





[HACKERS] Locating sharedir in PostgreSQL on Windows

2007-11-26 Thread Mark Cave-Ayland
Hi everyone,

I'm working on a piece of code for PostGIS to allow the loading of
projection configuration files from the share/postgresql directory, but
I can't find a way of getting this to work under Win32.

AIUI the way to do this would be to use a combination of my_exec_path
and get_share_path in order to find the directory, but MingW refuses to
find the my_exec_path variable during linking. Unfortunately I suspect
that this is because my_exec_path is not declared as DLLIMPORT in
backend/utils/init/globals.c :(

I really need to find a solution to this that can work all the way back
to PostgreSQL 8.0 - can anyone think of any better ideas?


Many thanks,

Mark.

-- 
ILande - Open Source Consultancy
http://www.ilande.co.uk





Re: [HACKERS] Locating sharedir in PostgreSQL on Windows

2007-11-26 Thread Mark Cave-Ayland
On Mon, 2007-11-26 at 11:55 -0500, Tom Lane wrote:

 I'm not sure why Mark's having a problem accessing my_exec_path ---
 it *is* declared DLLIMPORT in miscadmin.h (which is where it counts,
 AIUI) clear back to 8.0.

Bah, I think that is the source of the problem. Having grepped the
source for my_exec_path, I found the reference in globals.c and hence
the code worked on Linux (my main development environment) with a simple
extern declaration, which was what was confusing me. Adding #include
"miscadmin.h" solves this problem and I can now access the variable on
Windows as well, so I put this down to user error - thanks for the
advice Tom.


Many thanks,

Mark.

-- 
ILande - Open Source Consultancy
http://www.ilande.co.uk





Re: [HACKERS] Locating sharedir in PostgreSQL on Windows

2007-11-26 Thread Mark Cave-Ayland
On Mon, 2007-11-26 at 17:02 -0500, Tom Lane wrote:

 I believe that that is talking specifically about shared libraries (or
 DLLs in Windows-speak), and not about configuration or data files.
 In particular, nothing under libdir would be a candidate to go under
 sharedir, nor vice versa, since the former is supposed to hold
 architecture-dependent files and the latter architecture-independent
 files.
 
 Mark hasn't been very clear about whether he wants to store static data
 files or installation-changeable configuration info, so it's not clear
 to me whether Peter's objection to using sharedir is appropriate or not.
 But unless the files are architecture-sensitive (which they might be!),
 libdir doesn't sound right in either case.

Okay, I'll try and expand on this a bit. In order to convert coordinates
between different coordinate systems, PostGIS uses the external PROJ.4
library. Now in order to support a certain category of conversion,
PROJ.4 requires access to a set of library grid reference files which
are effectively compiled from source files into a set of data files as
part of the build process. The path to this directory of files is then
built into the DLL at compile time, although it can be overridden with an
API call.

Under Linux, this is fairly easy as the files are normally installed
somewhere under /usr/share/proj, and hence the directory exists at both
compile-time and run-time. Windows is trickier because drive letters and
mappings can change - the default of C:\PROJ\NAD may or may not exist,
or can change depending upon the current drive configuration. I can also
see issues arising if the PostgreSQL installation is moved from the C:\
drive to another.

Hence my idea was to create a directory under $sharedir such as
$sharedir/postgresql/contrib/postgis/nad and install the files there.
Then regardless of the location of the PostgreSQL installation or the
current drive setup, I can use get_share_path() with the PROJ.4 API to
set the new library path the first time the function is called, and
everything will just work.
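
A rough sketch of what I have in mind (the pj_set_searchpath() call and the
contrib/postgis/nad layout are assumptions for illustration only, not the
final code):

#include "postgres.h"
#include "miscadmin.h"      /* my_exec_path */

#include <proj_api.h>       /* PROJ.4 */

static bool proj_path_set = false;

static void
set_proj_search_path(void)
{
    char        sharepath[MAXPGPATH];
    char        nadpath[MAXPGPATH];
    const char *paths[1];

    if (proj_path_set)
        return;

    /* derive the share directory from the postgres executable location */
    get_share_path(my_exec_path, sharepath);
    snprintf(nadpath, sizeof(nadpath), "%s/contrib/postgis/nad", sharepath);

    paths[0] = nadpath;
    pj_set_searchpath(1, paths);    /* assumed PROJ.4 entry point */
    proj_path_set = true;
}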

I can see Peter's argument about not putting files directly in
$sharedir, but I feel the usage suggested above falls under a similar
use case to the tsearch2 data files (which is mostly where I looked for
inspiration).

Hopefully this will help make things a bit clearer - please let me know
if any more information is needed.


Many thanks,

Mark.

-- 
ILande - Open Source Consultancy
http://www.ilande.co.uk





Re: [HACKERS] msvc and vista fun

2007-06-25 Thread Mark Cave-Ayland
On Sun, 2007-06-24 at 13:23 -0400, Andrew Dunstan wrote:

(cut)

 On a somewhat related note, I have had spectacular lack of success in 
 getting either MSVC or MinGW builds to work on Vista - so much so that I 
 have currently abandoned my attempts on that platform and I resorted to 
 resuscitating an old XP box for testing. Following some advice from 
 Magnus, I added ACLs to the build root for both an admin and a non-admin 
 user (cacls buildroot /E /T /G AdminUser:C and similarly for a non-admin 
 user) . I can build as the admin user but when I come to run initdb it 
 fails, complaining that it can't find the postgres executable. If I then 
 switch to the non-admin user, it can run initdb just fine. However, that 
 user can't build, because it gets a mysterious failure from mt.exe. 
 MinGW is even worse - it says it can't run gcc because it can't run 
 cc1.exe (IIRC), so it fails at the configure stage! All of this has cost 
 me quite a few hours in the last week, with very little to show for it.


Hi Andrew,

This has come up quite recently on the MingW lists - see
http://sourceforge.net/mailarchive/forum.php?thread_name=00fc01c7a0af%
240a037c30%240200a8c0%40AMD2500&forum_name=mingw-users for a discussion
of the problem and solution.


Kind regards,

Mark.

-- 
ILande - Open Source Consultancy
http://www.ilande.co.uk





Re: [HACKERS] Hierarchical Queries--Status

2007-02-09 Thread Mark Cave-Ayland
On Thu, 2007-02-08 at 20:49 -0500, Bruce Momjian wrote:
 Who is working on this item?

Jonah was trying to complete this for 8.3, but I believe that he has
handed it onto Gregory Stark - I think
http://archives.postgresql.org/pgsql-hackers/2007-01/msg01586.php is the
latest update.


Kind regards,

Mark.





Re: [HACKERS] Function execution costs 'n all that

2007-01-16 Thread Mark Cave-Ayland
On Mon, 2007-01-15 at 15:05 -0500, Tom Lane wrote:
 Brian Hurt [EMAIL PROTECTED] writes:
  Non-developer here, but we use a lot of plpgsql functions at work.  And 
  the functions we use fall into two broad, ill-defined catagories- 
  expensive functions and cheap functions.  What I'd like as a user is 
  some way to tell the planner this function is expensive- prefer plans 
  which call this function less even if they're otherwise more expensive 
  or this function is cheap, prefer plans that are otherwise less 
  expensive even if they call this function more often.  Precise cost 
  estimates aren't that important, IMHO.
 
 Right, so a plain constant cost would be plenty for your situation.
 
 I suspect there's an 80/20 rule at work here --- the estimator-function
 side of this will take most of the effort to design/implement, but not
 get used nearly as much as the plain-constant form ... maybe we should
 just do the constant for starters and see how many people really want to
 write C-code estimators ...
 
   regards, tom lane


Hi Tom et al,

Having worked with stored procedures on large datasets for reporting, I
would say that it would be useful to have a non-constant estimator for
the number of rows, whereas a single CPU cost constant should be fine.
Where I have struggled with this has been joining onto slightly more
exotic queries when doing large scale data processing as part of a
custom report or an application upgrade.

Using PL/PGSQL I would find it useful to have access to the constants
passed into a function to be used to help provide a row count estimate
(typically useful for passing in table/column names), e.g.

SELECT * FROM my_func('my_table1') AS t1, my_table2 AS t2 WHERE t1.id =
t2.id;


CREATE FUNCTION my_func(text) RETURNS SETOF record AS $$
...
$$ LANGUAGE 'plpgsql' COST 1.0 ROWS my_func_row_cost;


In my cost function, I could then estimate the number of rows using
something like below, where all constants are passed into the cost
function as parameters, e.g.:


CREATE FUNCTION my_func_row_cost(text) RETURNS bigint AS $$
DECLARE
foo bigint;
BEGIN
EXECUTE 'SELECT COUNT(*) FROM ' || quote_ident($1) INTO foo;
RETURN foo;
END;
$$ LANGUAGE 'plpgsql';


In the case where a parameter was not a constant but a column name, then
it would be reasonable in my mind to simply replace that parameter with
NULL when passing to the row cost function, e.g.


SELECT * FROM my_table1 WHERE my_table1.junk = (SELECT
my_func(my_table1.name));


In this case, the text parameter passed to my_func_row_cost would be
replaced by NULL to indicate that it was non-constant.

Of course, even with constants passed upon input, it still may not be
possible to calculate a number of rows that can be returned - it could
be the case that the only parameter passed to cost function has been
converted to NULL because it is a column name. Perhaps in this case it
would be useful to specify returning NULL from my_func_row_cost means I
can't return anything meaningful, so use the fallback values.


Kind regards,

Mark.





Re: [HACKERS] WITH support

2007-01-03 Thread Mark Cave-Ayland
On Wed, 2007-01-03 at 09:45 +0100, Hubert FONGARNAND wrote:

 Why not looking at http://gppl.moonbone.ru/ evgen potemkin. has ever
 made a patch for WITH and CONNECT BY?
 
 I'm ready to test these features... (RECURSIVE) when they'll land in
 CVS...


Hi Hubert,

IIRC there were two issues - firstly the license for the patch was GPL
as opposed to BSD used for PostgreSQL, and secondly I believe Tom and
some of the other developers were not happy with some of the internal
changes made by the patch. Fortunately it looks as if Jonah can commit
some serious time to this, so your offer of helping to test when the
code hits CVS is gratefully received :)


Kind regards,

Mark.





Re: [HACKERS] WITH support

2006-12-30 Thread Mark Cave-Ayland
On Sat, 2006-12-30 at 00:49 -0500, Jonah H. Harris wrote:
 On 12/29/06, Bruce Momjian [EMAIL PROTECTED] wrote:
  No code yet, and I don't remember who said they were working on it.
 
 I'm still waiting to hear from Mark Cave-Ayland on whether he's going
 to pick it up or whether I'll just do it.  One way or another, there
 should be some movement regarding design discussion within a week or
 two.


Hi everyone,

Sorry to come in late on this discussion - I've been playing catch-up
with emails over the festive season...

The situation regarding WITH support is that it's something which I am
picking up in my spare time (out of work hours) and so unfortunately
progress isn't as rapid as I would like. See here for my last meaningful
patch and Tom's feedback:
http://archives.postgresql.org/pgsql-patches/2006-09/msg00260.php. The
idea was to first implement just the WITH clause itself as a first
patch, and then extend to include the recursive features with a
secondary patch.

In short, if people don't mind waiting for my free cycles to come along
then I will continue to chip away at it; otherwise if it's considered an
essential for 8.3 with an April deadline then I will happily hand over
to Jonah.


Kind regards,

Mark.





Re: [HACKERS] 8.2beta1 crash possibly in libpq

2006-10-09 Thread Mark Cave-Ayland
On Sun, 2006-10-08 at 17:53 +0200, Magnus Hagander wrote:
  AFAICT the backtrace and server log is indicating that the 
  crash is happening somewhere in libpq. If someone can help me 
  figure out how to load the libpq symbols into MingW's gdb 
  then I can get a better backtrace if required as I can 
  reproduce this 100% of the time. For reference, the source 
  for the application in question can be found at 
  http://svn.refractions.net/postgis/tags/1.1.4/loader/pgsql2shp.c.
 
 If you figure out how to make gdb actually work on mingw, let us know -
 not many have ever managed to get it working, and I don't know of anybody
 who can make it work repeatedly.
 
 That said, libpq builds with Visual C++. Could you try building your
 pgsql2shp with Visual C++ as well, and then use the Visual C++ debugger
 (or windbg, really). They should give working backtraces.
 
 //Magnus


Hi Magnus,

Getting closer I think. I managed to compile a MSVC libpq but it agreed
with the MingW backtrace in that it was jumping into the middle of
nowhere :(

I think I may be getting closer though: I've just done a comparison
build with PostgreSQL 8.1 and noticed that there is an error message
being emitted regarding PQntuples (which is where the crash is
occurring):



PG 8.1:

[EMAIL PROTECTED] ~/postgis/pg81/postgis-1.1.4/loader
$ make
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81   -c -o shpopen.o
shpopen.c
shpopen.c:176: warning: 'rcsid' defined but not used
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81   -c -o dbfopen.o
dbfopen.c
dbfopen.c:206: warning: 'rcsid' defined but not used
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81   -c -o getopt.o
getopt.c
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81   -c -o shp2pgsql.o
shp2pgsql.c
shp2pgsql.c: In function `utf8':
shp2pgsql.c:1686: warning: passing arg 2 of `libiconv' from incompatible
pointer type
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 shpopen.o dbfopen.o
getopt.o shp2pgsql.o -liconv -o shp2pgsql.exe 
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81
-IC:/msys/1.0/home/mca/pg81/REL-81~1.4/include -c pgsql2shp.c
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81   -c -o PQunescapeBytea.o
PQunescapeBytea.c
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=81 shpopen.o dbfopen.o
getopt.o PQunescapeBytea.o pgsql2shp.o -liconv
C:/msys/1.0/home/mca/pg81/REL-81~1.4/lib/libpq.dll -o pgsql2shp.exe 


PG 8.2:

[EMAIL PROTECTED] ~/postgis/pg82/postgis-1.1.4/loader
$ make
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82   -c -o shpopen.o
shpopen.c
shpopen.c:176: warning: 'rcsid' defined but not used
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82   -c -o dbfopen.o
dbfopen.c
dbfopen.c:206: warning: 'rcsid' defined but not used
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82   -c -o getopt.o
getopt.c
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82   -c -o shp2pgsql.o
shp2pgsql.c
shp2pgsql.c: In function `utf8':
shp2pgsql.c:1686: warning: passing arg 2 of `libiconv' from incompatible
pointer type
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 shpopen.o dbfopen.o
getopt.o shp2pgsql.o -liconv -o shp2pgsql.exe 
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82
-IC:/msys/1.0/home/mca/pg82/REL-8~1.2BE/include -c pgsql2shp.c
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82   -c -o PQunescapeBytea.o
PQunescapeBytea.c
gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 shpopen.o dbfopen.o
getopt.o PQunescapeBytea.o pgsql2shp.o -liconv
C:/msys/1.0/home/mca/pg82/REL-8~1.2BE/lib/libpq.dll -o pgsql2shp.exe 
Info: resolving _PQntuples by linking to __imp__PQntuples (auto-import)


I think the key part is this line: Info: resolving _PQntuples by
linking to __imp__PQntuples (auto-import). Could it be that the linker
cannot find a reference to PQntuples and hence is jumping into random
code? I have verified that PQntuples does exist within libpq.dll using
the Microsoft Dependency Walker though.


Kind regards,

Mark.






Re: [HACKERS] 8.2beta1 crash possibly in libpq

2006-10-09 Thread Mark Cave-Ayland
Hi Magnus,

I finally got to the bottom of this - it seems that the flags being
passed to MingW's linker were incorrect, but instead of erroring out it
decided to create a corrupt executable. Here is the command line that
was being used to link the pgsql2shp.exe executable, along with the
associated auto-import warning:


gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 shpopen.o dbfopen.o
getopt.o PQunescapeBytea.o pgsql2shp.o -liconv
C:/msys/1.0/home/mca/pg82/REL-8~1.2BE/lib/libpq.dll -o pgsql2shp.exe
Info: resolving _PQntuples by linking to __imp__PQntuples (auto-import)


Note that libpq.dll is referenced directly by its full path rather than
via -L/-l, which I believe should be invalid syntax. This produces a corrupt executable that
crashes whenever PQntuples is accessed. On the other hand, a correct
executable can be realised by linking like this:


gcc -g -Wall -I.. -DUSE_ICONV -DUSE_VERSION=82 shpopen.o dbfopen.o
getopt.o PQunescapeBytea.o pgsql2shp.o -liconv
-LC:/msys/1.0/home/mca/pg82/REL-8~1.2BE/lib -lpq -o pgsql2shp.exe


Note there is no auto-import warning, and the use of -L and -l is how I
would expect. In actual fact, the incorrect link line was being produced
by an error in the configure.in script, so this won't be a scenario that
most people will experience.

The executables linked using the second method now work properly without
crashing during regression. The big mystery is that the command line
used to link the executables has been like that for several versions
now, and I have absolutely no idea why it only triggered this failure
when being linked against 8.2beta1 when it works perfectly on 8.1 and
8.0, and also why only PQntuples was affected.


Kind regards,

Mark.





[HACKERS] 8.2beta1 crash possibly in libpq

2006-10-08 Thread Mark Cave-Ayland
Hi everyone,

I'm in the process of generating the Windows installer for the latest
PostGIS 1.1.4 release and I'm getting a regression failure in one of the
libpq applications - the application in question is generating a
segfault.

Testing so far shows that the regression tests pass without segfaulting
in the following scenarios:

PostgreSQL 8.2beta1 / PostGIS 1.1.4 / Linux
PostgreSQL 8.1 / PostGIS 1.1.4 / Win32

So it appears it is something to do with 8.2beta1 and Win32. I've
compiled the application with debugging symbols enabled and get the
following backtrace from gdb in MingW:


(gdb) set args -f /tmp/pgis_reg_4060/dumper postgis_reg loadedshp
(gdb) run
Starting program: C:\msys\1.0\home\mca\postgis\pg82\postgis-1.1.4
\regress/../loader/pgsql2shp.exe -f /tmp/pgis_reg_4060/dumper
postgis_reg loadedshp
Initializing... 
Program received signal SIGSEGV, Segmentation fault.
0x63512c1c in ?? ()
(gdb) bt
#0  0x63512c1c in ?? ()
#1  0x0040c69c in _fu8__PQntuples () at pgsql2shp.c:2502
#2  0x00408481 in main (ARGC=5, ARGV=0x3d2750) at pgsql2shp.c:243
(gdb) 


I also turned on the logging in the server and get the following in the
server log:


2006-10-08 12:01:15 LOG:  statement: BEGIN;
2006-10-08 12:01:15 LOG:  statement: CREATE TABLE loadedshp (gid
serial PRIMARY KEY);
2006-10-08 12:01:15 NOTICE:  CREATE TABLE will create implicit sequence
loadedshp_gid_seq for serial column loadedshp.gid
2006-10-08 12:01:15 NOTICE:  CREATE TABLE / PRIMARY KEY will create
implicit index loadedshp_pkey for table loadedshp
2006-10-08 12:01:15 LOG:  statement: SELECT
AddGeometryColumn('','loadedshp','the_geom','-1','POINT',2);
2006-10-08 12:01:17 LOG:  statement: INSERT INTO loadedshp (the_geom)
VALUES ('010100F03F');
2006-10-08 12:01:18 LOG:  statement: INSERT INTO loadedshp (the_geom)
VALUES ('0101002240F0BF');
2006-10-08 12:01:18 LOG:  statement: INSERT INTO loadedshp (the_geom)
VALUES ('0101002240F0BF');
2006-10-08 12:01:18 LOG:  statement: END;
2006-10-08 12:01:21 LOG:  statement: select asewkt(the_geom) from
loadedshp;
2006-10-08 12:01:36 LOG:  statement: DROP table loadedshp
2006-10-08 12:01:39 LOG:  statement: BEGIN;
2006-10-08 12:01:39 LOG:  statement: CREATE TABLE loadedshp (gid
serial PRIMARY KEY);
2006-10-08 12:01:39 NOTICE:  CREATE TABLE will create implicit sequence
loadedshp_gid_seq for serial column loadedshp.gid
2006-10-08 12:01:39 NOTICE:  CREATE TABLE / PRIMARY KEY will create
implicit index loadedshp_pkey for table loadedshp
2006-10-08 12:01:39 LOG:  statement: SELECT
AddGeometryColumn('','loadedshp','the_geom','-1','POINT',2);
2006-10-08 12:01:41 LOG:  statement: COPY loadedshp (the_geom) FROM
stdin;
2006-10-08 12:01:41 LOG:  statement: END;
2006-10-08 12:01:43 LOG:  statement: select asewkt(the_geom) from
loadedshp;
2006-10-08 12:02:34 LOG:  statement: SELECT postgis_version()
2006-10-08 12:02:34 LOG:  statement: SELECT a.attname, a.atttypid,
a.attlen, a.atttypmod FROM pg_attribute a, pg_class c WHERE a.attrelid =
c.oid and a.attnum > 0 AND a.atttypid != 0 AND c.relname = 'loadedshp'
2006-10-08 12:02:48 LOG:  could not receive data from client: No
connection could be made because the target machine actively refused
it.

2006-10-08 12:02:48 LOG:  unexpected EOF on client connection


AFAICT the backtrace and server log are indicating that the crash is
happening somewhere in libpq. If someone can help me figure out how to
load the libpq symbols into MingW's gdb then I can get a better
backtrace if required as I can reproduce this 100% of the time. For
reference, the source for the application in question can be found at
http://svn.refractions.net/postgis/tags/1.1.4/loader/pgsql2shp.c.


Many thanks,

Mark.





Re: [HACKERS] [PATCHES] WIP: Hierarchical Queries - stage 1

2006-09-22 Thread Mark Cave-Ayland
Hi Tom,

Thanks for your initial thoughts on this.


On Wed, 2006-09-20 at 20:13 -0400, Tom Lane wrote:

(cut)

 You really can't get away with having the identical representation for
 CTEs and ordinary sub-selects in the range table.  For instance, it
 looks like your patch will think that
 
   select ... from (select ...) as x, x, ...
 
 is legal when it certainly is not.  I think you need either a new
 RTEKind or an additional flag in the RTE to show that it's a CTE rather
 than a plain subselect.  I'm not entirely sure that you even want the
 CTEs in the rangetable at all --- that still needs some thought.

For semantic reasons, I can see why you are questioning whether or not
the CTE should be contained within the rangetable - there is an implicit
hint that RTEs reflect entries within the FROM clause, but then I also
see that the rewriter adds RTEs when substituting view definitions into
queries. The comments in parsenodes.h also suggest that an RTE is a
namespace/data source reference for a named entity within the query.

The main problem I can see with keeping the CTEs outside the rangetable
is that according to the source, jointree nodes must currently have
RANGETBLREF nodes as leaf nodes; as I understand it, your suggestion of
maintaining the CTEs separately would involve something along the lines
of keeping a separate CTETable and creating some form of CTETBLREF node
that could be referenced within the jointree. While arguably it may be
semantically neater, it appears to involve quite a bit of extra work...
could you explain in more detail as to why you feel that CTEs should
remain outside the rangetable?
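
Purely as an illustration of the distinction (the CteTblRef node below is
hypothetical, not an existing PostgreSQL node), the two kinds of jointree
leaf might look something like this:

/* existing leaf node: indexes an entry in the rangetable */
typedef struct RangeTblRef
{
    NodeTag     type;
    int         rtindex;
} RangeTblRef;

/* hypothetical leaf node: indexes an entry in a separate per-query CTE list */
typedef struct CteTblRef
{
    NodeTag     type;
    int         cteindex;
} CteTblRef;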

 This comes back to the question of whether the CTE per se should be an
 RTE at all.  Maybe only the reference to it should be an RTE.  The
 behavior when seeing a plain RangeVar in FROM would be to first search
 the side list of valid CTEs, and only on failure go looking for a real
 table.

This is effectively what the patch does, albeit not particularly
elegantly. I'll spend some time on making those changes a bit more
refined.


Kind regards,

Mark.






Re: [HACKERS] -HEAD planner issue wrt hash_joins on dbt3 ?

2006-09-19 Thread Mark Cave-Ayland
On Mon, 2006-09-18 at 17:46 +0200, Matteo Beccati wrote:
 Tom Lane ha scritto:
  Matteo Beccati [EMAIL PROTECTED] writes:
  I cannot see anything bad by using something like that:
  if (histogram is large/representative enough)
  
  Well, the question is exactly what is large enough?  I feel a bit
  uncomfortable about applying the idea to a histogram with only 10
  entries (especially if we ignore two of 'em).  With 100 or more,
  it sounds all right.  What's the breakpoint?
 
 Yes, I think 100-200 could be a good breakpoint. I don't actually know 
 what is the current usage of SET STATISTICS, I usually set it to 1000 
 for columns which need more precise selectivity.
 
 The breakpoint could be set even higher (500?) so there is space to 
 increase statistics without enabling the histogram check, but I don't 
 feel very comfortable though suggesting this kind of possibly 
 undocumented side effect...


Hi everyone,

You may be interested to have a look at the statistics collector for the
geometry type within PostGIS. In order to prevent very large or very
small geometries from ruining the statistics histogram and generating
incorrect query plans, we make the assumption that the column
distribution is likely to be close to normal, and then remove any
ANALYZE-collected geometries from the set that lie outside +/- 3.25
standard deviations from the mean before creating the final histogram
(removes just under 1% of the data from each end of an assumed normal
distribution). This works well and AFAIK we've only ever had one
reported case of an incorrect query plan being generated using this
method.
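
As a self-contained toy version of that trimming step (the real code works
on 2D bounding boxes; plain doubles are used here for brevity):

#include <math.h>
#include <stddef.h>

/* returns the number of values kept for building the histogram */
static size_t
trim_outliers(double *v, size_t n)
{
    double  sum = 0.0, sumsq = 0.0, mean, sd;
    size_t  i, kept = 0;

    if (n == 0)
        return 0;

    for (i = 0; i < n; i++)
    {
        sum += v[i];
        sumsq += v[i] * v[i];
    }
    mean = sum / n;
    sd = sqrt(sumsq / n - mean * mean);

    /* keep values within mean +/- 3.25 standard deviations, compacting in place */
    for (i = 0; i < n; i++)
        if (fabs(v[i] - mean) <= 3.25 * sd)
            v[kept++] = v[i];

    return kept;
}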


Kind regards,

Mark.





Re: [HACKERS] Hierarchical Queries--Status

2006-09-04 Thread Mark Cave-Ayland
On Sat, 2006-08-26 at 22:46 -0400, Jonah H. Harris wrote:
 On 8/26/06, Alvaro Herrera [EMAIL PROTECTED] wrote:
  Actually I was thinking in the design rather than the code ...
 
 Doh!  We hadn't posted the design just yet.  Let me write him and see
 where he's at and we'll throw something together for the list.


[Note to Jonah: I've tried sending a similar version of this email to
you a couple of times, but I'm not sure that it's getting through, hence
the post to -hackers in the hope you may be able to pick it up there.]


Hi everyone,

I've had a chance to sit down for a day or so to see how to approach
adding hierarchical queries to PostgreSQL, so I thought I'd post my
initial thoughts on how to proceed. My aim is to refine the list below
based upon feedback from hackers until it gets to the point where I can
start work on it. Here's what I've got so far:


1) Add detection of the WITH clause to the parser.

2) Create a new type of RangeTblEntry to represent each common table
expression specified within the WITH clause where the subquery field
points to the nodes representing the common table expression.

3) Add planner support so that WITH clauses are mapped to a new type of
node that utilises two tuplestores - an output tuplestore and a working
tuplestore. The output tuple store will in effect be the contents of the
table expression while the working tuplestore holds the results of the
last iteration if recursive. Also implement some kind of WithState node
which keeps track of the recursion state.
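
To make the output/working tuplestore idea in 3) more concrete, here is a
toy, self-contained analogue using plain C arrays instead of tuplestores:
the output set accumulates every row produced so far, the working set holds
only the previous iteration's rows, and the loop stops at a fixpoint.

#include <stdio.h>
#include <string.h>

#define NNODES 6

/* edge list playing the role of the table scanned by the recursive term */
static const int edges[][2] = {{0, 1}, {1, 2}, {2, 3}, {4, 5}};

int
main(void)
{
    int     output[NNODES] = {0};   /* everything produced so far */
    int     working[NNODES] = {0};  /* rows from the previous iteration only */
    int     next[NNODES];
    size_t  e;
    int     i;

    output[0] = working[0] = 1;     /* non-recursive term seeds both sets */

    for (;;)
    {
        int     added = 0;

        memset(next, 0, sizeof(next));

        /* recursive term: join the working set against the edge list,
         * keeping only rows not already present in the output set */
        for (e = 0; e < sizeof(edges) / sizeof(edges[0]); e++)
        {
            if (working[edges[e][0]] && !output[edges[e][1]])
            {
                next[edges[e][1]] = 1;
                output[edges[e][1]] = 1;
                added++;
            }
        }

        if (added == 0)
            break;                  /* fixpoint reached: recursion terminates */

        memcpy(working, next, sizeof(working));
    }

    for (i = 0; i < NNODES; i++)
        if (output[i])
            printf("reachable from 0: %d\n", i);
    return 0;
}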


Having spent some more time today looking at 1) and also at the SQL 2003
spec, it would seem that other databases offer the WITH clause within
subqueries and also as part of a view. I'd be interested to hear
feedback from other developers as to whether it would be possible to
achieve this level of support within PostgreSQL.


Many thanks,

Mark.





Re: [HACKERS] Proposal for debugging of server-side stored procedures

2006-06-12 Thread Mark Cave-Ayland

 -Original Message-
 From: Thomas Hallgren [mailto:[EMAIL PROTECTED]
 Sent: 11 June 2006 10:07
 To: Mark Cave-Ayland
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: Proposal for debugging of server-side stored procedures

Hi Tom,

(cut)

 Obviously I'm not a Perl nor Python hacker. I just find it hard to believe
 that languages
 that has matured over a decade doesn't have a remote debug option that can
 be used. Sockets,
 shared-memory, pipes, or whatever. You will be able to attach to it
 somehow given it's been
 launched with the correct flags.
 
 I think you're wrong in thinking that client/server based debugging is in
 any way unusual.
 Googling for perl+debug+remote tells me that client/server based Perl
 debugging is common.
 There are a number of solutions. The same is true for Python.

It's not unusual, it's just that the standard libraries included with
Perl/Python contain practically no support for remote debugging - while they
both provide interfaces for you to write your own replacement, there doesn't
seem to be a standard for remote debugging other than Xdebug.

(cut)

 In my experience you have two use-cases.
 
 1. You debug during development and have either have your own database
 instance to play
 around with or a shared sandbox database where the security is very low.
 2. You debug a running instance on a live server and the sysadmin is well
 aware what you're
 doing. You will be given required privileges as needed.
 
 I would argue that the times when security becomes an issue when debugging
 are extremely
 rare and not worth spending lots of time and effort on. It is enough to
 prevent anyone but a
 super-user (or even a sysadmin) to start a remote debug session.

True, although there are times, for example in our web applications, where we
need to debug as a non-super-user since the web user accessing the database
obviously doesn't have super-user rights. In the case of DBGP this could be
a case of allowing only the super-user to set the remote host/port while
allowing any user to initiate a debug session. At least in that case the
administrator would be aware of the implications of enabling debugging in
this way.

(cut)

  A promising option at the moment would be to implement the DBGP protocol
 for
  Perl/Python/Tcl as suggested by Lukas, mainly because it appears
 ActiveState
  have already written the modules to do this (see
  http://aspn.activestate.com/ASPN/Downloads/Komodo/RemoteDebugging).
 
 There you go! Perl, PHP, Python, and Tcl all taken care of. IDE and all!

I've been having a look at the licensing and it appears that the server side
libraries are licensed under the Artistic License
(http://www.opensource.org/licenses/artistic-license.php). Looking at the
wording of the license I can't see that there would be much of a problem
including these with PostgreSQL to give out of the box DBGP support, but
with the option of allowing the user to override the defaults per language
in postgresql.conf - this would allow, say, for Andrew to run ptkdb as an
alternative while normal users could just go with the defaults. Can anyone
see any problems with this? If not, I'd be happy to approach the
ActiveState/Xdebug people to see if this is feasible.

The only remaining angle would be to implement the Xdebug protocol for
plpgsql in C which would probably be the most time-consuming aspect of the
project :)

(cut)

 I'd use the Komodo IDE and implement the ability to start the PL using a
 GUC setting per my
 original suggestion (with super-user requirement). Trivial solution,
 minimum effort, and
 very useful. KISS principle.
 
 It would be great if we could agree on a GUC flag (or flags) that would
 control debugging
 for all PL's. At present, all PL implementors must use their own (module
 specific) flags.

I'm not quite sure what you had in mind from your previous email - could you
elaborate more on this?


Kind regards,

Mark.


WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023

http://www.webbased.co.uk   
http://www.infomapper.com
http://www.swtc.co.uk  








Re: [HACKERS] Proposal for debugging of server-side stored procedures

2006-06-10 Thread Mark Cave-Ayland
 -Original Message-
 From: Thomas Hallgren [mailto:[EMAIL PROTECTED]
 Sent: 09 June 2006 16:25
 To: Mark Cave-Ayland
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: Proposal for debugging of server-side stored procedures
 
 Some thoughts from another Tom...

Hi Tom,

Thanks for the feedback :)

 All PL's are launched someway or another . It would probably be very
 simple to add the '-d'
 flag to the PL/Perl launcher and the '-m' flag to the PL/Python launcher
 and control it with
 module specific GUC settings. PL/Java already does this. The SQL to
 initialize debugging
 looks like this:
 
 SET pljava.vmoptions TO '-
 agentlib:jdwp=transport=dt_shmem,server=y,suspend=y';
 
 This tells the VM to load in-process debugging libraries and specifies the
 kind of
 connection to be made. As soon as the first java function is called, the
 process is
 suspended and waits for a client debugger to attach.
 
 The amount of work needed to create similar solutions for perl, python,
 tcl, etc. is
 probably limited and fairly trivial.

I think that Java is quite unusual in that the design of JPDA is inherently
client/server based to the point where they have defined the platform to
allow you to interact with the JVM via a socket. Unfortunately the same can't
be said for Perl/Python - as you suggest passing the parameters into the
interpreter is a trivial exercise but because the debugging classes in both
languages are console interactive, the only thing you could do is redirect
the console output to a socket (see
http://search.cpan.org/dist/perl/lib/perl5db.pl and the RemotePort option in
the case of Perl).

What I would like to see is some form of IDE that our developers can use,
probably under Windows, that would enable them to step through and debug
stored procedures on a remote server on another network. Using simple
console redirection would involve parsing text output, which strikes me as
something that would break easily with new releases. 

And then of course, there is the problem of security whereby anyone could
connect to the socket. For example, imagine a group of developers all
debugging different functions simultaneously; if one of them connected to
the wrong console socket then it could be confusing as the developer wonders
why their code never stops at a breakpoint.

 And it would force you to write your own proprietary debugger backend for
 each language.
 That's a major thing to undertake. Have you considered the maintenance
 aspect of it? Not
 something that would make the author of a PL module scream with joy. It
 might be wise to
 also consider what debugger preferences a skilled perl/python/whatever
 language developer
 might have. A home brewed debugger in the form of a PostgreSQL add-on
 might not be a natural
 choice.

Well my original thinking was that you could factor most of the code into
PostgreSQL itself; the only thing that PL developers would have to do is
provide an API for things like step, set breakpoint, read variable, and
eval.
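
Something along these lines is what I had in mind - the names and signatures
below are entirely hypothetical, just to illustrate the kind of small per-PL
callback table the backend could factor out:

typedef struct PLDebugCallbacks
{
    void    (*set_breakpoint) (const char *funcname, int lineno);
    void    (*clear_breakpoint) (const char *funcname, int lineno);
    void    (*step) (void);                         /* execute one statement */
    char   *(*read_variable) (const char *name);    /* textual value */
    char   *(*eval) (const char *expression);       /* evaluate in current frame */
} PLDebugCallbacks;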

A promising option at the moment would be to implement the DBGP protocol for
Perl/Python/Tcl as suggested by Lukas, mainly because it appears ActiveState
have already written the modules to do this (see
http://aspn.activestate.com/ASPN/Downloads/Komodo/RemoteDebugging). The only
issue I can see with this is again related to security in that the debugger
would not respect the ACLs within PostgreSQL which could potentially allow a
user to break inside a function that wasn't his/her own.


Kind regards,

Mark.


WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023

http://www.webbased.co.uk   
http://www.infomapper.com
http://www.swtc.co.uk  






Re: [HACKERS] Proposal for debugging of server-side stored procedures

2006-06-10 Thread Mark Cave-Ayland

 -Original Message-
 From: Andrew Dunstan [mailto:[EMAIL PROTECTED]
 Sent: 09 June 2006 17:01
 To: Mark Cave-Ayland
 Cc: 'Tom Lane'; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Proposal for debugging of server-side stored
 procedures

(cut)

 Debugging embedded perl has some significant challenges. You might find
 it interesting to see what can be done in the most famous embedded
 situation: mod_perl. See http://perl.apache.org/docs/1.0/guide/debug.html
 
 using ptkdb might be nice 
 
 cheers
 
 andrew


Hi Andrew,

Thanks for the link. This brings up another issue - what if someone wishes
to debug locally using their favourite tool rather than a PostgreSQL tool?
Should this be allowed even if it may circumvent PostgreSQL's user
permissions on functions within the database?


Kind regards,

Mark.


WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023

http://www.webbased.co.uk   
http://www.infomapper.com
http://www.swtc.co.uk  






Re: [HACKERS] Proposal for debugging of server-side stored procedures

2006-06-09 Thread Mark Cave-Ayland

 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]
 Sent: 29 May 2006 18:05
 To: Mark Cave-Ayland
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Proposal for debugging of server-side stored
 procedures

(cut)

 As far as the debug protocol details go, it'd be worth looking at the
 GDB host protocol to see if it's usable; even if not directly usable
 (it's probably too low-level) it would give ideas about the functions
 you need to provide.  It'd also be smart to understand how debugging is
 commonly done in Perl, Python, etc, with an eye to being able to
 piggyback on existing tools instead of inventing our own.
 
   regards, tom lane


Hi Tom,

Thanks for your input on this. I've been spending some time researching
debuggers over the past week and AFAICT we'd be struggling to use the native
debuggers that come with Perl/Python et al. The main reason for this is that
the debuggers with Perl/Python are actually interactive environments, for
example debugging in Perl is initiated with perl -d somefile.pl which then
becomes a special interactive interpreter session. The same is also true
with Python which is launched in a similar way, python -m pdb somefile.py.

However, both Perl and Python offer the ability to hook into the interpreter
to detect events. For example, Python offers a C API here [1] that allows
you to supply a callback when particular events occur; this would allow us
to execute a callback after the interpreter executes each line.
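
As a minimal sketch (not PL/Python code) of using that hook to get control
after each line, based on the C API in [1]:

#include <Python.h>
#include <frameobject.h>    /* PyFrameObject, f_lineno */

static int
debug_trace(PyObject *obj, PyFrameObject *frame, int what, PyObject *arg)
{
    if (what == PyTrace_LINE)
    {
        /* a debugger backend could check breakpoints and suspend here */
        printf("executing line %d\n", frame->f_lineno);
    }
    return 0;
}

void
enable_python_line_tracing(void)
{
    /* install the callback for the current thread's interpreter */
    PyEval_SetTrace(debug_trace, NULL);
}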

Perl seems a little more messy in that I can't find a documented C API to
hook into the interpreter, but it looks as if it may be possible to cook
something up by writing a new DB package [2] which uses XS to call a C
callback. The other issue is that unlike Python, the debug capability must
be specified when creating the instance of interpreter rather than being
able to enable/disable debugging on the fly so it may mean that two
instances of the perl interpreter need to be maintained in memory - a
standard instance and a debugging instance.

Plpgsql isn't really a concern as we can simply code up whichever model we
eventually decide to use. The only bad news I can see is that it appears Tcl
may need to have the source patched in order to add debug hooks into the
interpreter [3].

Also, I'm wondering how to design the mechanism to be extendable in the
future beyond debugging server-side functions, such as hooking into more of
the executor mechanisms. For example, I could see a use case for allowing a
profiler to use the debug API to attach to the postmaster to receive
notifications whenever a new SQL query starts and ends; this would allow
someone to write a rather neat app that could rank the most popular queries
in terms of processing time really easily, but then maybe this could be
considered treading on the toes of the existing stats process...


Thoughts?

Mark.

[1] http://docs.python.org/api/profiling.html
[2] http://search.cpan.org/~nwclark/perl-5.8.8/pod/perldebguts.pod
[3] http://www.xdobry.de/atkdebugger/index.html


WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023

http://www.webbased.co.uk   
http://www.infomapper.com
http://www.swtc.co.uk  






[HACKERS] Proposal for debugging of server-side stored procedures

2006-05-29 Thread Mark Cave-Ayland
Hi everyone,
 
Having browsed the TODO list, one of the items that I would be interested
in working on is a debugger for stored procedures. Having searched on this
topic in the archives, I'm still short of some answers that would allow me
to come up with a complete proposal that I can use to start coding.
 
The most important question to answer in my mind is how should the
debugger communicate with the server? I can see 3 ways in which this could
happen:
 
 
1) Use the existing FE/BE protocol to allow the user to control
the debugging session using stored procedures/pseudo-tables, e.g.
 
SELECT pg_debug_enable('myfunction');
INSERT INTO pg_debug_breakpoints (function, line) VALUES
('myfunction', 2);
SELECT pg_debug_step('myfunction');
 
 
2) Spawn a separate debugger process listening on another port to
allow OOB communication with the server. This would involve the design
and implementation of a debug FE/BE protocol, unless there is a
standard that already exists.
 
 
3) Extend the existing FE/BE protocol to allow transmission of an OOB
debug channel.
 
 
My current thoughts are leaning towards 2 or 3, with the advantage of
approach 3 being that once you can connect to the database using a client,
the debugging functionality automatically becomes available without having
to worry about additional security checks on the debugger port (like
another version of pg_hba.conf just for the debugger) and firewalls
blocking debugger connections.
 
 
Thoughts?
 
Mark.


WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023

http://www.webbased.co.uk
http://www.infomapper.com
http://www.swtc.co.uk







Re: [HACKERS] WITH/WITH RECURSIVE implementation discussion

2006-05-02 Thread Mark Cave-Ayland
 -Original Message-
 From: Jonah H. Harris [mailto:[EMAIL PROTECTED]
 Sent: 01 May 2006 14:56
 To: Mark Cave-Ayland
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] WITH/WITH RECURSIVE implementation discussion
 
 On 5/1/06, Mark Cave-Ayland [EMAIL PROTECTED] wrote:
  The latest discussion I found related to this appears to be here:
  http://archives.postgresql.org/pgsql-hackers/2005-11/msg00564.php which
  indicates that Jonah is hoping to work on this for 8.2, but I don't see
 this
  item as being worked on in the TODO list - does that mean that
 development
  on this has halted?
 
 No, in between all my other work here at EnterpriseDB, I'm still
 working on it.  I hope to get it submitted to -patches for review in
 the next two weeks.


Hi Jonah,

That's great news. While I wouldn't say that I have intimate knowledge of
the PostgreSQL internals, I'd be happy to try and help you with this if time
is proving to be a problem.


Kind regards,

Mark.


WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023

http://www.webbased.co.uk   
http://www.infomapper.com
http://www.swtc.co.uk  






[HACKERS] WITH/WITH RECURSIVE implementation discussion

2006-05-01 Thread Mark Cave-Ayland
Hi folks,

Looking ahead at some of the projects I'll be working on in the future, I'm
seeing that having an implementation of WITH/WITH RECURSIVE for working with
tree structures is going to be a very useful feature and so would like to
re-start the discussions on this to see whether this could be achieved for
PG 8.2.

The latest discussion I found related to this appears to be here:
http://archives.postgresql.org/pgsql-hackers/2005-11/msg00564.php which
indicates that Jonah is hoping to work on this for 8.2, but I don't see this
item as being worked on in the TODO list - does that mean that development
on this has halted?


Kind regards,

Mark.


WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023

http://www.webbased.co.uk   
http://www.infomapper.com
http://www.swtc.co.uk  






Re: [HACKERS] Any advice about function caching?

2005-11-08 Thread Mark Cave-Ayland

 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED] 
 Sent: 07 November 2005 23:06
 To: Mark Cave-Ayland (External)
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Any advice about function caching?

(cut)

 If you want per-query state, keep it in a data structure linked from
 the fcinfo->flinfo->fn_extra field (physically, store it in fn_mcxt,
 or create a subcontext of that if you wish).
 
 If you need to get control at query shutdown to free 
 non-palloc'd resources, RegisterExprContextCallback may help. 
  (I think such callbacks are only called during *successful* 
 query shutdown, though, so if you have external library state 
 you need to clean up anyway, you'll need some other approach 
 to keeping track of it ... maybe a permanent data structure 
 instead of a per-query one.)
 
 src/backend/utils/fmgr/funcapi.c and 
 src/backend/executor/functions.c might be useful examples.


Hi Tom,

Thanks for the advice about state - this is definitely pointing me towards
looking at the existing code for aggregates and SRFs. Incidentally I've
found that attaching my cleanup memory context to PortalContext with some
elogs() shows that it appears to be called correctly just before the portal
is destroyed - so whatever I finally come up with is likely to be a
combination of the two methods. I will dig further into the function  code
and see how I manage.
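
For the archives, here is a minimal sketch of the fn_extra/fn_mcxt pattern
Tom describes, with a made-up cache struct and function name standing in for
the real PostGIS transformation function:

#include "postgres.h"
#include "fmgr.h"

typedef struct TransformCache
{
    int         source_srid;
    int         target_srid;
    void       *proj_handles;   /* external library state lives here */
} TransformCache;

PG_FUNCTION_INFO_V1(my_transform);

Datum
my_transform(PG_FUNCTION_ARGS)
{
    TransformCache *cache = (TransformCache *) fcinfo->flinfo->fn_extra;

    if (cache == NULL)
    {
        /*
         * First call in this query: allocate the per-query state in
         * fn_mcxt, which lives exactly as long as the FmgrInfo does.
         */
        cache = MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt,
                                       sizeof(TransformCache));
        fcinfo->flinfo->fn_extra = cache;
    }

    /* ... reuse cache->proj_handles and do the real work here ... */

    PG_RETURN_NULL();
}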


Many thanks,

Mark.


WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023

http://www.webbased.co.uk   
http://www.infomapper.com 
http://www.swtc.co.uk  






[HACKERS] Any advice about function caching?

2005-11-07 Thread Mark Cave-Ayland
Hi everyone,

I'm currently in the middle of trying to implement a cache for a PostGIS
server-side function that uses an external library with an expensive startup
cost, and was hoping to find some advice on the best way to implement it.

The function currently takes 2 parameters - a customised C geometry datatype
and a destination SRID which is the primary key to the library parameters in
the spatial_ref_sys table. Currently for each invocation of the function it
is necessary to look up the parameters for both the source and destination
SRIDs in spatial_ref_sys, create a handle for each SRID in the external
library, perform the calculation, and then return the result.

Obviously this process is quite expensive when calling the function on large
tables of data. I've currently implemented a basic cache for both the
spatial_ref_sys lookups and the external library functions which gives a
dramatic performance improvement, dropping my test query from over 2 minutes
to 6s(!), but the part I'm experiencing difficulty with is working out when
the start and end of a query occur.

My current code works by using MemoryContextCreate() to create a child
context to MessageContext and using the Init()/Delete() functions to
initialise and destroy a cache in the local backend. However, this doesn't
really work particularly well when using cursors and prepared queries since
it appears what I should be doing is using a cache per portal rather than a
cache per backend. The other complication is that the external library
handles cannot be free()d directly but instead a customised free function
must be called.

So my questions are:

1) What is the best way of keeping track of whether the cache has been
initalised for a given portal? Looking in contrib/dblink.c it looks as if
the best way to go would be to create a hash table per backend based upon
the portal name in ActivePortal, attach the child context to PortalContext,
and use the Delete() function to call the customised free function and
remove the hash table entry. However, which memory context would be the best
one to use in order to store the hash table entries?

2) Is there a better way of doing this?

3) Would anyone consider a patch to the source tree to make implementing
something like this easier in future?


Many thanks,

Mark.


WebBased Ltd
17 Research Way
Plymouth
PL6 8BT

T: +44 (0)1752 797131
F: +44 (0)1752 791023

http://www.webbased.co.uk   
http://www.infomapper.com 
http://www.swtc.co.uk  






[HACKERS]

2005-08-05 Thread Mark Cave-Ayland

 Date: Thu, 04 Aug 2005 11:12:58 -0400
 From: Tom Lane [EMAIL PROTECTED]
 To: Matteo Beccati [EMAIL PROTECTED]
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: Enhanced containment selectivity function 
 Message-ID: [EMAIL PROTECTED]

 Matteo Beccati [EMAIL PROTECTED] writes:
  This also made me think: is there a reason why geometric selectivity
  functions return constant values rather than checking statistics for a 
  better result?

 Because no one's ever bothered to work on them.  You should talk to the
PostGIS guys, however, because they *have* been
 working on real statistics and real estimation functions for geometric
types.  It'd be nice to get some of that work
 back-ported into the core database.

 http://www.postgis.org/

   regards, tom lane


Hi Tom/Matteo,

As a PostGIS guy I can say that the latest geometric estimation functions
for the PostGIS 1.0 series are very good indeed - Sandro has done an
excellent job in working on a 2D histogram sample and it is surprising how
accurate the estimates are even with very low statistics targets.

Also, since Tom applied the ANALYZE patch for the 8.0 series, I've been
working on a new page for the PostgreSQL documentation explaining how the
ANALYZE process works, and giving an example using the complex type as to how
you can collect your own statistics during ANALYZE and make use of them from
your own selectivity functions. It's about three quarters complete, and
still in plain text format, but I hope to submit it as a documentation patch
for 8.1. My hope is that this will then encourage someone to add selectivity
functions to ltree and tsearch2 (hint hint) ;)


Kind regards,

Mark.


WebBased Ltd
17 Research Way
Tamar Science Park
Plymouth
PL6 8BT 

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk
 





Re: [HACKERS] Implementing SQL/PSM for PG 8.2 - debugger

2005-06-29 Thread Mark Cave-Ayland
Hi guys,

 I lean with you and Tom.  While running it over the same libpq protocol 
 would be helpful in some ways, it would have a lot of drawbacks and 
 would really change the function of libpq.  I think a separate debugging 
 protocol is in order.

Just putting on my network hat for a moment... Would an approach be to come
up with a generic encapsulation protocol, similar in principle to PPP, in
which we could run any protocols we wanted?

If we had something like a PGSQL Encapsulation Protocol (PGEP) used to
transfer all data between a PostgreSQL client/server, then we can use this
to tunnel libpq requests as they are at the moment through to the other
side. However, it would also mean that people could add any other protocols
as they see fit for debugging, statistics and all sorts of things that
people have yet to think of.
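
Just to illustrate the idea (entirely hypothetical, not a worked-out
design), each PGEP message might carry little more than a channel id and a
length, so that different protocols can be multiplexed over one connection:

#include <stdint.h>

typedef enum PgepChannel
{
    PGEP_CHANNEL_LIBPQ = 0,     /* today's FE/BE traffic, tunnelled as-is */
    PGEP_CHANNEL_DEBUG = 1,     /* stored procedure debugger */
    PGEP_CHANNEL_STATS = 2      /* statistics, or whatever comes later */
} PgepChannel;

typedef struct PgepFrameHeader
{
    uint8_t     channel;        /* one of PgepChannel */
    uint32_t    length;         /* number of payload bytes that follow */
} PgepFrameHeader;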

Obviously this would require a client/server interface change so it's not to
be taken lightly, but I thought I'd mention it since people have mentioned
the possibility of changes to the FE/BE protocol.


Kind regards,

Mark.


WebBased Ltd
17 Research Way
Tamar Science Park
Plymouth
PL6 8BT 

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk
 





Re: [HACKERS] Fixing r-tree semantics

2005-06-24 Thread Mark Cave-Ayland
Hi Tom,

 What needs more discussion is that it seems to me to make sense to extend
 the standard opclasses to handle the four Y-direction operators comparable
 to the X-direction operators that are already there, that is above, below,
 overabove, and overbelow.

As part of PostGIS (http://postgis.refractions.net), I submitted a patch a
while back to add additional Y-direction operators to the code which is a
slightly modified version of rtree_gist (and yes, the PostGIS implementation
will suffer from the same issues you've found with the existing R-tree
implementations).

The operators I went for were as follows:

A &<| B - true if A's bounding box overlaps or is below B's bounding box
A |&> B - true if A's bounding box overlaps or is above B's bounding box
A <<| B - true if A's bounding box is strictly below B's bounding box
A |>> B - true if A's bounding box is strictly above B's bounding box

Since the rtree_gist implementation and operators were 2D, my thoughts were
to use another op-class only if the indexing were upgraded to 3D. So with
this in mind, I created the following new GiST strategies:

#define RTOverBelowStrategyNumber   9
#define RTBelowStrategyNumber   10
#define RTAboveStrategyNumber   11
#define RTOverAboveStrategyNumber   12

This is basically what you are suggesting, but with a | instead of a ^ in the
operator name (my original choice was to try and use } to indicate the
positive sense of the Y-axis, but this was not accepted as a valid operator
character, which is why I changed to |).

It would be harder for us to change these operators since they already
exist, but then again it would be useful from a maintenance point of view to
keep the strategy numbers and operators the same across both
implementations. Of course strategy numbers are just used internally so
these aren't such a big issue - it's more the choice of operators that would
be useful to agree on.
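
For illustration only, here is a small self-contained sketch (not the actual
rtree_gist/PostGIS code - the Box type and comparison logic are simplified
stand-ins) of the leaf-level semantics that those four strategies map to:

#include <stdbool.h>

#define RTOverBelowStrategyNumber   9   /* &<| */
#define RTBelowStrategyNumber       10  /* <<| */
#define RTAboveStrategyNumber       11  /* |>> */
#define RTOverAboveStrategyNumber   12  /* |&> */

typedef struct
{
    double xmin, ymin, xmax, ymax;
} Box;

/* Leaf-level test: does 'key' satisfy 'strategy' with respect to 'query'? */
static bool
box_y_leaf_consistent(int strategy, const Box *key, const Box *query)
{
    switch (strategy)
    {
        case RTOverBelowStrategyNumber: /* overlaps or is below */
            return key->ymax <= query->ymax;
        case RTBelowStrategyNumber:     /* strictly below */
            return key->ymax < query->ymin;
        case RTAboveStrategyNumber:     /* strictly above */
            return key->ymin > query->ymax;
        case RTOverAboveStrategyNumber: /* overlaps or is above */
            return key->ymin >= query->ymin;
        default:
            return false;
    }
}

(Note that the internal-page consistent checks would be weaker "might match"
tests rather than these exact leaf semantics.)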


Kind regards,

Mark.


WebBased Ltd
17 Research Way
Tamar Science Park
Plymouth
PL6 8BT 

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Fixing r-tree semantics

2005-06-24 Thread Mark Cave-Ayland

 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED] 
 Sent: 24 June 2005 14:27
 To: Mark Cave-Ayland (External)
 Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; oleg@sai.msu.su; 
 pgsql-hackers@postgresql.org; 'PostGIS Development Discussion'
 Subject: Re: Fixing r-tree semantics

(cut)

 Well, I was proposing more or less that but with ^ because of 
 the precedent of the two existing box_above/box_below 
 operator names. However, I'm quite happy to adopt your names, 
 since that's probably a more widely used precedent.  Sold, 
 unless there are objections.
 
 (BTW, it does look a bit odd that the | moves around in 
 your names. But I don't dislike it enough to not follow the 
 precedent.)

The thinking behind it was that the | represents the x-axis if you tilt your
head right 90 degrees. In effect, it would allow you to 'read' the operator
without having to go and look up what it does.

  It would be harder for us to change these operators since they already 
  exist, but then again it would be useful from a maintenance point of 
  view to keep the strategy numbers and operators the same across both 
  implementations.
 
 Agreed, I'll use your strategy number assignments too.

Alright no problems.


Many thanks,

Mark.


WebBased Ltd
17 Research Way
Tamar Science Park
Plymouth
PL6 8BT 

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Quick-and-dirty compression for WAL backup blocks

2005-06-06 Thread Mark Cave-Ayland

 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED] 
 Sent: 04 June 2005 16:46
 To: Mark Cave-Ayland (External)
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: Quick-and-dirty compression for WAL backup blocks

(cut)

 I've completed a test run for this (it's essentially MySQL's 
 sql-bench done immediately after initdb).  What I get is:
 
 CVS tip of 6/1: ending WAL offset = 0/A364A780 = 2741282688 
 bytes written
 
 CVS tip of 6/2: ending WAL offset = 0/8BB091DC = 2343604700 
 bytes written
 
 or about a 15% savings.  This is with a checkpoint_segments 
 setting of 30. One can presume that the savings would be 
 larger at smaller checkpoint intervals and smaller at larger 
 intervals, but I didn't try more than one set of test conditions.
 
 I'd say that's an improvement worth having, especially 
 considering that it requires no net expenditure of CPU time.  
 But the table is certainly still open to discuss more 
 complicated approaches.


Hi Tom,

Thanks for the numbers. I've just been across to the OSDL STP website and it
appears that the automatic nightly PostgreSQL CVS builds set up by Mark have
been broken since the middle of April :( A brief look shows that compilation
of the PostgreSQL sources is failing with lots of errors and warnings. I
don't think I can justify the time at the moment to go and look at this
myself; however, I thought I'd post to -hackers as a heads up in case anyone
else could pick this up.


Kind regards,

Mark.


WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT 

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Cost of XLogInsert CRC calculations

2005-06-01 Thread Mark Cave-Ayland

 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED] 
 Sent: 31 May 2005 17:27
 To: Greg Stark
 Cc: Mark Cave-Ayland (External); 'Manfred Koizar'; 'Bruce 
 Momjian'; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Cost of XLogInsert CRC calculations

(cut)

 The odds of an actual failure from case #2 are fortunately 
 not high, since a backup block will necessarily span across 
 at least one WAL page boundary and so we should be able to 
 detect stale data by noting that the next page's header 
 address is wrong.  (If it's not wrong, then at least the 
 first sector of the next page is up-to-date, so if there is 
 any tearing the CRC should tell us.)  Therefore I don't feel 
 any need to try to work out a back-patchable solution for #2. 
  But I do think we ought to change the WAL format going 
 forward to compute just one CRC across a WAL record and all 
 attached backup blocks.  There was talk of allowing 
 compression of backup blocks, and if we do that we could no 
 longer feel any certainty that a page crossing would occur.

I must admit I didn't realise that an XLog record consisted of a number of
backup blocks which were also separately CRCd. I've been through the source,
and while the XLog code is reasonably well commented, I couldn't find a
README in the transam/ directory that explained the thinking behind the
current implementation - is this something that was discussed on the mailing
lists way back in the mists of time?

I'm still a little nervous about dropping down to CRC32 from CRC64, and so I
was just wondering what the net saving would be from using one CRC64 across
the whole WAL record. For example, if an insert or an update uses 3 backup
blocks, would this one change alone immediately reduce the CPU usage to one
third of its original value? (Something tells me that this is probably not
the case, as I imagine you would have picked this up a while back.) In my
view, having a longer CRC is like buying a holiday with insurance - you pay
the extra cost knowing that should anything happen then you have something
to fall back on. However, the hard part here is determining a reasonable
balance between the cost and the risk.
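
Just to check that I have understood the proposal correctly, here is a
minimal sketch (zlib's crc32() is used purely as a stand-in for whatever CRC
implementation the server actually uses - this is not the WAL code itself)
of what a single rolling CRC across the record header, record data and all
attached backup blocks would look like:

/* Illustration only: one CRC over all the pieces, instead of one per piece. */
#include <stddef.h>
#include <zlib.h>

struct wal_piece
{
    const unsigned char *data;
    size_t               len;
};

static unsigned long
wal_record_crc(const struct wal_piece *pieces, int npieces)
{
    unsigned long crc = crc32(0L, Z_NULL, 0);   /* initial CRC value */
    int i;

    for (i = 0; i < npieces; i++)
        crc = crc32(crc, pieces[i].data, pieces[i].len);

    return crc;     /* compared against the value stored in the record */
}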


Kind regards,

Mark.


WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT 

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

