Re: [HACKERS] Collations versus user-defined functions

2011-03-12 Thread Tom Lane
Martijn van Oosterhout  writes:
> On Sat, Mar 12, 2011 at 02:46:19PM -0500, Tom Lane wrote:
>> This would actually seem more sensible if we went with something even
>> simpler than the current patch's behavior, namely that COLLATE only
>> affects the operator it is an *immediate* input of, and nothing
>> propagates upward in expressions ever.  I remain unconvinced that the
>> SQL spec is calling for propagation ...

> Well, it doesn't say in the general case, but there is under 6.29
>  Syntax rule 4b 

> 4) If  CSF is specified, then let DTCVE
> be the declared type of the  immediately
> contained in CSF. The maximum length, character set, and collation of
> the declared type DTCSF of CSF are determined as follows:

> b) The character set and collation of the  function> are those of DTCVE.

> A similar wording is for the trim function. While obviously it doesn't
> cover all user defined functions, it seem obviously that once you do
> propegation for a few builtins you may as well do it for all of them.
> For the concatination operator is has something similar, though written
> in a way only a spec committe could come up with.

> Frankly, without propegation the feature seems entirely useless.

I remain unconvinced, because there are too many corner cases.  Should
collation propagate up out of a subselect?  How about a CTE?  You're
starting to get into some pretty weird action-at-a-distance situations
if so, analogous to the function-input-arguments case that you were just
saying should NOT propagate collation.  And I still don't see anything
in the text of the spec to justify it.

My feeling is that the feature would be simple, explainable, and useful
if COLLATE only affected the immediately syntactically-containing
operator.  The rest of this stuff requires a huge amount of mechanism
whose behavior will be nothing but surprising, even though it's
inflexible as can be (cf Greg's point about not being able to select
collation at runtime).  I'm not going to say it's the worst piece of
language design that's ever come out of the SQL committee, but I'm
starting to feel like it's in the top ten.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump -X

2011-03-12 Thread Aaron W. Swenson
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

On 12/24/2010 02:32 AM, J. Roeleveld wrote:
> 
> Also, unless Gentoo actually strips the man-page and "--help" page (which I 
> do 
> seriously doubt), I do not see the "-X" option in the documentation.
> 
> --
> Joost
> 

Delayed response: No, we don't strip the documentation. It is simply
'unfooled' around with.

- - Aaron
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iF4EAREIAAYFAk17/EYACgkQCOhwUhu5AEl0RwD/akhtWUK2FFbM4Vbg4cySN8PI
9f0seIhVO9+pZPmEC2cA/2f3FqsFlOrzp9rjZPkmCP4i3B1g8UOZlCvmcUq2PXX3
=Z4O9
-END PGP SIGNATURE-

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collations versus user-defined functions

2011-03-12 Thread Tom Lane
Martijn van Oosterhout  writes:
> I think I didn't explain myself well. The *state* should be implicit,
> the actual collation should be whatever the query says. What I was
> thinking of is the following:

> CREATE FUNCTION my_english_lt(text, text) RETURNS boolean AS $$
>return $1 < $2 COLLATE "en_US"
> $$;

> (not sure about the syntax but you get the idea).

> If you just propegate naively you would get:

> my_english_lt(x COLLATE "de_DE", y)   -> error, conflicting collation
> my_english_ly(x, y COLLATE "de_DE")   -> would work fine

> Hence my suggestion that on input to the function the parameters would
> be considered collation "de_DE" state IMPLICIT, so the collation in the
> function overrides, but if the COLLATE in the function is removed, the
> implicit collation takes hold.

Oh, I see.  Yeah, that should work correctly, because parsing inside the
function will be done with Param symbols that act pretty much like Vars
--- whatever collation they have is considered implicit.  It's important
here that we do inlining by splicing completed parsetrees together ---
we *don't* do some sort of insert-the-parameters-and-reparse-from-scratch
approach.  So the collation labelings made inside the function won't
change as a result of inlining.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] template0 database comment

2011-03-12 Thread Andrew Dunstan



On 03/12/2011 04:24 PM, Greg Stark wrote:

On Sat, Mar 12, 2011 at 8:42 PM, Robert Haas  wrote:

A preposition is something you should try not to end a sentence with.


Something to keep in mind when someone localises Postgres for Latin
which has this rule.




I assume Robert's comment was in jest, since it was in breach of the 
rule it was stating.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collations versus user-defined functions

2011-03-12 Thread Martijn van Oosterhout
On Sat, Mar 12, 2011 at 02:46:19PM -0500, Tom Lane wrote:
> > Similarly, inside the function the parameters should be considered to
> > be IMPLICIT collation, to avoid strange errors depending on how its
> > called.
> 
> Not convinced by this.  If we say that that's how it works, then no
> user-defined function should react to COLLATE in its arguments at all,
> which seems pretty weird and restrictive --- especially if the COLLATE
> property is expected to propagate up through the function call so
> far as the calling expression is concerned.  It seems just bizarre to
> me to say that a function's internal operations don't react to an
> input collation spec but then its result is thought to still be affected
> by that.

I think I didn't explain myself well. The *state* should be implicit,
the actual collation should be whatever the query says. What I was
thinking of is the following:

CREATE FUNCTION my_english_lt(text, text) RETURNS boolean AS $$
   return $1 < $2 COLLATE "en_US"
$$;

(not sure about the syntax but you get the idea).

If you just propegate naively you would get:

my_english_lt(x COLLATE "de_DE", y)   -> error, conflicting collation
my_english_ly(x, y COLLATE "de_DE")   -> would work fine

Hence my suggestion that on input to the function the parameters would
be considered collation "de_DE" state IMPLICIT, so the collation in the
function overrides, but if the COLLATE in the function is removed, the
implicit collation takes hold.

> This would actually seem more sensible if we went with something even
> simpler than the current patch's behavior, namely that COLLATE only
> affects the operator it is an *immediate* input of, and nothing
> propagates upward in expressions ever.  I remain unconvinced that the
> SQL spec is calling for propagation ...

Well, it doesn't say in the general case, but there is under 6.29
 Syntax rule 4b 

4) If  CSF is specified, then let DTCVE
be the declared type of the  immediately
contained in CSF. The maximum length, character set, and collation of
the declared type DTCSF of CSF are determined as follows:

b) The character set and collation of the  are those of DTCVE.

A similar wording is for the trim function. While obviously it doesn't
cover all user defined functions, it seem obviously that once you do
propegation for a few builtins you may as well do it for all of them.
For the concatination operator is has something similar, though written
in a way only a spec committe could come up with.

Frankly, without propegation the feature seems entirely useless. Almost
all collations are going to be defined by implicit collations attached
to columns. If 

ORDER BY x

and

ORDER BY x || 'foo'

Don't use the same collation then that is a first grade violation of
the POLA.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first. 
>   - Charles de Gaulle


signature.asc
Description: Digital signature


Re: [HACKERS] template0 database comment

2011-03-12 Thread Greg Stark
On Sat, Mar 12, 2011 at 8:42 PM, Robert Haas  wrote:
> A preposition is something you should try not to end a sentence with.
>

Something to keep in mind when someone localises Postgres for Latin
which has this rule.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] template0 database comment

2011-03-12 Thread Robert Haas
On Mar 12, 2011, at 12:01 PM, Tom Lane  wrote:
> Greg Stark  writes:
>> On Sat, Mar 12, 2011 at 1:59 PM, Bruce Momjian  wrote:
>>> OK, funny guys.  ;-)  Can someone give me the right text.  Obviously I
>>> don' know what template0 is used for either.  Is it pg_dumpall perhaps?
> 
>> template0: unmodifiable pristine empty database
>> template1: default template for new databases
> 
> Yeah, I think that the right way to approach this is to have initdb
> comment *both* of those databases.  I don't like that specific wording
> for template0 though.  Maybe
> 
> template0: unmodified copy of original template1 database
> template1: default template for new databases
> 
> The problem with Greg's wording is that it's falsifiable: it is possible
> for someone to modify template0 if they're determined to mess things up.
> So a description like "unmodifiable" is promising too much.
> 
> Shouldn't the "postgres" database get a comment too, while we're at it?
> Perhaps "default database to connect to"?

A preposition is something you should try not to end a sentence with.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Macros for time magic values

2011-03-12 Thread Peter Eisentraut
On fre, 2011-03-11 at 12:50 -0500, Bruce Momjian wrote:
> It has bothered me that many of our time routines use special magic
> constants for time values, e.g. 24, 12, 60, etc.
> 
> The attached patch changes these magic constants to macros to clarify
> the code.  I would like to apply this for 9.1 as a cleanup.

I think it's much clearer with the plain numbers.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] template0 database comment

2011-03-12 Thread Greg Stark
On Sat, Mar 12, 2011 at 5:01 PM, Tom Lane  wrote:
> The problem with Greg's wording is that it's falsifiable: it is possible
> for someone to modify template0 if they're determined to mess things up.
> So a description like "unmodifiable" is promising too much.
>

Eh, it's possible for someone to make any part of the documentation
wrong if they're determined to mess things up enough. "Empty" is not
even technically correct since it has all the system tables and stuff.
But I think there's a point of diminishing returns where if we try to
come up with something that's technically 100% true it won't help a
user understand the key attributes that make template0 useful. Under
normal usage it has no user objects in it and it is hard to change
that which tries to guarantee that that fact remains true.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collations versus user-defined functions

2011-03-12 Thread Peter Eisentraut
On lör, 2011-03-12 at 12:17 -0500, Tom Lane wrote:
> Does the SQL standard have anything to say on the matter, or is there
> a precedent in the behavior of TSQL or other DBMSes?

I had investigated this issue but the SQL standard doesn't say anything
about it.

The SQL inlining issue is tricky.  Other languages including PL/pgSQL
are not supported at the moment.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] template0 database comment

2011-03-12 Thread Peter Eisentraut
On lör, 2011-03-12 at 12:01 -0500, Tom Lane wrote:
> Shouldn't the "postgres" database get a comment too, while we're at
> it?  Perhaps "default database to connect to"?

That's not actually true, though.  Maybe it's the "default database used
by administration programs"?  In practice it might be "some otherwise
unused database that's occasionally useful". ;-)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collations versus user-defined functions

2011-03-12 Thread Tom Lane
Martijn van Oosterhout  writes:
> On Sat, Mar 12, 2011 at 12:17:11PM -0500, Tom Lane wrote:
>> I've thought of another area that AFAICT the current patch fails to
>> address at all: what should happen in user-defined functions?

> The POLA suggests that the collation derivation of the original query
> should not be affected by the implementation of a function. In the case
> of SQL functions this means that the expansion of the function must
> not change the results. This would mean introducing a CollateNode above
> the SQL function after expansion, though you may be able to acheive
> this by doing the collation derivation prior to expansion of the SQL
> function, but I don't know if that's feasable.

CollateExpr as presently defined wouldn't get the job done, but I think
it's not a problem because those nodes aren't actually needed at runtime
--- collation assignment for operators/functions above the inlined
function should have been done when they were parsed, so it won't change
as a result of expanding an inlined function.

> Similarly, inside the function the parameters should be considered to
> be IMPLICIT collation, to avoid strange errors depending on how its
> called.

Not convinced by this.  If we say that that's how it works, then no
user-defined function should react to COLLATE in its arguments at all,
which seems pretty weird and restrictive --- especially if the COLLATE
property is expected to propagate up through the function call so
far as the calling expression is concerned.  It seems just bizarre to
me to say that a function's internal operations don't react to an
input collation spec but then its result is thought to still be affected
by that.

This would actually seem more sensible if we went with something even
simpler than the current patch's behavior, namely that COLLATE only
affects the operator it is an *immediate* input of, and nothing
propagates upward in expressions ever.  I remain unconvinced that the
SQL spec is calling for propagation ...

> I think you need to consider the collation to be a variation of the
> type. plpgsql makes new plans for each type when dealing with any
> parameters, this should fit right in.

Yeah, the same occurred to me a little bit later --- we can actually
make that work fairly easily by treating collatable input datatypes
as if they were polymorphic.  But the question is whether we should.
You seem to be arguing above that user-defined functions ought not
pay attention to COLLATE specs on their inputs.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collations versus user-defined functions

2011-03-12 Thread Tom Lane
Greg Stark  writes:
> On Sat, Mar 12, 2011 at 5:17 PM, Tom Lane  wrote:
>>create function my_lt(x text, y text) returns bool as
>>$
>>begin
>>return x < y;
>>end
>>$ language plpgsql;
>> 
>>select my_lt('foo', 'bar' collate "de_DE");
>>select my_lt('foo', 'bar' collate "fr_FR");
>> 
>> I think it's at least arguably desirable that the results of the two
>> calls respond to the collation clauses, but it does not look to me
>> like that will happen: plpgsql isn't doing anything to propagate
>> its call-site collation value into expressions it evaluates, and
>> if it did, it'd still get the wrong answer on the second call because it
>> would have cached an expression plan tree containing the collation info
>> from the first call.

> I don't think it's obvious that this is the right behaviour.

I'm not sure of that either, but ...

> I think
> functions should provide the same answer on the same inputs regardless
> of context unless they're really intended to be volatile.

... that argument convinces me not at all, because they are *not* the
same inputs.  The collate clauses are different.  If I believed your
argument, then the built-in "<" function shouldn't respond to COLLATE
either.

> If you want to affect the way a plpgsql function orders things in its
> code you should pass an extra argument for collation and then the
> plpgsql function should use COLLATE colarg -- though I'm not sure if
> that works, can you put parameters in COLLATE arguments?

No, you cannot, the SQL committee has blown it on that.  COLLATE's
argument is an identifier not a variable.  There is no way to do
runtime selection of collation like that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collations versus user-defined functions

2011-03-12 Thread Greg Stark
On Sat, Mar 12, 2011 at 5:17 PM, Tom Lane  wrote:
>        create function my_lt(x text, y text) returns bool as
>        $
>                begin
>                        return x < y;
>                end
>        $ language plpgsql;
>
>        select my_lt('foo', 'bar' collate "de_DE");
>        select my_lt('foo', 'bar' collate "fr_FR");
>
> I think it's at least arguably desirable that the results of the two
> calls respond to the collation clauses, but it does not look to me
> like that will happen: plpgsql isn't doing anything to propagate
> its call-site collation value into expressions it evaluates, and
> if it did, it'd still get the wrong answer on the second call because it
> would have cached an expression plan tree containing the collation info
> from the first call.
>

I don't think it's obvious that this is the right behaviour. I think
functions should provide the same answer on the same inputs regardless
of context unless they're really intended to be volatile. The default
collation specified there is not part of the value being passed. If
you want to affect the way a plpgsql function orders things in its
code you should pass an extra argument for collation and then the
plpgsql function should use COLLATE colarg -- though I'm not sure if
that works, can you put parameters in COLLATE arguments?

I do hope user defined functions return values are marked with
implicit/explicit collations based on their arguments though.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collations versus user-defined functions

2011-03-12 Thread Martijn van Oosterhout
On Sat, Mar 12, 2011 at 12:17:11PM -0500, Tom Lane wrote:
> I've thought of another area that AFAICT the current patch fails to
> address at all: what should happen in user-defined functions?



The POLA suggests that the collation derivation of the original query
should not be affected by the implementation of a function. In the case
of SQL functions this means that the expansion of the function must
not change the results. This would mean introducing a CollateNode above
the SQL function after expansion, though you may be able to acheive
this by doing the collation derivation prior to expansion of the SQL
function, but I don't know if that's feasable.

(Note the introduced collate node would need to remember the collation state.)

Similarly, inside the function the parameters should be considered to
be IMPLICIT collation, to avoid strange errors depending on how its
called.

This means you can't make a set_collation function, but that doesn't
seem like a loss to me.

>   select my_lt('foo', 'bar' collate "de_DE");
>   select my_lt('foo', 'bar' collate "fr_FR");
> 
> I think it's at least arguably desirable that the results of the two
> calls respond to the collation clauses, but it does not look to me
> like that will happen: plpgsql isn't doing anything to propagate
> its call-site collation value into expressions it evaluates, and
> if it did, it'd still get the wrong answer on the second call because it
> would have cached an expression plan tree containing the collation info
> from the first call.

I think you need to consider the collation to be a variation of the
type. plpgsql makes new plans for each type when dealing with any
parameters, this should fit right in.

SQL would need a recollate-label node like suggested above.

For other languages you just need to provide the info, what they do
with it is not your problem.

> What do we want to do about this?  Making it work the way it seems like
> it ought to will require a rather substantial investment of effort.
> It looks to me like the least invasive answer would be to have plpgsql
> cache different plan trees depending on the collation it receives for
> its parameters, but that's still a whole lot of work.  Does the SQL
> standard have anything to say on the matter, or is there a precedent in
> the behavior of TSQL or other DBMSes?

I can't help you with other DBs, google isn't finding me anything. But
the plpgsql problem should be done already right, given it already
handles cached plans for different types.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first. 
>   - Charles de Gaulle


signature.asc
Description: Digital signature


Re: [HACKERS] memory-related bugs

2011-03-12 Thread Tom Lane
Noah Misch  writes:
> A suitably-instrumented run of "make installcheck-world" under valgrind turned
> up a handful of memory-related bugs:

Hmm, interesting work, but I don't think I believe in the necessity for
this kluge:

> + else if (attributeName != &(att->attname))
> + namestrcpy(&(att->attname), attributeName);

The rules against overlapping memcpy/strcpy's source and destination are
meant to cover the case of partial overlap; I find it hard to imagine an
implementation that will mess up when the source and destination are
identical.  If we did think it was important to avoid this situation I
would rather find another way, like modifying the caller.  Likewise
the other changes to avoid no-op memcpy's do not appear to me to be
bugs, though possibly they might save enough cycles to be worth doing
anyway.

> ! stats->attrtype = (Form_pg_type) palloc(sizeof(FormData_pg_type));
> ! memcpy(stats->attrtype, GETSTRUCT(typtuple), sizeof(FormData_pg_type));
> ...
> ! stats->attrtype = (Form_pg_type) palloc(TYPE_FIXED_PART_SIZE);
> ! memcpy(stats->attrtype, GETSTRUCT(typtuple), TYPE_FIXED_PART_SIZE);

I wonder whether we should instead fix this by copying the correct tuple
length.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Collations versus user-defined functions

2011-03-12 Thread Tom Lane
I've thought of another area that AFAICT the current patch fails to
address at all: what should happen in user-defined functions?
Consider

create function my_lt(x text, y text) returns bool as
$$
begin
return x < y;
end
$$ language plpgsql;

select my_lt('foo', 'bar' collate "de_DE");
select my_lt('foo', 'bar' collate "fr_FR");

I think it's at least arguably desirable that the results of the two
calls respond to the collation clauses, but it does not look to me
like that will happen: plpgsql isn't doing anything to propagate
its call-site collation value into expressions it evaluates, and
if it did, it'd still get the wrong answer on the second call because it
would have cached an expression plan tree containing the collation info
from the first call.

In SQL-language functions the situation is even worse, because they will
behave differently depending on whether or not they get inlined.
(I think ... haven't really tested that case.)

What do we want to do about this?  Making it work the way it seems like
it ought to will require a rather substantial investment of effort.
It looks to me like the least invasive answer would be to have plpgsql
cache different plan trees depending on the collation it receives for
its parameters, but that's still a whole lot of work.  Does the SQL
standard have anything to say on the matter, or is there a precedent in
the behavior of TSQL or other DBMSes?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] template0 database comment

2011-03-12 Thread Tom Lane
Greg Stark  writes:
> On Sat, Mar 12, 2011 at 1:59 PM, Bruce Momjian  wrote:
>> OK, funny guys.  ;-)  Can someone give me the right text.  Obviously I
>> don' know what template0 is used for either.  Is it pg_dumpall perhaps?

> template0: unmodifiable pristine empty database
> template1: default template for new databases

Yeah, I think that the right way to approach this is to have initdb
comment *both* of those databases.  I don't like that specific wording
for template0 though.  Maybe

template0: unmodified copy of original template1 database
template1: default template for new databases

The problem with Greg's wording is that it's falsifiable: it is possible
for someone to modify template0 if they're determined to mess things up.
So a description like "unmodifiable" is promising too much.

Shouldn't the "postgres" database get a comment too, while we're at it?
Perhaps "default database to connect to"?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] template0 database comment

2011-03-12 Thread Bruce Momjian
Greg Stark wrote:
> On Sat, Mar 12, 2011 at 1:59 PM, Bruce Momjian  wrote:
> > OK, funny guys. ?;-) ?Can someone give me the right text. ?Obviously I
> > don' know what template0 is used for either. ?Is it pg_dumpall perhaps?
> >
> 
> template0: unmodifiable pristine empty database
> template1: default template for new databases

I think I like "unmodifiable empty database".

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: index corruption in PG 8.3.13

2011-03-12 Thread Greg Stark
On Sat, Mar 12, 2011 at 3:06 AM, Nikhil Sontakke
 wrote:
> Live 522's      (LSN: logid 29, recoff 0xd1fade3c) previous points to
> the zeroed out 523 block. Note that this seems to be latest LSN in the
> data file.
>

So do you have logs from the server when it was restarted? It should
say how far it recovered before it started up

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] memory-related bugs

2011-03-12 Thread Greg Stark
On Sat, Mar 12, 2011 at 1:32 PM, Noah Misch  wrote:
> A suitably-instrumented run of "make installcheck-world" under valgrind turned
> up a handful of memory-related bugs:


Nice work. How did you instrument things so valgrind knew about palloc
et al? I remember trying this in the past and running into problems. I
think the biggest one was that we write out structs to disk including
padding so trigger lots of reads of uninitialized data warnings.


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] template0 database comment

2011-03-12 Thread Greg Stark
On Sat, Mar 12, 2011 at 1:59 PM, Bruce Momjian  wrote:
> OK, funny guys.  ;-)  Can someone give me the right text.  Obviously I
> don' know what template0 is used for either.  Is it pg_dumpall perhaps?
>

template0: unmodifiable pristine empty database
template1: default template for new databases



-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FuncExpr.collid/OpExpr.collid unworkably serving double duty

2011-03-12 Thread Martijn van Oosterhout
On Fri, Mar 11, 2011 at 04:56:31PM -0500, Tom Lane wrote:
> > In my implementation I needed to expand this to the general set of
> > operators postgresql supported and relaxed this to only consider
> > arguments to the function/operator that had the same type as the
> > resulting type of the function/operator, since that's the only thing
> > that makes sense.
> 
> Yeah, that corresponds to Transact-SQL's distinction between functions
> that take a string and produce a string, versus those that produce a
> string without any string inputs.  But I don't see anything justifying
> such a distinction in the text of the standard.

I remember being bugged by the fact that the standard indeed said
nothing (that I could find) about the results of functions that
accepted something other than strings.

> Also note that the TSQL doc explicitly points out that collation labels
> can be carried up through changes of character string types, so I think
> you're wrong to say that collation is only carried through functions that
> produce exactly the same type as their input.  I'd say collation labels
> propagate through any function that has both collatable input(s) and a
> collatable result type.

Yes, this is the most logical and useful interpretation. Collations for
all the string types are essentially compatable so can be treated as
equivalent for this purpose.

(My bit about same type at input is due to my idea of extending
collation to more than just strings. I don't beleive the current patch
supports that anyway. I intended to treat all string types as
equivalent.

> In any case, I am sure that that what this describes is not what our
> current code does :-(, and that we can't get anywhere close to this with
> the existing infrastructure.  So at this point I'm thinking that the
> safest approach is to rip out the result-collation caching fields and
> perform collation assignment in a parsing post-pass.  That will allow us
> to revise the collation assignment algorithm without further catversion
> bumps.

Are you sure? I've grabbed the patch and an looking at it. It looks to
me like it is parse_expr properly combining the collations of the
subnodes, just like it is doing the type determination.

Hmm, but select_common_collation looks a bit dodgy indeed. For example,
whether a collation is explicit or not depends on the type of the
subnode. That's not right at all. During the parse phase you must not
only track the collation but also the state (default, implicit,
explicit, error, none). Otherwise you indeed get the issue where you
don't throw errors at the right time.

I had a collatestate object:

+   Oid colloid;/* OID of collation */
+   CollateType colltype;   /* Type of Collation */

Where CollateType is IMPLICIT, EXPLICIT, NONE or DEFAULT.

And then I had the function below to handle the combining. This I think
handles the collation rules correctly, but I don't think it can be
shoehorned into the current patch.

Hope this helps with your review.

Have a nice day,

+ /* This logic is dictated by the SQL standard */
+ CollateState *
+ CombineCollateState( CollateState *arg1, CollateState *arg2 )
+ {
+   /* If either argument is coerable (the default), return the other */
+   if( !arg1 || arg1->colltype == COLLATE_COERCIBLE )
+   return arg2;
+   if( !arg2 || arg2->colltype == COLLATE_COERCIBLE )
+   return arg1;
+   /* If both are explicit, they must be the same */
+   if( arg1->colltype == COLLATE_EXPLICIT && arg2->colltype == 
COLLATE_EXPLICIT )
+   {
+   if( arg1->colloid != arg2->colloid )
+   ereport( ERROR, (errmsg("Conflicting COLLATE 
clauses")));
+   /* Both same, doesn't matter which we return */
+   return arg1;
+   }
+   /* Otherwise, explicit overrides anything */
+   if( arg1->colltype == COLLATE_EXPLICIT )
+   return arg1;
+   if( arg2->colltype == COLLATE_EXPLICIT )
+   return arg2;
+   /* COLLATE_NONE can only be overridden by EXPLICIT */
+   if( arg1->colltype == COLLATE_NONE )
+   return arg1;
+   if( arg2->colltype == COLLATE_NONE )
+   return arg2;
+   
+   /* We only get here if both are implicit */
+   /* Same locale, return that implicitly */
+   if( arg1->colloid == arg2->colloid )
+   return arg1;
+   
+   /* Conflicting implicit collation, return NONE */
+   {
+   CollateState *result = makeNode(CollateState);
+   result->colltype = COLLATE_NONE;
+   result->colloid = InvalidOid;
+   return result;
+   }
+ }
+ 

-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first. 
>   - Charles de Gaulle

Re: [HACKERS] Macros for time magic values

2011-03-12 Thread Bruce Momjian

Applied.

---

Bruce Momjian wrote:
> It has bothered me that many of our time routines use special magic
> constants for time values, e.g. 24, 12, 60, etc.
> 
> The attached patch changes these magic constants to macros to clarify
> the code.  I would like to apply this for 9.1 as a cleanup.
> 
> -- 
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
> 
>   + It's impossible for everything to be true. +

[ text/x-diff is unsupported, treating like TEXT/PLAIN ]

> diff --git a/src/backend/utils/adt/date.c b/src/backend/utils/adt/date.c
> index 80dd10b..f96fa6c 100644
> --- a/src/backend/utils/adt/date.c
> +++ b/src/backend/utils/adt/date.c
> @@ -2612,7 +2612,7 @@ timetz_zone(PG_FUNCTION_ARGS)
>   type = DecodeSpecial(0, lowzone, &val);
>  
>   if (type == TZ || type == DTZ)
> - tz = val * 60;
> + tz = val * MINS_PER_HOUR;
>   else
>   {
>   tzp = pg_tzset(tzname);
> diff --git a/src/backend/utils/adt/datetime.c 
> b/src/backend/utils/adt/datetime.c
> index 85f0206..f0fe2e3 100644
> --- a/src/backend/utils/adt/datetime.c
> +++ b/src/backend/utils/adt/datetime.c
> @@ -342,7 +342,7 @@ j2date(int jd, int *year, int *month, int *day)
>   *year = y - 4800;
>   quad = julian * 2141 / 65536;
>   *day = julian - 7834 * quad / 256;
> - *month = (quad + 10) % 12 + 1;
> + *month = (quad + 10) % MONTHS_PER_YEAR + 1;
>  
>   return;
>  }/* j2date() */
> @@ -952,8 +952,8 @@ DecodeDateTime(char **field, int *ftype, int nf,
>* DecodeTime()
>*/
>   /* test for > 24:00:00 */
> - if (tm->tm_hour > 24 ||
> - (tm->tm_hour == 24 &&
> + if (tm->tm_hour > HOURS_PER_DAY ||
> + (tm->tm_hour == HOURS_PER_DAY &&
>(tm->tm_min > 0 || tm->tm_sec > 0 || 
> *fsec > 0)))
>   return DTERR_FIELD_OVERFLOW;
>   break;
> @@ -1371,12 +1371,12 @@ DecodeDateTime(char **field, int *ftype, int nf,
>   return dterr;
>  
>   /* handle AM/PM */
> - if (mer != HR24 && tm->tm_hour > 12)
> + if (mer != HR24 && tm->tm_hour > HOURS_PER_DAY / 2)
>   return DTERR_FIELD_OVERFLOW;
> - if (mer == AM && tm->tm_hour == 12)
> + if (mer == AM && tm->tm_hour == HOURS_PER_DAY / 2)
>   tm->tm_hour = 0;
> - else if (mer == PM && tm->tm_hour != 12)
> - tm->tm_hour += 12;
> + else if (mer == PM && tm->tm_hour != HOURS_PER_DAY / 2)
> + tm->tm_hour += HOURS_PER_DAY / 2;
>  
>   /* do additional checking for full date specs... */
>   if (*dtype == DTK_DATE)
> @@ -2058,17 +2058,18 @@ DecodeTimeOnly(char **field, int *ftype, int nf,
>   return dterr;
>  
>   /* handle AM/PM */
> - if (mer != HR24 && tm->tm_hour > 12)
> + if (mer != HR24 && tm->tm_hour > HOURS_PER_DAY / 2)
>   return DTERR_FIELD_OVERFLOW;
> - if (mer == AM && tm->tm_hour == 12)
> + if (mer == AM && tm->tm_hour == HOURS_PER_DAY / 2)
>   tm->tm_hour = 0;
> - else if (mer == PM && tm->tm_hour != 12)
> - tm->tm_hour += 12;
> + else if (mer == PM && tm->tm_hour != HOURS_PER_DAY / 2)
> + tm->tm_hour += HOURS_PER_DAY / 2;
>  
> - if (tm->tm_hour < 0 || tm->tm_min < 0 || tm->tm_min > 59 ||
> - tm->tm_sec < 0 || tm->tm_sec > 60 || tm->tm_hour > 24 ||
> + if (tm->tm_hour < 0 || tm->tm_min < 0 || tm->tm_min > MINS_PER_HOUR - 1 
> ||
> + tm->tm_sec < 0 || tm->tm_sec > SECS_PER_MINUTE ||
> + tm->tm_hour > HOURS_PER_DAY ||
>   /* test for > 24:00:00 */
> - (tm->tm_hour == 24 &&
> + (tm->tm_hour == HOURS_PER_DAY &&
>(tm->tm_min > 0 || tm->tm_sec > 0 || *fsec > 0)) ||
>  #ifdef HAVE_INT64_TIMESTAMP
>   *fsec < INT64CONST(0) || *fsec > USECS_PER_SEC
> @@ -2396,13 +2397,15 @@ DecodeTime(char *str, int fmask, int range,
>  
>   /* do a sanity check */
>  #ifdef HAVE_INT64_TIMESTAMP
> - if (tm->tm_hour < 0 || tm->tm_min < 0 || tm->tm_min > 59 ||
> - tm->tm_sec < 0 || tm->tm_sec > 60 || *fsec < INT64CONST(0) ||
> + if (tm->tm_hour < 0 || tm->tm_min < 0 || tm->tm_min > MINS_PER_HOUR -1 
> ||
> + tm->tm_sec < 0 || tm->tm_sec > SECS_PER_MINUTE ||
> + *fsec < INT64CONST(0) ||
>   *fsec > USECS_PER_SEC)
>   return DTERR_FIELD_OVERFLOW;
>  #else
> - if (tm->tm_hour < 0 || tm->tm_min < 0 || tm->tm_min > 59 ||
> - tm->tm_sec < 0 || tm->tm_sec > 60 || *fsec < 0 || *fsec > 1)
> + if (tm->tm_hour < 0 || tm->tm_min < 0 || tm->tm_min > MINS_PER_HO

Re: [HACKERS] template0 database comment

2011-03-12 Thread Bruce Momjian
Christopher Browne wrote:
> On Sat, Mar 12, 2011 at 8:59 AM, Bruce Momjian  wrote:
> > Dave Page wrote:
> >> On 3/12/11, Tom Lane  wrote:
> >> > Bruce Momjian  writes:
> >> >> People are confused about what template0 is for, so I created the
> >> >> attached one-line patch to add a database comment to template0. No
> >> >> initdb, I assume, becuase it is just a comment.
> >> >
> >> >> + ? ? ? ? ?"COMMENT ON DATABASE template0 IS 'only used by pg_dump';\n",
> >> >
> >> > No objection to the concept, but the actual text of this comment is
> >> > approximately 100% wrong.
> >> >
> >>
> >> I'd like to lodge a formal objection to the use of the word
> >> 'approximately' in the above comment.
> >
> > OK, funny guys. ?;-) ?Can someone give me the right text. ?Obviously I
> > don' know what template0 is used for either. ?Is it pg_dumpall perhaps?
> 
> Whaa?!?!
> 
> pg_dump has nothing to do with it.  Only used by createdb
> 
> Possibilities include:
> - 'base template database'
> - 'base template (used if template1 is corrupted)'
> - 'backup template (use if template1 corrupted)'
> 
> Contrast with template1
> - 'default template for creation of new databases'
> 
> I dunno that those are the *best* wordings, but they may suggest one.

I thought the big deal with template0 was it was used to find items that
were added to template1 by pg_dumpall.

I think Thom's idea of not describing its use but its contents might be
best, maybe "unmodifiable template database".

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] template0 database comment

2011-03-12 Thread Christopher Browne
On Sat, Mar 12, 2011 at 9:14 AM, Bruce Momjian  wrote:
> I like that.  Perhaps "unmodified template database'?

"why" tends to be more important than "what", particularly to a
confused DBA who's trying to figure out "why do they have all these
extra databases???"

Perhaps...
"backup template database - normally immutable, used if template1 is corrupted"


-- 
http://linuxfinances.info/info/linuxdistributions.html

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] template0 database comment

2011-03-12 Thread Christopher Browne
On Sat, Mar 12, 2011 at 8:59 AM, Bruce Momjian  wrote:
> Dave Page wrote:
>> On 3/12/11, Tom Lane  wrote:
>> > Bruce Momjian  writes:
>> >> People are confused about what template0 is for, so I created the
>> >> attached one-line patch to add a database comment to template0. No
>> >> initdb, I assume, becuase it is just a comment.
>> >
>> >> +          "COMMENT ON DATABASE template0 IS 'only used by pg_dump';\n",
>> >
>> > No objection to the concept, but the actual text of this comment is
>> > approximately 100% wrong.
>> >
>>
>> I'd like to lodge a formal objection to the use of the word
>> 'approximately' in the above comment.
>
> OK, funny guys.  ;-)  Can someone give me the right text.  Obviously I
> don' know what template0 is used for either.  Is it pg_dumpall perhaps?

Whaa?!?!

pg_dump has nothing to do with it.  Only used by createdb

Possibilities include:
- 'base template database'
- 'base template (used if template1 is corrupted)'
- 'backup template (use if template1 corrupted)'

Contrast with template1
- 'default template for creation of new databases'

I dunno that those are the *best* wordings, but they may suggest one.
-- 
http://linuxfinances.info/info/linuxdistributions.html

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] template0 database comment

2011-03-12 Thread Bruce Momjian
Thom Brown wrote:
> On 12 March 2011 13:59, Bruce Momjian  wrote:
> > Dave Page wrote:
> >> On 3/12/11, Tom Lane  wrote:
> >> > Bruce Momjian  writes:
> >> >> People are confused about what template0 is for, so I created the
> >> >> attached one-line patch to add a database comment to template0. No
> >> >> initdb, I assume, becuase it is just a comment.
> >> >
> >> >> + ? ? ? ? ?"COMMENT ON DATABASE template0 IS 'only used by pg_dump';\n",
> >> >
> >> > No objection to the concept, but the actual text of this comment is
> >> > approximately 100% wrong.
> >> >
> >>
> >> I'd like to lodge a formal objection to the use of the word
> >> 'approximately' in the above comment.
> >
> > OK, funny guys. ?;-) ?Can someone give me the right text. ?Obviously I
> > don' know what template0 is used for either. ?Is it pg_dumpall perhaps?
> 
> 'original template database' ?

I like that.  Perhaps "unmodified template database'?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] template0 database comment

2011-03-12 Thread Thom Brown
On 12 March 2011 13:59, Bruce Momjian  wrote:
> Dave Page wrote:
>> On 3/12/11, Tom Lane  wrote:
>> > Bruce Momjian  writes:
>> >> People are confused about what template0 is for, so I created the
>> >> attached one-line patch to add a database comment to template0. No
>> >> initdb, I assume, becuase it is just a comment.
>> >
>> >> +          "COMMENT ON DATABASE template0 IS 'only used by pg_dump';\n",
>> >
>> > No objection to the concept, but the actual text of this comment is
>> > approximately 100% wrong.
>> >
>>
>> I'd like to lodge a formal objection to the use of the word
>> 'approximately' in the above comment.
>
> OK, funny guys.  ;-)  Can someone give me the right text.  Obviously I
> don' know what template0 is used for either.  Is it pg_dumpall perhaps?

'original template database' ?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] template0 database comment

2011-03-12 Thread Bruce Momjian
Dave Page wrote:
> On 3/12/11, Tom Lane  wrote:
> > Bruce Momjian  writes:
> >> People are confused about what template0 is for, so I created the
> >> attached one-line patch to add a database comment to template0. No
> >> initdb, I assume, becuase it is just a comment.
> >
> >> +  "COMMENT ON DATABASE template0 IS 'only used by pg_dump';\n",
> >
> > No objection to the concept, but the actual text of this comment is
> > approximately 100% wrong.
> >
> 
> I'd like to lodge a formal objection to the use of the word
> 'approximately' in the above comment.

OK, funny guys.  ;-)  Can someone give me the right text.  Obviously I
don' know what template0 is used for either.  Is it pg_dumpall perhaps?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] memory-related bugs

2011-03-12 Thread Noah Misch
A suitably-instrumented run of "make installcheck-world" under valgrind turned
up a handful of memory-related bugs:

* memcpy()/strncpy() between overlapping regions
uniqueentry() and dispell_lexize() both deduplicate an array by iteratively
copying elements downward to occlude the duplicates.  Before finding a first
duplicate, these functions call memcpy() with identical arguments.  Similarly,
resolve_polymorphic_tupdesc() calls TupleDescInitEntry() with an attributeName
pointing directly into the TupleDesc's name bytes, causing the latter to call
strncpy() with identical arguments.  The attached mem1v1-memcpy-overlap.patch
fixes these sites by checking for equal pointers before the affected call.  For
TupleDescInitEntry(), I considered instead having resolve_polymorphic_tupdesc()
pstrdup() the value.

* read past the end of a Form_pg_type in examine_attribute()
examine_attribute() copies a Form_pg_type naively.  Since the nullable columns
at the end of the structure are not present in memory, the memcpy() reads eight
bytes past the end of the source allocation.  mem2v1-analyze-overread.patch
updates this code to match how we address the same issue for Form_pg_attribute.

* off-by-one error in shift_jis_20042euc_jis_2004()
This function grabs two bytes at a time, even when only one byte remains; this
makes it read past the end of the input.  mem3v1-sjis-offbyone.patch changes it
to not do this and to report an error when the input ends in a byte that would
start a two-byte sequence.

Thanks,
nm
diff --git a/src/backend/access/common/tupdesc.c 
b/src/backend/access/common/tupdesc.c
index d78b083..1ed7195 100644
*** a/src/backend/access/common/tupdesc.c
--- b/src/backend/access/common/tupdesc.c
***
*** 459,468  TupleDescInitEntry(TupleDesc desc,
 * fill in valid resname values in targetlists, particularly for resjunk
 * attributes.
 */
!   if (attributeName != NULL)
!   namestrcpy(&(att->attname), attributeName);
!   else
MemSet(NameStr(att->attname), 0, NAMEDATALEN);
  
att->attstattarget = -1;
att->attcacheoff = -1;
--- 459,468 
 * fill in valid resname values in targetlists, particularly for resjunk
 * attributes.
 */
!   if (attributeName == NULL)
MemSet(NameStr(att->attname), 0, NAMEDATALEN);
+   else if (attributeName != &(att->attname))
+   namestrcpy(&(att->attname), attributeName);
  
att->attstattarget = -1;
att->attcacheoff = -1;
diff --git a/src/backend/tsearch/dict_ispeindex 31929c0..bfd1732 100644
*** a/src/backend/tsearch/dict_ispell.c
--- b/src/backend/tsearch/dict_ispell.c
***
*** 139,145  dispell_lexize(PG_FUNCTION_ARGS)
}
else
{
!   memcpy(cptr, ptr, sizeof(TSLexeme));
cptr++;
ptr++;
}
--- 139,146 
}
else
{
!   if (ptr != cptr)
!   memcpy(cptr, ptr, sizeof(TSLexeme));
cptr++;
ptr++;
}
diff --git a/src/backend/utils/adt/tsvecindex 6810615..6c24850 100644
*** a/src/backend/utils/adt/tsvector.c
--- b/src/backend/utils/adt/tsvector.c
***
*** 125,131  uniqueentry(WordEntryIN *a, int l, char *buf, int *outbuflen)
buflen += res->poslen * sizeof(WordEntryPos) + 
sizeof(uint16);
}
res++;
!   memcpy(res, ptr, sizeof(WordEntryIN));
}
else if (ptr->entry.haspos)
{
--- 125,132 
buflen += res->poslen * sizeof(WordEntryPos) + 
sizeof(uint16);
}
res++;
!   if (ptr != res)
!   memcpy(res, ptr, sizeof(WordEntryIN));
}
else if (ptr->entry.haspos)
{
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index bafdc80..2e0a869 100644
*** a/src/backend/commands/analyze.c
--- b/src/backend/commands/analyze.c
***
*** 874,881  examine_attribute(Relation onerel, int attnum, Node 
*index_expr)
typtuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(stats->attrtypid));
if (!HeapTupleIsValid(typtuple))
elog(ERROR, "cache lookup failed for type %u", 
stats->attrtypid);
!   stats->attrtype = (Form_pg_type) palloc(sizeof(FormData_pg_type));
!   memcpy(stats->attrtype, GETSTRUCT(typtuple), sizeof(FormData_pg_type));
ReleaseSysCache(typtuple);
stats->anl_context = anl_context;
stats->tupattnum = attnum;
--- 874,881 
typtuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(stats->attrtypid));
if (!

Re: [HACKERS] Fwd: index corruption in PG 8.3.13

2011-03-12 Thread Daniel Farina
On Wed, Mar 9, 2011 at 6:02 AM, Alvaro Herrera
 wrote:
> I'll send you a perl program we wrote for a customer to check for
> strange issues in btrees.  Please give it a spin; it may give you more
> clues.  If you find additional checks to add, please let me know!

I have also, coincidentally, encountered corruption of a system
catalog index -- 8.3.11 -- I have saved the file for forensics.  Is it
possible that I also receive a copy of this program?

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On-the-fly index tuple deletion vs. hot_standby

2011-03-12 Thread Noah Misch
On Thu, Dec 09, 2010 at 09:48:25AM +, Simon Riggs wrote:
> On Fri, 2010-12-03 at 21:43 +0200, Heikki Linnakangas wrote:
> > On 29.11.2010 08:10, Noah Misch wrote:
> > > I have a hot_standby system and use it to bear the load of various 
> > > reporting
> > > queries that take 15-60 minutes each.  In an effort to avoid long pauses 
> > > in
> > > recovery, I set a vacuum_defer_cleanup_age constituting roughly three 
> > > hours of
> > > the master's transactions.  Even so, I kept seeing recovery pause for the
> > > duration of a long-running query.  In each case, the culprit record was an
> > > XLOG_BTREE_DELETE arising from on-the-fly deletion of an index tuple.  The
> > > attached test script demonstrates the behavior (on HEAD); the index tuple
> > > reclamation conflicts with a concurrent "SELECT pg_sleep(600)" on the 
> > > standby.
> > >
> > > Since this inserting transaction aborts, HeapTupleSatisfiesVacuum reports
> > > HEAPTUPLE_DEAD independent of vacuum_defer_cleanup_age.  We go ahead and 
> > > remove
> > > the index tuples.  On the standby, btree_xlog_delete_get_latestRemovedXid 
> > > does
> > > not regard the inserting-transaction outcome, so btree_redo proceeds to 
> > > conflict
> > > with snapshots having visibility over that transaction.  Could we 
> > > correctly
> > > improve this by teaching btree_xlog_delete_get_latestRemovedXid to ignore 
> > > tuples
> > > of aborted transactions and tuples inserted and deleted within one 
> > > transaction?
> 
> @Noah Easily the best bug reported submitted in a long time. Thanks.
> 
> > Seems reasonable. HeapTupleHeaderAdvanceLatestRemovedXid() will need 
> > similar treatment. Actually, btree_xlog_delete_get_latestRemovedXid() 
> > could just call HeapTupleHeaderAdvanceLatestRemoveXid().
> 
> Yes, it applies to other cases also. Thanks for the suggestion.
> 
> Fix committed. Please double-check my work, committed early since I'm
> about to jump on a plane.

The installation that inspired my original report recently upgraded from 9.0.1
to 9.0.3, and your fix did significantly decrease its conflict frequency.  The
last several conflicts I have captured involve XLOG_BTREE_REUSE_PAGE records.
(FWIW, the index has generally been pg_attribute_relid_attnam_index.)  I've
attached a test script demonstrating the behavior.  _bt_page_recyclable approves
any page deleted no more recently than RecentXmin, because we need only ensure
that every ongoing scan has witnessed the page as dead.  For the hot standby
case, we need to account for possibly-ongoing standby transactions.  Using
RecentGlobalXmin covers that, albeit with some pessimism: we really only need
LEAST(RecentXmin, PGPROC->xmin of walsender_1, .., PGPROC->xmin of walsender_N)
- vacuum_defer_cleanup_age.  Not sure the accounting to achieve that would pay
off, though.  Thoughts?

Thanks,
nm


repro-btree-reuse.sh
Description: Bourne shell script
*** a/src/backend/access/nbtree/nbtpage.c
--- b/src/backend/access/nbtree/nbtpage.c
***
*** 678,683  bool
--- 678,684 
  _bt_page_recyclable(Page page)
  {
BTPageOpaque opaque;
+   TransactionId xmin;
  
/*
 * It's possible to find an all-zeroes page in an index --- for 
example, a
***
*** 693,700  _bt_page_recyclable(Page page)
 * interested in it.
 */
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
if (P_ISDELETED(opaque) &&
!   TransactionIdPrecedesOrEquals(opaque->btpo.xact, RecentXmin))
return true;
return false;
  }
--- 694,702 
 * interested in it.
 */
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+   xmin = XLogStandbyInfoActive() ? RecentGlobalXmin : RecentXmin;
if (P_ISDELETED(opaque) &&
!   TransactionIdPrecedesOrEquals(opaque->btpo.xact, xmin))
return true;
return false;
  }

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] template0 database comment

2011-03-12 Thread Dave Page
On 3/12/11, Tom Lane  wrote:
> Bruce Momjian  writes:
>> People are confused about what template0 is for, so I created the
>> attached one-line patch to add a database comment to template0. No
>> initdb, I assume, becuase it is just a comment.
>
>> +"COMMENT ON DATABASE template0 IS 'only used by pg_dump';\n",
>
> No objection to the concept, but the actual text of this comment is
> approximately 100% wrong.
>

I'd like to lodge a formal objection to the use of the word
'approximately' in the above comment.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] template0 database comment

2011-03-12 Thread Tom Lane
Bruce Momjian  writes:
> People are confused about what template0 is for, so I created the
> attached one-line patch to add a database comment to template0. No
> initdb, I assume, becuase it is just a comment.
  
> + "COMMENT ON DATABASE template0 IS 'only used by pg_dump';\n",

No objection to the concept, but the actual text of this comment is
approximately 100% wrong.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] template0 database comment

2011-03-12 Thread Bruce Momjian
People are confused about what template0 is for, so I created the
attached one-line patch to add a database comment to template0. No
initdb, I assume, becuase it is just a comment.

I plan to work on more system table and view comments for 9.2.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
new file mode 100644
index acd2514..ba9b688
*** a/src/bin/initdb/initdb.c
--- b/src/bin/initdb/initdb.c
*** make_template0(void)
*** 1976,1981 
--- 1976,1983 
  		"REVOKE CREATE,TEMPORARY ON DATABASE template1 FROM public;\n",
  		"REVOKE CREATE,TEMPORARY ON DATABASE template0 FROM public;\n",
  
+ 		"COMMENT ON DATABASE template0 IS 'only used by pg_dump';\n",
+ 
  		/*
  		 * Finally vacuum to clean up dead rows in pg_database
  		 */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers