Re: [HACKERS] libpq support for arrays and composites

2008-06-08 Thread Andrew Dunstan



Andrew Chernow wrote:


libpqtypes already implemented this.  It is a different approach but 
provides the same functionality, with the added ability to handle 
every data type.  libpqtypes uses the PGresult API for composites 
and arrays, rather than adding a new set of functions.





Yes, I thought you'd say that :-)

This has some significant limitations - for example (quoting from your docs)

  Arrays  are only handled using binary format.  This means that any type
  used as an array element must be put and gotten in binary format.  If a
  user-defined  type  does  not implement a send and recv function in the
  backend, it can not be used as an array element.

That makes it quite useless for my intended purpose. 


I also am not particularly enamoured of the libpqtypes way of doing things, 
which feels rather foreign to me.

Lastly, the idea is to provide extra facilities to libpq clients without 
requiring any extra library.

cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] handling TOAST tables in autovacuum

2008-06-08 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Euler Taveira de Oliveira wrote:
>> And based on your proposal, we'd also need to add reloptions to toast
>> tables too. IMO, we should keep that code as simple as possible.

> Sure, what's the problem with that?  We only need to make sure that
> ALTER TABLE works for setting reloptions for toast tables.

... actually, the problem is going to be "how do you get pg_dump to dump
and reload such settings"?  The toast tables are not going to have the
same names after dump/reload.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq support for arrays and composites

2008-06-08 Thread Andrew Chernow

Andrew Dunstan wrote:


One of the areas where libpq seems to be severely lacking is in handling 
arrays and composites in query results. I'd like to set about rectifying 
that.


Ideally this would mean that drivers using libpq could easily and 
reliably deliver such objects suitably structured in their particular 
languages (e.g. lists and hashes in Perl).


One complicating factor I see is that there is no protocol level support 
for anything other than simple objects - each data value is simply a 
stream of bytes of a known length. We would therefore need some pretty 
robust processing to pick apart structured objects.


We'll need a new API to handle such objects. I'm thinking of something 
like:


PQarray * PQgetArray( const PGresult *res, int row_number, int 
column_number);

int PQgetArrayNDims(PQarray * array);
int PQgetArrayLower(PQarray * array, int dim);
int PQgetArrayUpper(PQarray * array, int dim);
int PQgetArrayElementLength(PQarray * array, int dim1, ...);
bool PQgetArrayElementIsNull(PQarray * array, int dim1, ...);
char * PQgetArrayElement(PQarray * array, int dim1, ...);

PQcomposite * PQgetComposite(const PGresult *res, int row_number, int 
column_number);

PQcomposite * PQgetArrayElementComposite(PQarray * array, int dim1, ...);
int PQgetCompositeNFields(PQcomposite * composite);
char * PQgetCompositeFName(PQcomposite * composite, int fnumber);
int PQgetCompositeFNumber(PQcomposite * composite, char * fname);
Oid PQgetCompositeFType(PQcomposite * composite, int fnumber);
int PQgetCompositeFieldLength(PQcomposite * composite, int fnumber);
bool PQgetCompositeFieldIsNull(PQcomposite * composite, int fnumber);
char * PQgetCompositeField(PQcomposite * composite, int fnumber);

Not sure if we need analogs for PQfformat, PQfmod or PQfsize - I suspect 
not, but right now I'm just thinking out loud.


Thoughts? Is this worth doing?

cheers

andrew





libpqtypes already implemented this.  It is a different approach but provides 
the same functionality, with the added ability to handle every data type.  
libpqtypes uses the PGresult API for composites and arrays, rather than 
adding a new set of functions.


To support this, one must be able to convert all data types (unless you are only 
supporting text results) because composites can be made up of any data type.


Simple arrays:
http://libpqtypes.esilo.com/man3/pqt-specs.html#array

Composite arrays:
http://libpqtypes.esilo.com/man3/pqt-composites.html
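
For example, fetching an int4[] looks roughly like this (a sketch based on
the pqt-specs page above; the PGarray layout and the "%int4[]" format
string should be checked against those docs):

/* Sketch: fetch a one-dimensional int4[] column with libpqtypes.
 * The array is exposed as a PGresult with one tuple per element.
 */
PGarray arr;
PGint4 elem;
int i, ntups;
PGresult *res;

res = PQparamExec(conn, NULL, "SELECT intarr FROM t", 1); /* binary results */
if(!res)
  fprintf(stderr, "ERROR: %s\n", PQgeterror());

PQgetf(res, 0, "%int4[]", 0, &arr);
PQclear(res); /* no longer needed */

ntups = PQntuples(arr.res);
for(i = 0; i < ntups; i++)
{
  PQgetf(arr.res, i, "%int4", 0, &elem);
  printf("intarr[%d] = %d\n", i, elem);
}
PQclear(arr.res);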

EXAMPLE OF GETTING A COMPOSITE:
(taken from http://libpqtypes.esilo.com/ home page)

/* Let's get a composite.
 * CREATE TYPE simple AS (a int4, t text);
 */
PGint4 i4;
PGtext text;
PGresult *res, *simple;
int resultFormat = 1;

/* Your composites need to be registered */
PQregisterTypeHandler(conn, "simple", NULL, NULL);

/* 2nd arg, PGparam, can be NULL if there are no query params.
 * Composites require binary results, so we can't use PQexec().
 */
res = PQparamExec(conn, NULL, "SELECT my_simple FROM t", resultFormat);
if(!res)
  fprintf(stderr, "ERROR: %s\n", PQgeterror());

/* Get the simple composite, which is exposed as a PGresult. */
PQgetf(res, 0, "%simple", 0, &simple);
PQclear(res); /* no longer needed */

/* Get the simple composite attributes from the simple result.
 * Reference fields by name by using a '#' rather than a '%'.
 * The field names are the composite attribute names.
 */
PQgetf(simple, 0, "#int4 #text", "a", &i4, "t", &text);
PQclear(simple);

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] handling TOAST tables in autovacuum

2008-06-08 Thread Alvaro Herrera
Euler Taveira de Oliveira wrote:

> FYI, I have a WIP patch to remove pg_autovacuum in favor of reloptions.  

Really?  Please send it my way to review/apply as soon as you have it
ready, independently of what we do with toast tables.

> Let's keep it simple. Why not just add a toast_enabled flag (disabled
> by default) in pg_autovacuum? If it's set, then main and toast tables are
> processed by autovac.

Actually I think your proposal is more cumbersome to use and less
flexible, because you can't set specific values for the other options
for toast tables.

> And based on your proposal, we'd also need to add reloptions to toast
> tables too. IMO, we should keep that code as simple as possible.

Sure, what's the problem with that?  We only need to make sure that
ALTER TABLE works for setting reloptions for toast tables.

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [CORE] [HACKERS] Automating our version-stamping a bit better

2008-06-08 Thread Marc G. Fournier



--On Sunday, June 08, 2008 21:27:03 -0400 Tom Lane <[EMAIL PROTECTED]> wrote:

> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> I'm tempted to suggest letting the script invoke autoconf, too,
>>> but that would require standardizing where to find the correct
>>> version of autoconf for each branch; so it might not be such a
>>> great idea. 
>
>> Unfortunately that's true. Maybe we could agree on using an alias for
>> the right version of autoconf, but it seems likely to be error prone.
>
> Actually, the way I do things is that my setup script for working
> with each particular version tree includes adjusting $PATH so that
> the right autoconf gets found just by saying "autoconf".  If everyone
> who might tag releases wanted to do it the same way, then we could
> just let the script say "autoconf".  But I'm not sure anybody else
> likes that plan.  What I was thinking was just to have the script
> print out something like
>
>   Tagged tree as 8.3.4
>   Don't forget to run autoconf 2.59 before committing

I like that one ...

-- 
Marc G. Fournier        Hub.Org Hosting Solutions S.A. (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.org   ICQ . 7615664


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] handling TOAST tables in autovacuum

2008-06-08 Thread Euler Taveira de Oliveira

Alvaro Herrera wrote:


We've been making noises about dealing with TOAST tables as separate
entities in autovacuum for some time now.  So here's a proposal:


Let's keep it simple. Why not just add a toast_enabled flag (disabled
by default) in pg_autovacuum? If it's set, then main and toast tables are
processed by autovac.

FYI, I have a WIP patch to remove pg_autovacuum in favor of reloptions. 
And based on your proposal, we'd also need to add reloptions to toast 
tables too. IMO, we should keep that code as simple as possible.



--
  Euler Taveira de Oliveira
  http://www.timbira.com/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automating our version-stamping a bit better

2008-06-08 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I'm tempted to suggest letting the script invoke autoconf, too,
>> but that would require standardizing where to find the correct
>> version of autoconf for each branch; so it might not be such a
>> great idea.  

> Unfortunately that's true. Maybe we could agree on using an alias for 
> the right version of autoconf, but it seems likely to be error prone.

Actually, the way I do things is that my setup script for working
with each particular version tree includes adjusting $PATH so that
the right autoconf gets found just by saying "autoconf".  If everyone
who might tag releases wanted to do it the same way, then we could
just let the script say "autoconf".  But I'm not sure anybody else
likes that plan.  What I was thinking was just to have the script
print out something like

Tagged tree as 8.3.4
Don't forget to run autoconf 2.59 before committing

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automating our version-stamping a bit better

2008-06-08 Thread Andrew Dunstan



Tom Lane wrote:


I'm tempted to suggest letting the script invoke autoconf, too,
but that would require standardizing where to find the correct
version of autoconf for each branch; so it might not be such a
great idea. 

  


Unfortunately that's true. Maybe we could agree on using an alias for 
the right version of autoconf, but it seems likely to be error prone.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Automating our version-stamping a bit better

2008-06-08 Thread Tom Lane
So while tagging the upcoming releases, I got annoyed once again about
what a tedious, error-prone bit of donkeywork it is.  You've got to find
and update the sub-version numbers, and *not* any chance occurrence of
the same strings (eg s/20/21/g for version 7.4.21 would've mangled some
copyright dates).  And the changes tend to move around a little bit in
each back branch, making it even easier to blow it.  ISTM we should get
the machine to do it for us.

I propose to write a little perl script to be used like this:

cd top-level-of-tree
sh src/tools/version_stamp 22
cvs commit -m "Stamp release 7.4.22"

The script takes just one argument, which could be "devel", "betaN",
"rcN", or just a minor version number "N".  Note the assumption that the
script knows the major version.  Since we expect to adjust the script
from time to time for version changes anyway, I don't see why we
shouldn't have the major version stored right in the script.  Tagging a
new development branch after a release is split off would then look like

cd src/tools
edit version_stamp, update a variable assignment at its head
cvs commit -m "Update version_stamp for 8.5"
cd ../..
sh src/tools/version_stamp devel
cvs commit -m "Stamp CVS HEAD as 8.5devel"

Note that this is not all that helpful if we just do it in CVS HEAD.
I propose adding the script to all active branches back to 7.4, with
suitable adjustments for each branch as needed.

I think we should probably include configure.in in the set of files
that this script updates, and get rid of the current two-step
arrangement where Marc stamps configure.in/configure after somebody
else stamps everything else.  Marc's tarball-wrapping process would
thus look roughly like

sh src/tools/version_stamp 4
autoconf
cvs commit -m "Stamp release 8.3.4"
cvs tag REL8_3_4
... build tarball ...

I'm tempted to suggest letting the script invoke autoconf, too,
but that would require standardizing where to find the correct
version of autoconf for each branch; so it might not be such a
great idea. 

Thoughts, objections?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-08 Thread Robert Treat
On Sunday 08 June 2008 20:12:15 Tom Lane wrote:
> Robert Treat <[EMAIL PROTECTED]> writes:
> > and i'm sure no one is against that idea, but you're never going to be
> > able to match the performance of just avoiding the check.
>
> We'll never be able to match the performance of not having transactions,
> either, but the community has never for a moment considered having a
> "no transactions" mode.
>

it's unclear what a "no transaction" mode would mean, but I'd be willing to 
guess some people have considered aspects of it (we've just never had 
agreement)

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq support for arrays and composites

2008-06-08 Thread Andrew Dunstan



Tom Lane wrote:

Andrew Dunstan <[EMAIL PROTECTED]> writes:
  
One complicating factor I see is that there is no protocol level support 
for anything other than simple objects - each data value is simply a 
stream of bytes of a known length. We would therefore need some pretty 
robust processing to pick apart structured objects.



Well, it's not that hard; the quoting/escaping rules for array_out and
record_out are well defined.

Are you intending that these operations support both text and binary
results?
  


I'm a bit open on that.


The array accessors with ... parameter lists strike me as a bit
dangerous, because there is no way at all to verify that the caller is
passing the expected number of dimensions.  Can't that be made tighter?
  


Well, the only alternative I can think of is to make the client walk the 
array one dimension at a time. Something like:


PQarray * PQgetInnerArray(PQarray * array, int dim);

then when we're down to the leaf level, we could have:

int PQgetArrayElementLength(PQarray * array, int dim);
bool PQgetArrayElementIsNull(PQarray * array, int dim);
char * PQgetArrayElement(PQarray * array, int dim);

That strikes me as somewhat more cumbersome, so I guess the question is 
whether it's worth it.  It probably fits the slightly clunky feel of libpq.
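
Hypothetically, walking a 2-D array under this per-dimension style would 
then look something like the fragment below (nothing here exists yet 
either; it just illustrates the shape):

PQarray *outer = PQgetArray(res, 0, 0);
int i, j;

for (i = PQgetArrayLower(outer, 1); i <= PQgetArrayUpper(outer, 1); i++)
{
    /* descend one dimension; the inner object is itself a PQarray */
    PQarray *inner = PQgetInnerArray(outer, i);

    for (j = PQgetArrayLower(inner, 1); j <= PQgetArrayUpper(inner, 1); j++)
        if (!PQgetArrayElementIsNull(inner, j))
            printf("[%d][%d] = %s\n", i, j, PQgetArrayElement(inner, j));
}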



Also you need to spell out the error handling conventions for each of
these.

I think you missed some "free()" operations.
  



Oh, yes, both of these are certainly true. This isn't really even a 
proposal yet, more a sketch that would lead to a proposal. I'm hoping to 
get some other input too, before settling this down, especially from 
driver writers.

It might also be useful to provide some functions that form an array or
composite value from per-element strings, ie, the converse of the
de-construction routines.  Here I'd be happy to skip the binary case.


  


Yeah, that had occurred to me. Will think about it more, although it 
could possibly be done as a separate project, too.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] handling TOAST tables in autovacuum

2008-06-08 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> The point here is that if the user disables autovac for the main table,
> then it's expected that it is automagically disabled for the toast table
> as well, for the usual case where they are disabling it because the
> table is too big.

Hmm, good point.  OK, two passes it is.  (I thought about remembering
the toast table rows in memory so as not to scan the catalog twice,
but I'm not sure you really save much that way.)

Another thing to think about here is locking: I believe you need to get
a vacuum-type lock on the parent table, not only the toast table, so
vacuuming a toast table without any knowledge of which table is its
parent ain't gonna fly anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] handling TOAST tables in autovacuum

2008-06-08 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > The only change of some consideration is that we will need two passes
> > over pg_class to get the list of relations to vacuum, instead of one as
> > we do currently.  The problem is that we first need to fetch the
> > (heap relid, toast relid) mapping before attempting to figure out if any
> > given TOAST table needs vacuuming.  This is because we want to be using
> > the main table's pg_autovacuum, and we can't get at that unless we know
> > the main relid.
> 
> Umm ... is it chiseled in stone someplace that toast tables shouldn't
> have their own pg_autovacuum entries?  Seems like that might be a
> reasonable component of a "whole nine yards" approach.

No, but I think it's a bit awkward for users to follow _only_ its own
entry.  I forgot to mention that in the patch I currently have, what
autovacuum does is try to get the TOAST table's own pg_autovacuum entry,
and if that fails, get the main rel's entry.

The point here is that if the user disables autovac for the main table,
then it's expected that it is automagically disabled for the toast table
as well, for the usual case where they are disabling it because the
table is too big.  Automatically processing the toast table would be
completely unexpected, and most likely unwelcome.

Of course, for the even rarer cases when you want to disable it for the
main rel and enable it for the toast table, you can do that too.  (I
can't think of a case where this would be useful though.)

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-08 Thread Tom Lane
Robert Treat <[EMAIL PROTECTED]> writes:
> and i'm sure no one is against that idea, but you're never going to be able
> to match the performance of just avoiding the check.

We'll never be able to match the performance of not having transactions,
either, but the community has never for a moment considered having a
"no transactions" mode.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq support for arrays and composites

2008-06-08 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> One complicating factor I see is that there is no protocol level support 
> for anything other than simple objects - each data value is simply a 
> stream of bytes of a known length. We would therefore need some pretty 
> robust processing to pick apart structured objects.

Well, it's not that hard; the quoting/escaping rules for array_out and
record_out are well defined.
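
For the text case, a bare-bones element splitter is only a screenful of C.
A sketch (one-dimensional arrays only, comma delimiter, no error handling;
needs <stdio.h> and <string.h>):

/* Split a text-format array value such as {1,NULL,"a \"quoted\" elem"}
 * into its elements, applying the array_out quoting rules.
 */
static void
split_text_array(const char *val)
{
    char        buf[1024];
    const char *p = val;

    if (*p++ != '{')
        return;                 /* not an array literal */
    while (*p && *p != '}')
    {
        int     n = 0;
        int     quoted = 0;

        if (*p == '"')          /* quoted element */
        {
            quoted = 1;
            p++;
            while (*p && *p != '"' && n < (int) sizeof(buf) - 1)
            {
                if (*p == '\\' && p[1])   /* backslash escapes next char */
                    p++;
                buf[n++] = *p++;
            }
            if (*p == '"')
                p++;
        }
        else                    /* unquoted element */
        {
            while (*p && *p != ',' && *p != '}' && n < (int) sizeof(buf) - 1)
                buf[n++] = *p++;
        }
        buf[n] = '\0';
        if (!quoted && strcmp(buf, "NULL") == 0)
            printf("element: NULL\n");
        else
            printf("element: %s\n", buf);
        if (*p == ',')
            p++;
    }
}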

Are you intending that these operations support both text and binary
results?

The array accessors with ... parameter lists strike me as a bit
dangerous, because there is no way at all to verify that the caller is
passing the expected number of dimensions.  Can't that be made tighter?
Also you need to spell out the error handling conventions for each of
these.

I think you missed some "free()" operations.

It might also be useful to provide some functions that form an array or
composite value from per-element strings, ie, the converse of the
de-construction routines.  Here I'd be happy to skip the binary case.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] libpq support for arrays and composites

2008-06-08 Thread Andrew Dunstan


One of the areas where libpq seems to be severely lacking is in handling 
arrays and composites in query results. I'd like to set about rectifying 
that.


Ideally this would mean that drivers using libpq could easily and 
reliably deliver such objects suitably structured in their particular 
languages (e.g. lists and hashes in Perl).


One complicating factor I see is that there is no protocol level support 
for anything other than simple objects - each data value is simply a 
stream of bytes of a known length. We would therefore need some pretty 
robust processing to pick apart structured objects.


We'll need a new API to handle such objects. I'm thinking of something like:

PQarray * PQgetArray( const PGresult *res, int row_number, int 
column_number);

int PQgetArrayNDims(PQarray * array);
int PQgetArrayLower(PQarray * array, int dim);
int PQgetArrayUpper(PQarray * array, int dim);
int PQgetArrayElementLength(PQarray * array, int dim1, ...);
bool PQgetArrayElementIsNull(PQarray * array, int dim1, ...);
char * PQgetArrayElement(PQarray * array, int dim1, ...);

PQcomposite * PQgetComposite(const PGresult *res, int row_number, int 
column_number);

PQcomposite * PQgetArrayElementComposite(PQarray * array, int dim1, ...);
int PQgetCompositeNFields(PQcomposite * composite);
char * PQgetCompositeFName(PQcomposite * composite, int fnumber);
int PQgetCompositeFNumber(PQcomposite * composite, char * fname);
Oid PQgetCompositeFType(PQcomposite * composite, int fnumber);
int PQgetCompositeFieldLength(PQcomposite * composite, int fnumber);
bool PQgetCompositeFieldIsNull(PQcomposite * composite, int fnumber);
char * PQgetCompositeField(PQcomposite * composite, int fnumber);

Not sure if we need analogs for PQfformat, PQfmod or PQfsize - I suspect 
not, but right now I'm just thinking out loud.
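
To make the intended usage concrete, a caller walking a two-dimensional
integer array might look like the fragment below (purely illustrative --
none of these functions exists yet, and a matching free/destroy call,
e.g. a PQfreeArray, would presumably be needed as well):

/* Hypothetical usage of the proposed API: print a 2-D integer array. */
PQarray *array = PQgetArray(res, 0, 0);

if (array && PQgetArrayNDims(array) == 2)
{
    int i, j;

    for (i = PQgetArrayLower(array, 1); i <= PQgetArrayUpper(array, 1); i++)
        for (j = PQgetArrayLower(array, 2); j <= PQgetArrayUpper(array, 2); j++)
        {
            if (PQgetArrayElementIsNull(array, i, j))
                printf("[%d][%d] = NULL\n", i, j);
            else
                printf("[%d][%d] = %s\n", i, j,
                       PQgetArrayElement(array, i, j));
        }
}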


Thoughts? Is this worth doing?

cheers

andrew







--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling GUCS

2008-06-08 Thread Robert Treat
On Sunday 08 June 2008 19:07:21 Gregory Stark wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> > On Fri, 2008-06-06 at 20:19 -0400, Tom Lane wrote:
> >> Robert Treat <[EMAIL PROTECTED]> writes:
> >>
> >> Actually, the reason it's still 10 is that the effort expended to get it
> >> changed has been *ZERO*.  I keep asking for someone to make some
> >> measurements, do some benchmarking, anything to make a plausible case
> >> for a specific higher value as being a reasonable place to set it.
> >>
> >> The silence has been deafening.
> >
> > Not surprising really. It is a simple adjustment to make and it also is
> > easy to spot when its a problem. However it is not trivial to test for
> > (in terms of time and effort). I know 10 is wrong and so do you. If you
> > don't I am curious why I see so many posts from you saying, "Your
> > estimates are off, what is your default_statistics_target?" with yet
> > even more responses saying, "Uhh 10."
>
> Ah, but we only ever hear about the cases where it's wrong of course. In
> other words even if we raised it to some optimal value we would still have
> precisely the same experience of seeing only posts on list about it being
> insufficient.
>

The flip side to this is that we're not trying to find the perfect setting, 
we're just trying to determine a number that will cause more benefit than 
harm compared to the number we have now. While I am sure there are cases 
where 100 is too low as well, I cannot recall ever having seen someone 
suggest lowering the default_stats_target to something less than 100.  (I 
now sit back and wait for someone to comb the archives, just to find that 
one time.)

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-08 Thread Robert Treat
On Saturday 07 June 2008 16:22:56 Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Perhaps we need a GUC that says expert_mode = on. ...  Another idea
> > might be to make such command options superuser only, to ensure the
> > power is available, yet only in the hands of, by-definition, the trusted
> > few.
>
> This all seems pretty useless, as the sort of user most likely to shoot
> himself in the foot will also always be running as superuser.
>

yeah, i'm not a big fan of "set enable_footgun=true" since the people likely 
to get tripped up are going to blindly enable these modes. 

otoh, if we do such a thing, i would be a big fan of calling 
it "enable_footgun" :-)

> I'd much rather see us expend more effort on speeding up the checks
> than open holes in the system.
>

and i'm sure no one is against that idea, but you're never going to be able to 
match the performance of just avoiding the check. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] handling TOAST tables in autovacuum

2008-06-08 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> The only change of some consideration is that we will need two passes
> over pg_class to get the list of relations to vacuum, instead of one as
> we do currently.  The problem is that we first need to fetch the
> (heap relid, toast relid) mapping before attempting to figure out if any
> given TOAST table needs vacuuming.  This is because we want to be using
> the main table's pg_autovacuum, and we can't get at that unless we know
> the main relid.

Umm ... is it chiseled in stone someplace that toast tables shouldn't
have their own pg_autovacuum entries?  Seems like that might be a
reasonable component of a "whole nine yards" approach.

> Should we display TOAST tables separately in pg_stat_*_tables?  (Maybe
> pg_stat_toast_tables?)

+1 for pg_stat_toast_tables, I think.  If you separate them out then
there will need to be some kind of smarts to help the user figure out
which main table a toast table belongs to.  This would be easy with a
separate view.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling GUCS

2008-06-08 Thread Gregory Stark
"Joshua D. Drake" <[EMAIL PROTECTED]> writes:

> On Fri, 2008-06-06 at 20:19 -0400, Tom Lane wrote:
>> Robert Treat <[EMAIL PROTECTED]> writes:
>
>> Actually, the reason it's still 10 is that the effort expended to get it
>> changed has been *ZERO*.  I keep asking for someone to make some
>> measurements, do some benchmarking, anything to make a plausible case
>> for a specific higher value as being a reasonable place to set it.
>
>> The silence has been deafening.
>
> Not surprising really. It is a simple adjustment to make and it also is
> easy to spot when its a problem. However it is not trivial to test for
> (in terms of time and effort). I know 10 is wrong and so do you. If you
> don't I am curious why I see so many posts from you saying, "Your
> estimates are off, what is your default_statistics_target?" with yet
> even more responses saying, "Uhh 10." 

Ah, but we only ever hear about the cases where it's wrong of course. In other
words even if we raised it to some optimal value we would still have precisely
the same experience of seeing only posts on list about it being insufficient.

What's needed are some speed benchmarks for complex queries with varying
statistics sizes, so we can see how badly large statistics tables hurt
planning time.

The flip side, seeing how much larger statistics help planning accuracy,
is much harder to measure. Offhand I don't see any systematic way to go
about it.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] math error or rounding problem Money type

2008-06-08 Thread Gregory Stark
"Mark Kirkwood" <[EMAIL PROTECTED]> writes:

> AFAIK (dimly recalling numerical analysis courses at university) SUM and ROUND
> can *never* be commuted. In general the recommended approach is to round as
> late as possible and as few times as possible - so your 1st query is the
> correct or best way to go.

I don't think "as late as possible" applies with money. If you were dealing
with approximate measurements you want to round as late as possible because
rounding is throwing away precision. But if you're dealing with money you're
dealing with exact quantities. 

There is only going to be one correct time to round and that's whenever you're
creating an actual ledger item or order line item or whatever. Once you've
calculated how much interest to credit or whatever you have to make that
credit an exact number of cents and the $0.004 you lost or gained in rounding
never comes up again.
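
To make that discipline concrete, a minimal sketch (figures invented for
illustration): keep ledger amounts in integer cents and round exactly once,
when the line item is posted.

/* Post monthly interest to a ledger; only the posted item is rounded.
 * Needs <math.h> for llround().
 */
long long balance_cents = 1234567;          /* $12,345.67 */
double monthly_rate = 0.0325 / 12.0;        /* 3.25% APR */
long long interest_cents = llround(balance_cents * monthly_rate);

/* The fractional ~$0.004 is gone for good; the ledger item is exact. */
balance_cents += interest_cents;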

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] handling TOAST tables in autovacuum

2008-06-08 Thread Alvaro Herrera
Hi,

We've been making noises about dealing with TOAST tables as separate
entities in autovacuum for some time now.  So here's a proposal:

Let's do it.

That's about it :-)

The only change of some consideration is that we will need two passes
over pg_class to get the list of relations to vacuum, instead of one as
we do currently.  The problem is that we first need to fetch the
(heap relid, toast relid) mapping before attempting to figure out if any
given TOAST table needs vacuuming.  This is because we want to be using
the main table's pg_autovacuum, and we can't get at that unless we know
the main relid.

Another open question is whether the TOAST table should be processed at
all if the main table is vacuumed.  My opinion is we don't -- if we're
going to deal with them separately, let's go the whole nine yards.
Autovacuum will only process a toast table when, by itself, it shows
that it needs processing.  (Obviously this doesn't mean we change
semantics of user-invoked VACUUM -- those will continue to vacuum the
TOAST table along with the main table).

Should we display TOAST tables separately in pg_stat_*_tables?  (Maybe
pg_stat_toast_tables?)

Thoughts?

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-08 Thread Tom Lane
"Nathan Boley" <[EMAIL PROTECTED]> writes:
> ... There are two potential problems that I see with this approach:

> 1) It assumes the = is equivalent to <= and >= . This is certainly
> true for real numbers, but is it true for every equality relation that
> eqsel predicts for?

The cases that compute_scalar_stats is used in have that property, since
the < and = operators are taken from the same btree opclass.

> Do people think that the improved estimates would be worth the
> additional overhead?

Your argument seems to consider only columns having a normal
distribution.  How badly does it fall apart for non-normal
distributions?  (For instance, Zipfian distributions seem to be pretty
common in database work, from what I've seen.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

2008-06-08 Thread Nathan Boley
Currently eqsel assumes that, except for the values stored as mcv's,
the number of times that a value appears in a table column is
independent of its value. Unfortunately, this often yields
inappropriate selectivity estimates and frequently leads to
inappropriate plans.

As an example, consider an insurance company that keeps a record of
patient heights. Assume there are 1,000,000 patient heights in this
column, and they are distributed normally with a mean of 1.7526 and a
standard deviation of 0.0762. Furthermore, assume that the heights are
only measured to the nearest centimeter. Then, we'd expect there to be
about 73 distinct heights, with an SD of 1.5.
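
As a quick sanity check, the expected number of occupied 1 cm bins is the
sum over bins of 1 - (1-p)^N, which a few lines of C can evaluate (a
sketch assuming exactly the normal parameters above; compile with -lm):

#include <math.h>
#include <stdio.h>

int main(void)
{
    const double mu = 1.7526, sd = 0.0762, binw = 0.01;
    const double N = 1000000.0;
    double x, expected = 0.0;

    for (x = 1.0; x <= 2.5; x += binw)
    {
        /* probability mass of one 1 cm bin under the normal density */
        double z = (x - mu) / sd;
        double p = binw * exp(-0.5 * z * z) / (sd * sqrt(2.0 * M_PI));

        expected += 1.0 - exp(N * log1p(-p));   /* 1 - (1-p)^N */
    }
    printf("expected distinct heights: %.1f\n", expected);  /* ~73 */
    return 0;
}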

Ignoring the effects of MCV's, the planner expects
  SELECT height FROM heights WHERE height = 1.75;
to yield roughly 13000 results. However, given that we know the
underlying distribution, we would expect to see ~52000 results.

Similarly, the planner expects to see 13000 results from
  SELECT height FROM heights WHERE height = 2.05;
while we would expect to see about 2.7.

Obviously this example is not totally convincing: if I were to post
this to pg-general looking for advice I'm sure that everyone would
tell me to just increase the size of my mcv stats. However, in cases
where the number of distinct values is higher, this isn't always
feasible. Also, why store a list of 50 values and their frequencies
when 10 extra would provide the same plans without bloating
pg_statistics?

To combat this problem, I have two different proposals.

Idea 1: Keep an array of stadistinct values, one for each histogram bucket.

In the example above, ( again ignoring mcv's ) the quantile data is

0%    10%   20%   30%   40%   50%   60%   70%   80%   90%   100%
1.38  1.66  1.69  1.71  1.73  1.75  1.77  1.79  1.82  1.85  2.12

with numdistinct values of ( respectively )

29  2  2  2  2  2  2  3  3 25

For the two above examples, this new approach would yield selectivity
estimates of

(1,000,000/10)/2 = 50,000  ( vs an actual ED of ~52000 )
and
(1,000,000/10)/25 = 4,000  ( vs an actual ED of ~2.7 )

Furthermore, this is done without mcvs. Since mcv's would make the
histogram more sensitive to the edges, the estimates with mcv's should
be correspondingly better.


There are two potential problems that I see with this approach:

1) It assumes that = is equivalent to <= and >= combined. This is
certainly true for real numbers, but is it true for every equality
relation that eqsel predicts for?

2) It bloats the stats table.

Idea 2: Keep a correlation statistic between ndistinct and bucket size

This addresses problem #2.

In lieu of keeping an actual list of ndistinct per histogram bucket,
we store the linear scaling coefficient between histogram bucket width
and ndistinct/(avg ndistinct). To visualize this, it is easiest to
consider plotting the bucket width versus ndistinct. The scaling
coefficient is the line that passes through the origin and minimizes
the square of the difference between its estimate for ndistinct and
the actual value, as spelled out below.
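
Assuming that means ordinary least squares through the origin -- with
bucket widths w_i, per-bucket distinct counts d_i, their average \bar{d},
h buckets, and N rows -- the coefficient and the resulting row estimate are

    \hat{c} = \frac{\sum_i w_i \,(d_i/\bar{d})}{\sum_i w_i^2},
    \qquad
    \mathrm{rows}(v \in \mathrm{bucket}\ i) \approx
        \frac{N/h}{\hat{c}\,\bar{d}\,w_i}

Plugging the quantile data above into the first formula gives roughly
2.13/0.156 ~= 13.6, consistent with the 13.63 coefficient used below.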

When I apply this method to the above data I find a coefficient of
13.63 for an average ndistinct of 72/10. This provides selectivity
estimates, for the above two examples, of
(1,000,000/10)/( 13.63*7.2*(1.77 - 1.75) ) = 50950 ( vs an actual ED of ~52000 )
and
(1,000,000/10)/( 13.63*7.2*(2.12 - 1.85) ) = 3774  ( vs an actual ED of ~2.7 )

Although this yields better results than idea 1 for this particular
example, it will be much more sensitive to weird distributions.

Obviously there are some special cases to consider: we wouldn't want
the stats to be skewed such that they provide really bad plans.
However, with some carefully designed caps I believe that we could
ensure that the estimates are at least as good as they are now. In
fact, I'm not certain that an R^2 penalty is the correct loss
function. Ideally, we want to minimize the extra time that the db
spends by choosing an incorrect plan. Maybe slight overestimations are
better than slight underestimations? Maybe the cost of the occasional
(really) bad plan is less than the cost of a bunch of kinda bad plans?

Finally, we aren't limited to just one coefficient. We could also
store multiple coefficients to improve our estimates, and provide a
compromise between ideas 1 and 2.

Food for future thought...

In addition to the previous benefits, I think that this method has the
potential to make the process by which MCVs are chosen (or not chosen)
smarter. Now the planner chooses a value to be an mcv candidate if
its frequency is greater than 1.25 * the average frequency. Given
that this improved selectivity estimate is implemented, maybe a better
way would be to include a value as an mcv if it is a) above a certain
threshold and b) poorly estimated by the histogram selectivity
estimator.

What are people's thoughts on idea 1 vs idea 2?

Am I missing any relevant details about the planner's operation?

Do people think that the improved estimates would be worth the
additional overhead?

-Nathan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [PATCHES] [HACKERS] TRUNCATE TABLE with IDENTITY

2008-06-08 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> 2. I had first dismissed Neil's idea of transactional sequence updates
>> as impossible, but on second look it could be done.  Suppose RESTART
>> IDENTITY does this for each sequence:
>> 
>> * obtain AccessExclusiveLock;
>> * assign a new relfilenode;
>> * insert a sequence row with all parameters copied, except that
>> last_value copies start_value;
>> * hold AccessExclusiveLock till commit.

> Hmm, this kills the idea of moving sequence data to a single
> non-transactional catalog :-(

Well, there are a number of holes in our ideas of how to do that anyway.
But offhand I don't see why we couldn't distinguish regular heap_update
from update_in_place on single rows within a catalog.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers