Re: [HACKERS] libpq support for arrays and composites
Andrew Chernow wrote:
> libpqtypes already implemented this. It is a different approach but
> provides the same functionality; with the inclusion of being able to
> handle every data type. libpqtypes uses the PGresult API for composites
> and arrays, rather than adding a new set of functions.

Yes, I thought you'd say that :-)

This has some significant limitations - for example (quoting from your docs):

    Arrays are only handled using binary format. This means that any type
    used as an array element must be put and gotten in binary format. If a
    user-defined type does not implement a send and recv function in the
    backend, it can not be used as an array element.

That makes it quite useless for my intended purpose. I also am not particularly enamoured of the libpqtypes way of doing things, which feels rather foreign to me. Lastly, the idea is to provide extra facilities to libpq clients without requiring any extra library.

cheers

andrew
Re: [HACKERS] handling TOAST tables in autovacuum
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Euler Taveira de Oliveira wrote:
>> And based on your proposal, it'll be needed to add reloptions to toast
>> tables too. IMO, we should keep that code as simple as possible.

> Sure, what's the problem with that? We only need to make sure that
> ALTER TABLE works for setting reloptions for toast tables.

... actually, the problem is going to be "how do you get pg_dump to dump and reload such settings"? The toast tables are not going to have the same names after dump/reload.

			regards, tom lane
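One conceivable answer, sketched here under the assumption that reloptions grow a toast-specific namespace (later PostgreSQL releases did adopt a "toast." prefix), is to let the parent table carry the setting, so the dump never has to name the toast table at all:

    -- "big_table" is a placeholder; only the parent's stable name appears,
    -- so pg_dump could emit this statement verbatim and it would survive
    -- dump/reload regardless of the toast table's new name
    ALTER TABLE big_table SET (toast.autovacuum_enabled = false);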
Re: [HACKERS] libpq support for arrays and composites
Andrew Dunstan wrote:
> [original proposal for a PQgetArray / PQgetComposite API snipped]

libpqtypes already implemented this. It is a different approach but provides the same functionality, including the ability to handle every data type. libpqtypes uses the PGresult API for composites and arrays, rather than adding a new set of functions. To support this, one must be able to convert all data types (unless you are only supporting text results), because composites can be made up of any data type.

Simple arrays: http://libpqtypes.esilo.com/man3/pqt-specs.html#array
Composite arrays: http://libpqtypes.esilo.com/man3/pqt-composites.html

EXAMPLE OF GETTING A COMPOSITE (taken from the http://libpqtypes.esilo.com/ home page):

    /* Let's get a composite.
     * CREATE TYPE simple AS (a int4, t text);
     */
    PGint4 i4;
    PGtext text;
    PGresult *res, *simple;
    int resultFormat = 1;

    /* Your composites need to be registered. */
    PQregisterTypeHandler(conn, "simple", NULL, NULL);

    /* The 2nd arg, a PGparam, can be NULL if there are no query params.
     * Composites require binary results, so we can't use PQexec().
     */
    res = PQparamExec(conn, NULL, "SELECT my_simple FROM t", resultFormat);
    if (!res)
        fprintf(stderr, "ERROR: %s\n", PQgeterror());

    /* Get the simple composite, which is exposed as a PGresult. */
    PQgetf(res, 0, "%simple", 0, &simple);
    PQclear(res); /* no longer needed */

    /* Get the simple composite attributes from the simple result.
     * Reference fields by name by using a '#' rather than a '%'.
     * The field names are the composite attribute names.
     */
    PQgetf(simple, 0, "#int4 #text", "a", &i4, "t", &text);
    PQclear(simple);

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/
Re: [HACKERS] handling TOAST tables in autovacuum
Euler Taveira de Oliveira wrote:

> FYI, I have a WIP patch to remove pg_autovacuum in favor of reloptions.

Really? Please send it my way to review/apply as soon as you have it ready, independently of what we do with toast tables.

> Let's keep it simple. Why not just adding a toast_enabled flag (disabled
> by default) in pg_autovacuum? If it's set then main and toast tables are
> processed by autovac.

Actually I think your proposal is more cumbersome to use and less flexible, because you can't set specific values for the other options for toast tables.

> And based on your proposal, it'll be needed to add reloptions to toast
> tables too. IMO, we should keep that code as simple as possible.

Sure, what's the problem with that? We only need to make sure that ALTER TABLE works for setting reloptions for toast tables.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [CORE] [HACKERS] Automating our version-stamping a bit better
--On Sunday, June 08, 2008 21:27:03 -0400 Tom Lane <[EMAIL PROTECTED]> wrote:

> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> I'm tempted to suggest letting the script invoke autoconf, too,
>>> but that would require standardizing where to find the correct
>>> version of autoconf for each branch; so it might not be such a
>>> great idea.
>
>> Unfortunately that's true. Maybe we could agree on using an alias for
>> the right version of autoconf, but it seems likely to be error prone.
>
> Actually, the way I do things is that my setup script for working
> with each particular version tree includes adjusting $PATH so that
> the right autoconf gets found just by saying "autoconf". If everyone
> who might tag releases wanted to do it the same way, then we could
> just let the script say "autoconf". But I'm not sure anybody else
> likes that plan. What I was thinking was just to have the script
> print out something like
>
> Tagged tree as 8.3.4
> Don't forget to run autoconf 2.59 before committing

I like that one ...

--
Marc G. Fournier        Hub.Org Hosting Solutions S.A. (http://www.hub.org)
Email . [EMAIL PROTECTED]        MSN . [EMAIL PROTECTED]
Yahoo . yscrappy        Skype: hub.org        ICQ . 7615664
Re: [HACKERS] handling TOAST tables in autovacuum
Alvaro Herrera wrote:

> We've been making noises about dealing with TOAST tables as separate
> entities in autovacuum for some time now. So here's a proposal:

Let's keep it simple. Why not just adding a toast_enabled flag (disabled by default) in pg_autovacuum? If it's set then main and toast tables are processed by autovac.

FYI, I have a WIP patch to remove pg_autovacuum in favor of reloptions. And based on your proposal, it'll be needed to add reloptions to toast tables too. IMO, we should keep that code as simple as possible.

--
Euler Taveira de Oliveira
http://www.timbira.com/
Re: [HACKERS] Automating our version-stamping a bit better
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I'm tempted to suggest letting the script invoke autoconf, too,
>> but that would require standardizing where to find the correct
>> version of autoconf for each branch; so it might not be such a
>> great idea.

> Unfortunately that's true. Maybe we could agree on using an alias for
> the right version of autoconf, but it seems likely to be error prone.

Actually, the way I do things is that my setup script for working with each particular version tree includes adjusting $PATH so that the right autoconf gets found just by saying "autoconf". If everyone who might tag releases wanted to do it the same way, then we could just let the script say "autoconf". But I'm not sure anybody else likes that plan. What I was thinking was just to have the script print out something like

    Tagged tree as 8.3.4
    Don't forget to run autoconf 2.59 before committing

			regards, tom lane
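That kind of per-branch setup is easy to script; a minimal sketch (the install locations are assumptions for illustration, not anyone's actual layout):

    # source this when switching to work on the 8.3 branch
    export PATH=/usr/local/autoconf-2.59/bin:$PATH   # hypothetical autoconf install path
    cd ~/pgsql/REL8_3_STABLE                         # hypothetical checkout location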
Re: [HACKERS] Automating our version-stamping a bit better
Tom Lane wrote:
> I'm tempted to suggest letting the script invoke autoconf, too,
> but that would require standardizing where to find the correct
> version of autoconf for each branch; so it might not be such a
> great idea.

Unfortunately that's true. Maybe we could agree on using an alias for the right version of autoconf, but it seems likely to be error prone.

cheers

andrew
[HACKERS] Automating our version-stamping a bit better
So while tagging the upcoming releases, I got annoyed once again about what a tedious, error-prone bit of donkeywork it is. You've got to find and update the sub-version numbers, and *not* any chance occurrence of the same strings (eg s/20/21/g for version 7.4.21 would've mangled some copyright dates). And the changes tend to move around a little bit in each back branch, making it even easier to blow it.

ISTM we should get the machine to do it for us. I propose to write a little perl script to be used like this:

    cd top-level-of-tree
    sh src/tools/version_stamp 22
    cvs commit -m "Stamp release 7.4.22"

The script takes just one argument, which could be "devel", "betaN", "rcN", or just a minor version number "N". Note the assumption that the script knows the major version. Since we expect to adjust the script from time to time for version changes anyway, I don't see why we shouldn't have the major version stored right in the script.

Tagging a new development branch after a release is split off would then look like:

    cd src/tools
    edit version_stamp, update a variable assignment at its head
    cvs commit -m "Update version_stamp for 8.5"
    cd ../..
    sh src/tools/version_stamp devel
    cvs commit -m "Stamp CVS HEAD as 8.5devel"

Note that this is not all that helpful if we just do it in CVS HEAD. I propose adding the script to all active branches back to 7.4, with suitable adjustments for each branch as needed.

I think we should probably include configure.in in the set of files that this script updates, and get rid of the current two-step arrangement where Marc stamps configure.in/configure after somebody else stamps everything else. Marc's tarball-wrapping process would thus look roughly like:

    sh src/tools/version_stamp 4
    autoconf
    cvs commit -m "Stamp release 8.3.4"
    cvs tag REL8_3_4
    ... build tarball ...

I'm tempted to suggest letting the script invoke autoconf, too, but that would require standardizing where to find the correct version of autoconf for each branch; so it might not be such a great idea.

Thoughts, objections?

			regards, tom lane
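A minimal sketch of what such a script might look like (the file list and the replacement pattern are illustrative assumptions only, not the actual tool):

    #!/usr/bin/perl
    # version_stamp: stamp a version into the tree.
    # Usage: version_stamp <devel|betaN|rcN|minor-number>
    use strict;
    use warnings;

    my $majorversion = "8.3";   # edited by hand when a new branch is forked

    my $arg = shift @ARGV;
    die "usage: version_stamp <devel|betaN|rcN|N>\n" unless defined $arg;

    # A bare number is a minor release; anything else (devel, beta2, rc1)
    # is appended directly to the major version.
    my $version = $arg =~ /^\d+$/ ? "$majorversion.$arg" : "$majorversion$arg";

    # Files carrying version strings -- an illustrative subset only.
    foreach my $file ("configure.in", "doc/bug.template")
    {
        local @ARGV = ($file);
        local $^I = "";          # edit in place, no backup file
        while (<>)
        {
            # Touch only explicit version markers, never bare numbers, so
            # chance matches (copyright dates etc.) are left alone.
            s/PostgreSQL \d+\.\d+\S*/PostgreSQL $version/g;
            print;
        }
    }

    print "Stamped tree as $version\n";
    print "Don't forget to run autoconf before committing\n";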
Re: [HACKERS] pg_dump restore time and Foreign Keys
On Sunday 08 June 2008 20:12:15 Tom Lane wrote:
> Robert Treat <[EMAIL PROTECTED]> writes:
>> and i'm sure no one is against that idea, but you're never going to be
>> able to match the performance of just avoiding the check.
>
> We'll never be able to match the performance of not having transactions,
> either, but the community has never for a moment considered having a
> "no transactions" mode.

it's unclear what a "no transaction" mode would mean, but I'd be willing to guess some people have considered aspects of it (we've just never had agreement)

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] libpq support for arrays and composites
Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> One complicating factor I see is that there is no protocol level support
>> for anything other than simple objects - each data value is simply a
>> stream of bytes of a known length. We would therefore need some pretty
>> robust processing to pick apart structured objects.
>
> Well, it's not that hard, the quoting/escaping rules for array_out and
> record_out are well defined. Are you intending that these operations
> support both text and binary results?

I'm a bit open on that.

> The array accessors with ... parameter lists strike me as a bit
> dangerous, because there is no way at all to verify that the caller is
> passing the expected number of dimensions. Can't that be made tighter?

Well, the only alternative I can think of is to make the client walk the array one dimension at a time. Something like:

    PQarray * PQgetInnerArray(PQarray * array, int dim);

then when we're down to the leaf level, we could have:

    int PQgetArrayElementLength(PQarray * array, int dim);
    bool PQgetArrayElementIsNull(PQarray * array, int dim);
    char * PQgetArrayElement(PQarray * array, int dim);

That strikes me as somewhat more cumbersome, so I guess the question is whether it's worth it. It probably fits the slightly clunky feel of libpq.

> Also you need to spell out the error handling conventions for each of
> these. I think you missed some "free()" operations.

Oh, yes, both of these are certainly true. This isn't really even a proposal yet, more a sketch that would lead to a proposal. I'm hoping to get some other input too, before settling this down, especially from driver writers.

> It might also be useful to provide some functions that form an array or
> composite value from per-element strings, ie, the converse of the
> de-construction routines. Here I'd be happy to skip the binary case.

Yeah, that had occurred to me. Will think about it more, although it could possibly be done as a separate project, too.

cheers

andrew
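For comparison, client code under the dimension-at-a-time variant might look roughly like this (a sketch against the proposed signatures only - nothing here exists in libpq, and PQfreeArray is a hypothetical deallocator standing in for the not-yet-specified free() conventions):

    /* Walk a two-dimensional array value, printing each element. */
    PQarray *outer = PQgetArray(res, 0, 0);
    int nrows = PQgetArrayUpper(outer, 1) - PQgetArrayLower(outer, 1) + 1;

    for (int i = 0; i < nrows; i++)
    {
        PQarray *inner = PQgetInnerArray(outer, i);
        int ncols = PQgetArrayUpper(inner, 1) - PQgetArrayLower(inner, 1) + 1;

        for (int j = 0; j < ncols; j++)
        {
            if (!PQgetArrayElementIsNull(inner, j))
                printf("[%d][%d] = %s\n", i, j, PQgetArrayElement(inner, j));
        }
        PQfreeArray(inner);   /* hypothetical cleanup call */
    }
    PQfreeArray(outer);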
Re: [HACKERS] handling TOAST tables in autovacuum
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> The point here is that if the user disables autovac for the main table,
> then it's expected that it is automagically disabled for the toast table
> as well, for the usual case where they are disabling it because the
> table is too big.

Hmm, good point. OK, two passes it is. (I thought about remembering the toast table rows in memory so as not to scan the catalog twice, but I'm not sure you really save much that way.)

Another thing to think about here is locking: I believe you need to get a vacuum-type lock on the parent table, not only the toast table, so vacuuming a toast table without any knowledge of which table is its parent ain't gonna fly anyway.

			regards, tom lane
Re: [HACKERS] handling TOAST tables in autovacuum
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>> The only change of some consideration is that we will need two passes
>> over pg_class to get the list of relations to vacuum, instead of one as
>> we do currently. The problem is that we first need to fetch the
>> (heap relid, toast relid) mapping before attempting to figure out if any
>> given TOAST table needs vacuuming. This is because we want to be using
>> the main table's pg_autovacuum, and we can't get at that unless we know
>> the main relid.
>
> Umm ... is it chiseled in stone someplace that toast tables shouldn't
> have their own pg_autovacuum entries? Seems like that might be a
> reasonable component of a "whole nine yards" approach.

No, but I think it's a bit awkward for users to follow _only_ its own entry. I forgot to mention that in the patch I currently have, what autovacuum does is try to get the TOAST table's own pg_autovacuum entry, and if that fails, get the main rel's entry.

The point here is that if the user disables autovac for the main table, then it's expected that it is automagically disabled for the toast table as well, for the usual case where they are disabling it because the table is too big. Automatically processing the toast table would be completely unexpected, and most likely unwelcome.

Of course, for the even rarer cases when you want to disable it for the main rel and enable it for the toast table, you can do that too. (I can't think of a case where this would be useful though.)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
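In SQL terms, the lookup order described above would be roughly the following (a hedged rendering only; it assumes pg_autovacuum keys its rows by table OID in a vacrelid column, and the toast table name is a placeholder):

    -- first preference: the toast table's own entry
    SELECT * FROM pg_autovacuum
    WHERE vacrelid = 'pg_toast.pg_toast_12345'::regclass;

    -- if no row comes back, fall back to the parent table's entry
    SELECT av.*
    FROM pg_autovacuum av
    JOIN pg_class c ON av.vacrelid = c.oid
    WHERE c.reltoastrelid = 'pg_toast.pg_toast_12345'::regclass;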
Re: [HACKERS] pg_dump restore time and Foreign Keys
Robert Treat <[EMAIL PROTECTED]> writes:
> and i'm sure no one is against that idea, but you're never going to be
> able to match the performance of just avoiding the check.

We'll never be able to match the performance of not having transactions, either, but the community has never for a moment considered having a "no transactions" mode.

			regards, tom lane
Re: [HACKERS] libpq support for arrays and composites
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> One complicating factor I see is that there is no protocol level support
> for anything other than simple objects - each data value is simply a
> stream of bytes of a known length. We would therefore need some pretty
> robust processing to pick apart structured objects.

Well, it's not that hard, the quoting/escaping rules for array_out and record_out are well defined. Are you intending that these operations support both text and binary results?

The array accessors with ... parameter lists strike me as a bit dangerous, because there is no way at all to verify that the caller is passing the expected number of dimensions. Can't that be made tighter?

Also you need to spell out the error handling conventions for each of these. I think you missed some "free()" operations.

It might also be useful to provide some functions that form an array or composite value from per-element strings, ie, the converse of the de-construction routines. Here I'd be happy to skip the binary case.

			regards, tom lane
[HACKERS] libpq support for arrays and composites
One of the areas where libpq seems to be severely lacking is in handling arrays and composites in query results. I'd like to set about rectifying that. Ideally this would mean that drivers using libpq could easily and reliably deliver such objects suitably structured in their particular languages (e.g. lists and hashes in Perl).

One complicating factor I see is that there is no protocol level support for anything other than simple objects - each data value is simply a stream of bytes of a known length. We would therefore need some pretty robust processing to pick apart structured objects.

We'll need a new API to handle such objects. I'm thinking of something like:

    PQarray * PQgetArray(const PGresult *res, int row_number, int column_number);
    int PQgetArrayNDims(PQarray * array);
    int PQgetArrayLower(PQarray * array, int dim);
    int PQgetArrayUpper(PQarray * array, int dim);
    int PQgetArrayElementLength(PQarray * array, int dim1, ...);
    bool PQgetArrayElementIsNull(PQarray * array, int dim1, ...);
    char * PQgetArrayElement(PQarray * array, int dim1, ...);

    PQcomposite * PQgetComposite(const PGresult *res, int row_number, int column_number);
    PQcomposite * PQgetArrayElementComposite(PQarray * array, int dim1, ...);
    int PQgetCompositeNFields(PQcomposite * composite);
    char * PQgetCompositeFName(PQcomposite * composite, int fnumber);
    int PQgetCompositeFNumber(PQcomposite * composite, char * fname);
    Oid PQgetCompositeFType(PQcomposite * composite, int fnumber);
    int PQgetCompositeFieldLength(PQcomposite * composite, int fnumber);
    bool PQgetCompositeFieldIsNull(PQcomposite * composite, int fnumber);
    char * PQgetCompositeField(PQcomposite * composite, int fnumber);

Not sure if we need analogs for PQfformat, PQfmod or PQfsize - I suspect not, but right now I'm just thinking out loud.

Thoughts? Is this worth doing?

cheers

andrew
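To give a feel for how the accessors would be used, here is a sketch (against the proposed signatures only - none of this exists in libpq, and cleanup is omitted since the free() conventions are still to be defined):

    /* Iterate over a one-dimensional array in row 0, column 0. */
    PQarray *arr = PQgetArray(res, 0, 0);

    if (arr != NULL && PQgetArrayNDims(arr) == 1)
    {
        int lo = PQgetArrayLower(arr, 1);
        int hi = PQgetArrayUpper(arr, 1);

        for (int i = lo; i <= hi; i++)
        {
            if (PQgetArrayElementIsNull(arr, i))
                printf("element %d: NULL\n", i);
            else
                printf("element %d: %s\n", i, PQgetArrayElement(arr, i));
        }
    }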
Re: [HACKERS] Overhauling GUCS
On Sunday 08 June 2008 19:07:21 Gregory Stark wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> On Fri, 2008-06-06 at 20:19 -0400, Tom Lane wrote:
>>> Robert Treat <[EMAIL PROTECTED]> writes:
>>>
>>> Actually, the reason it's still 10 is that the effort expended to get it
>>> changed has been *ZERO*. I keep asking for someone to make some
>>> measurements, do some benchmarking, anything to make a plausible case
>>> for a specific higher value as being a reasonable place to set it.
>>>
>>> The silence has been deafening.
>>
>> Not surprising really. It is a simple adjustment to make and it also is
>> easy to spot when its a problem. However it is not trivial to test for
>> (in terms of time and effort). I know 10 is wrong and so do you. If you
>> don't I am curious why I see so many posts from you saying, "Your
>> estimates are off, what is your default_statistics_target?" with yet
>> even more responses saying, "Uhh 10."
>
> Ah, but we only ever hear about the cases where it's wrong of course. In
> other words even if we raised it to some optimal value we would still have
> precisely the same experience of seeing only posts on list about it being
> insufficient.

The flip side to this is that we're not trying to find the perfect setting, we're just trying to determine a number that will cause more benefit than harm compared to the number we have now. While I am sure there are cases where 100 is too low as well, I cannot recall ever having seen someone suggest lowering the default_stats_target to something less than 100. (I now sit back and wait for someone to comb the archives, just to find that 1 time).

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] pg_dump restore time and Foreign Keys
On Saturday 07 June 2008 16:22:56 Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
>> Perhaps we need a GUC that says expert_mode = on. ... Another idea
>> might be to make such command options superuser only, to ensure the
>> power is available, yet only in the hands of, by-definition, the trusted
>> few.
>
> This all seems pretty useless, as the sort of user most likely to shoot
> himself in the foot will also always be running as superuser.

yeah, i'm not a big fan of "set enable_footgun=true" since the people likely to get tripped up are going to blindly enable these modes. otoh, if we do such a thing, i would be a big fan of calling it "enable_footgun" :-)

> I'd much rather see us expend more effort on speeding up the checks
> than open holes in the system.

and i'm sure no one is against that idea, but you're never going to be able to match the performance of just avoiding the check.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] handling TOAST tables in autovacuum
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> The only change of some consideration is that we will need two passes
> over pg_class to get the list of relations to vacuum, instead of one as
> we do currently. The problem is that we first need to fetch the
> (heap relid, toast relid) mapping before attempting to figure out if any
> given TOAST table needs vacuuming. This is because we want to be using
> the main table's pg_autovacuum, and we can't get at that unless we know
> the main relid.

Umm ... is it chiseled in stone someplace that toast tables shouldn't have their own pg_autovacuum entries? Seems like that might be a reasonable component of a "whole nine yards" approach.

> Should we display TOAST tables separately in pg_stat_*_tables? (Maybe
> pg_stat_toast_tables?)

+1 for pg_stat_toast_tables, I think. If you separate them out then there will need to be some kind of smarts to help the user figure out which main table a toast table belongs to. This would be easy with a separate view.

			regards, tom lane
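For instance, such a view could lean on the existing pg_class.reltoastrelid mapping; a sketch (view name and column choice illustrative only):

    CREATE VIEW pg_stat_toast_tables AS
    SELECT c.relname AS parent_table,   -- main table the toast data belongs to
           st.*                         -- the toast table's own stats row
    FROM pg_stat_all_tables st
    JOIN pg_class t ON t.oid = st.relid
    JOIN pg_class c ON c.reltoastrelid = t.oid
    WHERE t.relkind = 't';              -- 't' marks toast tables in pg_class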
Re: [HACKERS] Overhauling GUCS
"Joshua D. Drake" <[EMAIL PROTECTED]> writes: > On Fri, 2008-06-06 at 20:19 -0400, Tom Lane wrote: >> Robert Treat <[EMAIL PROTECTED]> writes: > >> Actually, the reason it's still 10 is that the effort expended to get it >> changed has been *ZERO*. I keep asking for someone to make some >> measurements, do some benchmarking, anything to make a plausible case >> for a specific higher value as being a reasonable place to set it. > >> The silence has been deafening. > > Not surprising really. It is a simple adjustment to make and it also is > easy to spot when its a problem. However it is not trivial to test for > (in terms of time and effort). I know 10 is wrong and so do you. If you > don't I am curious why I see so many posts from you saying, "Your > estimates are off, what is your default_statistics_target?" with yet > even more responses saying, "Uhh 10." Ah, but we only ever hear about the cases where it's wrong of course. In other words even if we raised it to some optimal value we would still have precisely the same experience of seeing only posts on list about it being insufficient. What's needed is some speed benchmarks for complex queries with varying size statistics so we can see how badly large statistic tables hurt planning time. The flip side of seeing how much larger tables help planning accuracy is much harder to measure. Offhand I don't see any systematic way to go about it. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] math error or rounding problem Money type
"Mark Kirkwood" <[EMAIL PROTECTED]> writes: > IFAIK (dimly recalling numerical analysis courses at university) SUM and ROUND > can *never* be commuted. In general the recommended approach is to round as > late as possible and as few times are possible - so your 1st query is the > correct or best way to go. I don't think "as late as possible" applies with money. If you were dealing with approximate measurements you want to round as late as possible because rounding is throwing away precision. But if you're dealing with money you're dealing with exact quantities. There is only going to be one correct time to round and that's whenever you're creating an actual ledger item or order line item or whatever. Once you've calculated how much interest to credit or whatever you have to make that credit an exact number of cents and the $0.004 you lost or gained in rounding never comes up again. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] handling TOAST tables in autovacuum
Hi,

We've been making noises about dealing with TOAST tables as separate entities in autovacuum for some time now. So here's a proposal:

Let's do it.

That's about it :-)

The only change of some consideration is that we will need two passes over pg_class to get the list of relations to vacuum, instead of one as we do currently. The problem is that we first need to fetch the (heap relid, toast relid) mapping before attempting to figure out if any given TOAST table needs vacuuming. This is because we want to be using the main table's pg_autovacuum, and we can't get at that unless we know the main relid.

Another open question is whether the TOAST table should be processed at all if the main table is vacuumed. My opinion is we don't -- if we're going to deal with them separately, let's go the whole nine yards. Autovacuum will only process a toast table when, by itself, it shows that it needs processing. (Obviously this doesn't mean we change semantics of user-invoked VACUUM -- those will continue to vacuum the TOAST table along with the main table).

Should we display TOAST tables separately in pg_stat_*_tables? (Maybe pg_stat_toast_tables?)

Thoughts?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
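For reference, the (heap relid, toast relid) mapping the first pass needs is already exposed in pg_class:

    -- one row per table that has a TOAST table attached
    SELECT c.oid AS heap_relid, c.relname, c.reltoastrelid AS toast_relid
    FROM pg_class c
    WHERE c.reltoastrelid <> 0;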
Re: [HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics
"Nathan Boley" <[EMAIL PROTECTED]> writes: > ... There are two potential problems that I see with this approach: > 1) It assumes the = is equivalent to <= and >= . This is certainly > true for real numbers, but is it true for every equality relation that > eqsel predicts for? The cases that compute_scalar_stats is used in have that property, since the < and = operators are taken from the same btree opclass. > Do people think that the improved estimates would be worth the > additional overhead? Your argument seems to consider only columns having a normal distribution. How badly does it fall apart for non-normal distributions? (For instance, Zipfian distributions seem to be pretty common in database work, from what I've seen.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics
Currently eqsel assumes that, except for the values stored as mcv's, the number of times that a value appears in a table column is independent of its value. Unfortunately, this often yields inappropriate selectivity estimates and frequently leads to inappropriate plans.

As an example, consider an insurance company that keeps a record of patient heights. Assume there are 1,000,000 patient heights in this column, and they are distributed normally with a mean of 1.7526 and a standard deviation of 0.0762. Furthermore, assume that the heights are only measured to the nearest centimeter. Then, we'd expect there to be about 73 distinct heights, with a SD of 1.5. Ignoring the effects of MCV's, the planner expects

    SELECT height FROM heights WHERE height = 1.75;

to yield roughly 13000 results. However, given that we know the underlying distribution, we would expect to see ~52000 results. Similarly, the planner expects to see 13000 results from

    SELECT height FROM heights WHERE height = 2.05;

while we expect to see 2.7.

Obviously this example is not totally convincing: if I were to post this to pg-general looking for advice I'm sure that everyone would tell me to just increase the size of my mcv stats. However, in cases where the number of distinct values is higher, this isn't always feasible. Also, why store a list of 50 values and their frequencies when 10 extra would provide the same plans without bloating pg_statistics?

To combat this problem, I have two different proposals.

Idea 1: Keep an array of stadistinct that correspond to each bucket size.

In the example above (again ignoring mcv's) the quantile data is

    quantile   0%    10%   20%   30%   40%   50%   60%   70%   80%   90%   100%
    boundary   1.38  1.66  1.69  1.71  1.73  1.75  1.77  1.79  1.82  1.85  2.12

with numdistinct values for the ten buckets of (respectively)

    29  2  2  2  2  2  2  3  3  25

For the two above examples, this new approach would yield selectivity estimates of

    (1000000/10)/2 = 50000    ( vs an actual ED of ~52000 )

and

    (1000000/10)/25 = 4000    ( vs an actual ED of ~2.7 )

Furthermore, this is done without mcvs. Since mcv's would make the histogram more sensitive to the edges, the estimates with mcv's should be correspondingly better.

There are two potential problems that I see with this approach:

1) It assumes the = is equivalent to <= and >= . This is certainly true for real numbers, but is it true for every equality relation that eqsel predicts for?

2) It bloats the stats table.

Idea 2: Keep a correlation statistic between ndistinct and bucket size.

This addresses problem #2. In lieu of keeping an actual list of ndistinct per histogram bucket, we store the linear scaling coefficient between histogram bucket width and ndistinct/(avg ndistinct). To visualize this, it is easiest to consider plotting the bucket width versus ndistinct. The scaling coefficient is the line that passes through the origin and minimizes the square of the difference between its estimate for ndistinct and the actual value.

When I apply this method to the above data I find a coefficient of 13.63 for an average ndist of 72/10. This provides selectivity estimates, for the above two examples, of

    (1000000/10)/( 13.63*7.2*(1.77 - 1.75) ) = 50950    ( vs an actual ED of ~52000 )

and

    (1000000/10)/( 13.63*7.2*(2.12 - 1.85) ) = 3774     ( vs an actual ED of ~2.7 )

Although this yields better results than idea 1 for this particular example, it will be much more sensitive to weird distributions. Obviously there are some special cases to consider: we wouldn't want the stats to be skewed such that they provide really bad plans. However, with some carefully designed caps I believe that we could ensure that the estimates are at least as good as they are now.

In fact, I'm not certain that an R^2 penalty is the correct loss function. Ideally, we want to minimize the extra time that the db spends by choosing an incorrect plan. Maybe slight overestimations are better than slight underestimations? Maybe the cost of the occasional (really) bad plan is less than the cost of a bunch of kinda bad plans? Finally, we aren't limited to just one coefficient. We could also store multiple coefficients to improve our estimates, and provide a compromise between ideas 1 and 2. Food for future thought...

In addition to the previous benefits, I think that this method has the potential to make the process by which MCV are chosen (or not chosen) smarter. Now the planner chooses a value to be an mcv candidate if its frequency is greater than 1.25 * the average frequency. Given that this improved selectivity estimate is implemented, maybe a better way would be to include a value as an mcv if a) it's above a certain threshold and b) the histogram selectivity estimator does a poor job.

What are people's thoughts on idea 1 vs idea 2? Am I missing any relevant details about the planner's operation? Do people think that the improved estimates would be worth the additional overhead?

-Nathan
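Restated compactly, idea 2 estimates the row count for a value v falling in histogram bucket i as (notation mine, matching the arithmetic above):

    rows(v) ~= (N / B) / (k * ndist_avg * width_i)

where N is the table row count (1,000,000 above), B the number of histogram buckets (10), ndist_avg the average per-bucket ndistinct (7.2), width_i the width of bucket i, and k the fitted scaling coefficient (13.63).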
Re: [PATCHES] [HACKERS] TRUNCATE TABLE with IDENTITY
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> 2. I had first dismissed Neil's idea of transactional sequence updates
>> as impossible, but on second look it could be done. Suppose RESTART
>> IDENTITY does this for each sequence:
>>
>> * obtain AccessExclusiveLock;
>> * assign a new relfilenode;
>> * insert a sequence row with all parameters copied except
>>   last_value copies start_value;
>> * hold AccessExclusiveLock till commit.

> Hmm, this kills the idea of moving sequence data to a single
> non-transactional catalog :-(

Well, there are a number of holes in our ideas of how to do that anyway. But offhand I don't see why we couldn't distinguish regular heap_update from update_in_place on single rows within a catalog.

			regards, tom lane
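Under that scheme the user-visible semantics would come out like this (an illustration of the intended behavior, not a test against any released version; the sequence name is a placeholder):

    BEGIN;
    TRUNCATE TABLE t RESTART IDENTITY;  -- sequence gets a new relfilenode,
                                        -- last_value reset to start_value
    SELECT nextval('t_id_seq');         -- sees the reset inside the transaction
    ROLLBACK;                           -- old relfilenode becomes current again;
                                        -- no other session ever saw the reset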