Re: [HACKERS] pg_autovacuum next steps
Tom Lane wrote: Matthew T. O'Connor [EMAIL PROTECTED] writes: I probably said that wrong, but how do backends get their stats data? They read it out of a flat file that the stats collector rewrites every so often.

Ok, so that would be easy to do (if we decide we want to).

Is that really worth the loss of independence? I'm not sure one way or the other myself. I suppose the autovac daemon could still be run by hand for testing purposes, but it would have to run as user postgres and on the same machine as the postmaster.

I'm not sure; it would allow autovacuum to check the stats more regularly. I suppose it would be possible for pg_autovacuum to recognize whether it's been launched by the postmaster or not and look up stats as appropriate, but as you say, I'm not sure it's worth it, and my first cut will work exactly as the current pg_autovacuum does.

Also, you didn't mention if I will be able to use the backend logging functions; I would guess that I can, but I'm not totally sure.

Yes, if you are launched by the postmaster then you'd have access to its stderr etc, so you could just log via elog/ereport. Again though this puts you in a situation where the daemon *must* be launched by the postmaster or it won't work. How much arm's-length separation are you willing to give up?

Well, I think I would be more willing to give up the separation for logging purposes. I would think that an error message issued by a postmaster sub-process should wind up in the same location as an error issued by the postmaster proper. Also, people have requested lots of logging options such as syslog etc, and I think it would be nice to get all this for free. I would think that pg_autovacuum could determine if it's a postmaster sub-process and log appropriately; does that sound reasonable?

---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] pg_autovacuum next steps
On Mon, 22 Mar 2004, Matthew T. O'Connor wrote: On Sun, 2004-03-21 at 23:00, Bruce Momjian wrote: C) Most importantly, I'm not a backend hacker. If someone wants to do the initial work of getting it running as a backend process, I can take it from there. A while ago, Bruce offered to help me with any backend issues I might have, so perhaps with a little help I can take a run at it. I'd be happy to help you out. Agreed.

Ok, thanks for the offer to help, but I think I understated things above when I said I'll need a little help :-) I haven't looked at the code but... I have a few big-picture questions. Once pg_autovacuum is launched as a postmaster sub-process, what changes? That is, currently pg_autovacuum uses libpq to connect to a database and issue queries, including a vacuum / analyze command when needed. After becoming a subprocess, will (should) it still use libpq to connect to the databases? I don't think so...

It could use libpq but most definitely shouldn't.

Is it even possible to do that? If not, how will it check out the stats of all the different databases? I guess it should fork() a new backend, connect to it somehow, and use it to query the database, but I'm really not sure how this works.

It can interact with the stats collector (a separate process) in the same way that existing backends interact: through a domain socket.

I'm looking through the backend startup code to see how the stats collector and the bgwriter work, since they are probably two semi-close examples of what I'll have to do.

I think the checkpoint process does something similar in that it issues a checkpoint command. The vacuum backend will call vacuum() (or something very like it) directly. I imagine that when it gets called, and on which table, will be based upon the existing algorithm.

Thanks, Gavin
Re: [HACKERS] pg_autovacuum next steps
Ok, thanks for the offer to help, but I think I understated things above when I said I'll need a little help :-) I haven't looked at the code but... I have a few big-picture questions. Once pg_autovacuum is launched as a postmaster sub-process, what changes? That is, currently pg_autovacuum uses libpq to connect to a database and issue queries, including a vacuum / analyze command when needed. After becoming a subprocess, will (should) it still use libpq to connect to the databases? I don't think so...

It could use libpq but most definitely shouldn't.

Is it even possible to do that? If not, how will it check out the stats of all the different databases? I guess it should fork() a new backend, connect to it somehow, and use it to query the database, but I'm really not sure how this works.

It can interact with the stats collector (a separate process) in the same way that existing backends interact: through a domain socket.

I'm looking through the backend startup code to see how the stats collector and the bgwriter work, since they are probably two semi-close examples of what I'll have to do.

I think the checkpoint process does something similar in that it issues a checkpoint command. The vacuum backend will call vacuum() (or something very like it) directly. I imagine that when it gets called, and on which table, will be based upon the existing algorithm.

One point is this: vacuum() assumes that you are running in a fully fledged backend. There'd be a fair bit of work involved in allowing a single process to call vacuum() against multiple databases. As such, I think that a vacuum backend for a specific database should be forked upon the first connect. Also, the backend might like to try and work out if there are any active backends for its database every so often and if not, perform a final vacuum (if necessary) and exit, so that we don't have lots of idle processes sitting around. Is there a better approach than this?
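The worker lifecycle Gavin describes can be sketched in miniature (an illustrative Python model, not PostgreSQL source; every name here is hypothetical): a per-database vacuum worker keeps running while its database has ordinary backends, and retires itself with a final vacuum once it doesn't.

```python
# Hedged sketch of the per-database worker lifecycle: keep vacuuming
# while the database is in use; once no active backends remain, do a
# final vacuum (if needed) and exit so idle workers don't accumulate.

def worker_tick(db, active_backends, needs_vacuum, do_vacuum):
    """One iteration of the worker's loop.

    Returns "continue" to keep running, "exit" after retiring.
    """
    if active_backends(db) > 0:
        if needs_vacuum(db):
            do_vacuum(db)
        return "continue"
    # No active backends: final vacuum if necessary, then exit.
    if needs_vacuum(db):
        do_vacuum(db)
    return "exit"
```

The callbacks (`active_backends`, `needs_vacuum`, `do_vacuum`) stand in for checks against shared state and the actual vacuum() call.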
Gavin
Re: [HACKERS] pg_autovacuum next steps
On Mon, Mar 22, 2004 at 02:35:37AM -0500, Matthew T. O'Connor wrote: On Sun, 2004-03-21 at 23:00, Bruce Momjian wrote: C) Most importantly, I'm not a backend hacker. If someone wants to do the initial work of getting it running as a backend process, I can take it from there. A while ago, Bruce offered to help me with any backend issues I might have, so perhaps with a little help I can take a run at it. I'd be happy to help you out. Agreed. Ok, thanks for the offer to help, but I think I understated things above when I said I'll need a little help :-) I have a few big-picture questions. Once pg_autovacuum is launched as a postmaster sub-process, what changes?

All of it. It's important to do it as a backend process, because libpq has very limited and slow facilities for working with backend internals. The base should be a standard backend with a different main loop that, instead of checking a socket, checks some shared information about tables and calls the vacuum code directly. That way you can skip working with connections, the parser, etc.

I thought about it in the last few days, and I found Tom's idea about FSM table usage perfect: What I had in the back of my mind was: each backend counts attempted insertions and deletions in its relcache entries (an update adds to both counts). At transaction commit or abort, we know which of these two counts represents the number of dead tuples added to each relation, so while we scan the relcache for post-xact cleanup (which we will be doing anyway) we can transfer the correct count into the shared FSM entry for the relation. This gives us a reasonably accurate count in shared memory of all the tuple obsoletions since bootup, at least for heavily-used tables. (The FSM might choose to forget about lightly-used tables.) The auto vacuumer could look at the FSM numbers to decide which tables are highest priority to vacuum. (2002-09-03 08:10:32)

I looked at the code and I think extending the FSM tables will be pretty simple, but I'm unsure how Tom envisioned the relcache counters. Tom?
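The scheme Tom describes in the quoted passage can be modeled roughly as follows (an illustrative Python sketch, not backend code; `shared_dead` stands in for the shared FSM entries, and the helper names are invented). The key observation is that on commit the deleted count is the number of newly dead tuples, while on abort it is the inserted count.

```python
# Sketch of per-backend dead-tuple accounting: each backend keeps
# insert/delete counts per relation (an UPDATE bumps both), and at
# transaction end the appropriate count is folded into a shared
# per-relation total during relcache cleanup.

class RelCounters:
    def __init__(self):
        self.inserted = 0
        self.deleted = 0

shared_dead = {}  # stand-in for the shared FSM entry per relation

def record_update(counters):
    # An UPDATE obsoletes the old tuple and creates a new one,
    # so it adds to both counts.
    counters.inserted += 1
    counters.deleted += 1

def end_xact(rel, counters, committed):
    # Commit: the deleted tuples are now dead.
    # Abort: the inserted tuples are now dead.
    dead = counters.deleted if committed else counters.inserted
    shared_dead[rel] = shared_dead.get(rel, 0) + dead
    counters.inserted = counters.deleted = 0
```

An autovacuum daemon could then rank relations by `shared_dead` to pick vacuum targets, as the quoted idea suggests.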
Karel -- Karel Zak [EMAIL PROTECTED] http://home.zf.jcu.cz/~zakkr/
Re: [HACKERS] pg_autovacuum next steps
On Monday 22 March 2004 03:36, Matthew T. O'Connor wrote: 1. Per-database defaults and per-table thresholds: There are differing opinions as to the best way to provide this feature. The primary debate is where to save the configuration data. I see three options:

1. Store config data inside a special pg_autovacuum table inside any existing database that wants custom settings.
2. Use a config file. This would require some additional coding to add the required parsing, but is possible.
3. Create a pg_autovacuum database inside any cluster that wants to customize its settings.

Since many people do not like tools that clutter their databases by adding tables, I think option 1 (adding a pg_autovacuum table to existing databases) is right out. Using a config file would be OK, but would require additional parsing code. My preference is option 3.

I've nothing against #3 as a default, but can I put in a suggestion for 1 & 3, or rather some setting definable at runtime/build-time that lets you select the database + schema for autovacuum to find its config data. I might be wrong, but it strikes me as the sort of thing people running shared environments will want to choose for themselves.

-- Richard Huxton Archonet Ltd
Re: [HACKERS] pg_autovacuum next steps
Gavin Sherry wrote: On Sun, 21 Mar 2004, Matthew T. O'Connor wrote: On Sun, 2004-03-21 at 20:31, Christopher Kings-Lynne wrote: I think these configuration issues will become a lot easier if you make the autovacuum daemon a subprocess of the postmaster (like, say, the checkpoint process). Then you have access to a host of methods for storing state, handling configuration, etc. Yeah - why delay making it a backend process? :)

Ok, well this was part of the reason to have this conversation. My reasons: A) I wasn't sure if people really thought this was ready to be integrated. Tom had said a while ago that it was good to keep it as a contrib module while it's still actively being developed.

I was talking to Jan about some other work on VACUUM related to more intelligent vacuuming. Namely, maintaining a map (outside of shared memory) of blocks which have been pushed out of the free space map for VACUUM to visit (which requires a backend process), being aware of load restrictions (i.e., allowing the user to only vacuum when the load average is less than X, for example), and some other leveling stuff to ensure that availability is consistent. Whilst this doesn't relate to pg_autovacuum specifically, it'd be great if they could be released at the same time, I think.

I don't recall the "outside of shared memory" part. Anyhow, the whole story goes like this: Maintain 2 bits per block that tell if the block has been vacuumed of all dead tuples since the last time it was dirtied, and if all its tuples are completely frozen. If those two conditions are true, there is no need to vacuum that block at all. (Red Flag!!! On further thinking I realized that this assumes that the FSM is loss free.) With a default 8K blocksize, this means 32K per 1GB segment, making 4 additional blocks. I actually think that these extra blocks should be somehow part of the heap files, so that they are subject to the regular buffer management.
To keep the lock contention on them low, vacuum and backends will set/clear new flags in the bufhdr flags member. That way, the bgwriter and checkpointer will be the usual suspects to set/clear these flags in the shared bitmap array stored in the extra blocks. As to where to store these blocks, some block number arithmetic magic comes to mind. That way a block's relnode and blockno automatically lead to the bits, even in the case of blind writes.

Jan
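Jan's space arithmetic checks out; a quick sketch of the numbers, assuming the default 8K block size and 1GB heap segment files:

```python
# Two status bits per heap block: how much bitmap space does a
# 1GB segment need, and how many extra 8K blocks does that make?

BLCKSZ = 8192                               # default block size
SEG_BYTES = 1 << 30                         # 1GB segment

blocks_per_seg = SEG_BYTES // BLCKSZ        # 131072 blocks
bitmap_bits = 2 * blocks_per_seg            # 2 bits per block
bitmap_bytes = bitmap_bits // 8             # 32768 bytes = "32K"
extra_blocks = bitmap_bytes // BLCKSZ       # 4 additional blocks
```

So the "32K per 1GB segment, making 4 additional blocks" figure follows directly from 2 bits per 8K block.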
Re: [HACKERS] pg_autovacuum next steps
On Mon, 2004-03-22 at 03:36, Gavin Sherry wrote: One point is this: vacuum() assumes that you are running in a fully fledged backend. There'd be a fair bit of work involved in allowing a single process to call vacuum() against multiple databases.

I can't imagine we want to do that.

As such, I think that a vacuum backend for a specific database should be forked upon the first connect. Also, the backend might like to try and work out if there are any active backends for its database every so often and if not, perform a final vacuum (if necessary) and exit, so that we don't have lots of idle processes sitting around.

I don't understand; are you saying that upon postmaster startup a vacuum backend should be forked for every database in the cluster?

Is there a better approach than this?

One thought I had is that there wouldn't be a persistent daemon-type process; rather, the autovacuum process, like the checkpoint process, would be kicked off periodically. It would initially be attached to template1, then when it's done there it will figure out the next database to look at and fork; the existing template1 process will exit, and the new process will be connected to the next database. Just repeat this forking loop until done. Thoughts?
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor [EMAIL PROTECTED] writes: There are differing opinions as to the best way to provide this feature. The primary debate is where to save the configuration data. I see three options:

1. Store config data inside a special pg_autovacuum table inside any existing database that wants custom settings.
2. Use a config file. This would require some additional coding to add the required parsing, but is possible.
3. Create a pg_autovacuum database inside any cluster that wants to customize its settings.

Since many people do not like tools that clutter their databases by adding tables, I think option 1 (adding a pg_autovacuum table to existing databases) is right out. Using a config file would be OK, but would require additional parsing code. My preference is option 3.

Personally I like #2. The claim that this requires extra code seems bogus to me --- when you are working at the C code level, reading a table takes code too. I think both 1 and 3 represent clutter. Also, they will introduce entirely new sets of problems; we already know how to deal with these in the case of config files, but have no experience with config tables. Some examples: how do you get the daemon to notice that you changed the config (SIGHUP for config files, ?? otherwise)? Can you easily get at the config data from a daemon process regardless of which database it's attached to (if any)? If the config data is in a table, what sorts of interesting problems will arise from trying to vacuum that table itself?

Basically I see config tables as cutting against the grain of everything we've done to date in Postgres DBA-level configuration, and I do not see a reason to justify that sort of reinvention.

regards, tom lane
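The parsing burden of option 2 really is modest, as Tom argues. A minimal sketch of what a per-table config file and its parser might look like (the file layout, field names, and threshold semantics are all invented for illustration, not an existing pg_autovacuum format):

```python
# Hypothetical per-table config format, one entry per line:
#   dbname.schema.table  vacuum_base  vacuum_scale
SAMPLE = """
# tables with custom autovacuum settings
mydb.public.pgbench_accounts  1000  0.5
"""

def parse_conf(text):
    """Parse the hypothetical config format into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments
        if not line:
            continue                          # skip blank lines
        table, base, scale = line.split()
        settings[table] = (int(base), float(scale))
    return settings
```

A SIGHUP handler in the daemon would simply re-run `parse_conf` on the file, which is the reload story Tom points to.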
Re: [HACKERS] pg_autovacuum next steps
Gavin Sherry [EMAIL PROTECTED] writes: One point is this: vacuum() assumes that you are running in a fully fledged backend. There'd be a fair bit of work involved in allowing a single process to call vacuum() against multiple databases.

Make that "it isn't going to happen."

As such, I think that a vacuum backend for a specific database should be forked upon the first connect. Also, the backend might like to try and work out if there are any active backends for its database every so often and if not, perform a final vacuum (if necessary) and exit, so that we don't have lots of idle processes sitting around.

Lots of idle processes sitting around is right out, too. Remember that each one would eat a backend connection slot. I think we are going to have to limit this to *one* process at a time. What that probably means is that we successively launch an autovacuum process against each database; it does whatever seems appropriate in that database and then quits. We could manage this just like checkpoints are presently managed --- the only thing the postmaster has to know is the desired idle period between the end of one autovacuum and the start of the next.

regards, tom lane
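The checkpoint-style management Tom describes reduces to a very small decision the postmaster has to make each time through its loop (a hedged sketch; the real logic would of course live in C in the postmaster, and these names are illustrative):

```python
# One-at-a-time autovacuum scheduling: the postmaster only tracks
# when the last child finished and the desired idle period between
# the end of one run and the start of the next.

def should_launch(now, last_finished, idle_period, child_running):
    """Decide whether to launch the single autovacuum child."""
    if child_running:
        return False  # never more than one autovacuum process
    return now - last_finished >= idle_period
```

Because the clock starts at the *end* of the previous run, a slow pass over many databases can never cause runs to pile up.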
Re: [HACKERS] pg_autovacuum next steps
On Mon, 2004-03-22 at 07:25, Richard Huxton wrote: On Monday 22 March 2004 03:36, Matthew T. O'Connor wrote: 1. Store config data inside a special pg_autovacuum table inside existing databases that want custom settings. 2. Use a config file. This would require some additional coding to add the required parsing, but is possible. 3. Create a pg_autovacuum database inside any cluster that wants to customize its settings. [...] My preference is option 3.

I've nothing against #3 as a default, but can I put in a suggestion for 1 & 3, or rather some setting definable at runtime/build-time that lets you select the database + schema for autovacuum to find its config data. I might be wrong, but it strikes me as the sort of thing people running shared environments will want to choose for themselves.

If pg_autovacuum were being designed to live forever as a client app, then I agree admins having a choice would be good. But as we are going to eventually move any autovacuum data and settings into the system tables (when autovacuum is part of the system), I don't see the need to expend the extra cycles, especially since people seem to be pushing hard for autovacuum to be a backend function sooner rather than later.
Re: [HACKERS] pg_autovacuum next steps
On Mon, 2004-03-22 at 04:23, Karel Zak wrote: All of it. It's important to do it as a backend process, because libpq has very limited and slow facilities for working with backend internals.

Agreed.

The base should be a standard backend with a different main loop that, instead of checking a socket, checks some shared information about tables and calls the vacuum code directly. In this case you can omit work with connections, the parser, etc.

So am I to understand I can start up a postmaster subprocess and then be able to monitor the activity of all the databases? I guess that makes sense, since I would be talking to the stats collector directly via a socket. But that doesn't solve the problem of issuing the vacuum to different databases; I would still create a new backend for every database that needs a vacuum or analyze issued. Also, we don't want to launch multiple simultaneous vacuums, so we want the commands to be serialized (I know some people want to be able to do this if databases are located on different disks, but for now I'm keeping things simple). I have an idea for this that I just mentioned in another message to the list.

I thought about it in the last few days, and I found Tom's idea about FSM table usage perfect:

There has been lots of discussion of incorporating FSM data into the autovacuum decision process. I am interested in exploring this, but since I'm already biting off more than I can easily chew, I am going to try and leave the decision-making process the same for now. BTW, I think we need to use both tools (stats and FSM) since not all tables will be in the FSM; an insert-only table still needs to be analyzed periodically, and a lightly updated table will eventually need to be vacuumed.
Re: [HACKERS] pg_autovacuum next steps
On Mon, 2004-03-22 at 10:51, Tom Lane wrote: 2. Use a config file. This would require some additional coding to add the required parsing, but is possible. Personally I like #2. The claim that this requires extra code seems bogus to me --- when you are working at the C code level, reading a table takes code too. I think both 1 and 3 represent clutter. Also, they will introduce entirely new sets of problems that we already know how to deal with in the case of config files, but have no experience with for config tables. Some examples: how do you get the daemon to notice that you changed the config (SIGHUP for config files, ?? otherwise)? Can you easily get at the config data from a daemon process regardless of which database it's attached to (if any)? If the config data is in a table, what sorts of interesting problems will arise from trying to vacuum that table itself?

I guess what I mean by extra code is extra work, since I am comfortable storing data in tables and would need to do some work to put it in a config file. Some of this conversation would change if we really are going to integrate autovacuum into the backend. My suggestion for storing config data was really assuming that we would continue to be a client app for at least one more release cycle, but I think I'm hearing that people want it in the backend now.

Assuming that we are being integrated into the backend, and assuming that an autovacuum process is fired off periodically by the postmaster, I think some config items should become GUC options and others will become part of the system tables. For example: enable_autovacuum should be a GUC option, but the vacuum threshold for a given table should probably just be a new column in the pg_class table. Any data stored in the system tables will be checked every time the autovacuum process is run, so there won't be any problems with noticing that the admin changed the threshold for a given table.
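The fallback Matthew proposes could look like this in miniature (a hedged sketch; the GUC name `autovacuum_vacuum_threshold` and the idea of a nullable per-table pg_class column are assumptions drawn from this proposal, not existing settings):

```python
# A per-table threshold (hypothetical pg_class column) overriding a
# cluster-wide GUC default; NULL/None in the column means "use the GUC".

GUC = {
    "enable_autovacuum": True,
    "autovacuum_vacuum_threshold": 1000,  # assumed cluster-wide default
}

def effective_threshold(per_table_value):
    """Resolve the threshold for one table."""
    if per_table_value is None:
        return GUC["autovacuum_vacuum_threshold"]
    return per_table_value
```

Since the autovacuum process would re-read the catalog each run, a changed per-table value takes effect on the next pass with no signaling needed, which is the point Matthew makes above.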
Basically I see config tables as cutting against the grain of everything we've done to date in Postgres DBA-level configuration, and I do not see a reason to justify that sort of reinvention.

I agree if we are talking about a client app, but I think the scheme laid out above is consistent for something that is part of the backend; do you disagree?

Thanks, Matthew O'Connor
Re: [HACKERS] pg_autovacuum next steps
On Sun, Mar 21, 2004 at 05:32:59PM -0500, Matthew T. O'Connor wrote: Lately I have been thinking about the next steps for the pg_autovacuum daemon. I have written up a document that describes what I'm planning to do next. Please read the attached and respond, as I would really like some feedback.

Hi, Matthew. For our uses, we found that pg_autovacuum did not behave as expected with vacuum_threshold set to 0. For our purposes, we have a very good idea of how many tuples need to be slurped up over a given interval, and would like the autovacuum daemon to simply go through and vacuum when it hits a, er, threshold. Obligatory explanation: We were doing this with cron, but when something gets really ugly, cron is not as intuitive as autovacuum. Having this in the postmaster would be fine by me (and in fact, I would even suggest that it belongs there), but I really need to have more flexibility or we are going to (read: I will be tasked with) be writing our own. Sorry for the terseness -- and I haven't had a chance to go over the whole thread; we're moving from 7.3.2 to 7.4.2 on the big database this week.

Alex -- [EMAIL PROTECTED] Alex J. Avriette, Systems Programmer "You have a spell checker. It is called a brain. Please use it." - Chris Lyth
Re: [HACKERS] pg_autovacuum next steps
As such, I think that a vacuum backend for a specific database should be forked upon the first connect. Also, the backend might like to try and work out if there are any active backends for its database every so often and if not, perform a final vacuum (if necessary) and exit, so that we don't have lots of idle processes sitting around.

Lots of idle processes sitting around is right out, too. Remember that each one would eat a backend connection slot. I think we are going to have to limit this to *one* process at a time. What that probably means is that we successively launch an autovacuum process against each database; it does whatever seems appropriate in that database and then quits. We could manage this just like checkpoints are presently managed --- the only thing the postmaster has to know is the desired idle period between the end of one autovacuum and the start of the next.

*slaps hand on forehead* Yes, this is the best approach. So, do we want a static time, a GUC-controlled time, or some time which is modified by pg_autovacuum's / the stats collector's knowledge of the amount of work which goes on in any given database?

Gavin
Re: [HACKERS] pg_autovacuum next steps
Tom Lane wrote: From the point of view of the postmaster a GUC-controlled delay would seem like the best thing. We could discuss having the autovacuum code try to feed back adjustments in the delay, but remember that one of the golden virtues for the postmaster proper is simplicity; that translates directly to robustness. We don't want the postmaster engaging in anything complicated that could potentially lock it up or crash it due to a bug.

Agreed. [snip...]

This nice-sounding arrangement is probably not directly workable because of the fact that the postmaster has no good way to know about or control backends if they aren't its direct children. Perhaps the autovac daemon *should* use libpq, that is, not fork backends but connect via the postmaster each time it wants to run a backend. Then the backends are ordinary children of the postmaster and everything acts normally. (This could amount to using the existing autovac code, and simply adding a frammish to the postmaster to autospawn the autovac daemon as a non-backend child process.)

Well this certainly sounds like something that would be easy to do, which appeals to me at least as a first cut. Question: Does this mean that I lose many of the advantages of being in the backend? That is, would pg_autovacuum still be able to use facilities such as GUC, error reporting, access to the FSM, direct pipe connection to the stats collector etc?
Re: [HACKERS] pg_autovacuum next steps
Alex J. Avriette wrote: Hi, Matthew. For our uses, we found that pg_autovacuum did not behave as expected with vacuum_threshold set to 0. For our purposes, we have a very good idea of how many tuples need to be slurped up over a given interval, and would like the autovacuum daemon to simply go through and vacuum when it hits a, er, threshold.

Could you please explain this better? I don't really understand what the problem is. If you want pg_autovacuum to perform a vacuum on a table that has had exactly X updates no matter what, you can just run it with -V0 -vX (where X is the vacuum threshold); the same thing can be done for analyze. Are you saying that you did this and it did not work as expected?

Having this in the postmaster would be fine by me (and in fact, I would even suggest that it belongs there), but I really need to have more flexibility or we are going to (read: I will be tasked with) be writing our own.

Everyone agrees it should be in the backend eventually; the questions were just whether it was ready, and how or if it should change when it goes in.

Sorry for the terseness -- and I haven't had a chance to go over the whole thread; we're moving from 7.3.2 to 7.4.2 on the big database this week.

No problem, but I need more information to diagnose your problem. Matthew
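For reference, the contrib daemon's vacuum decision that Matthew is describing is threshold-based: a table is vacuumed once the tuples obsoleted since the last vacuum exceed a base value plus a scale factor times the table size. A sketch (parameter names are illustrative, but the formula matches the -v base / -V scale flags mentioned above):

```python
# pg_autovacuum-style threshold check: vacuum once the number of
# tuples obsoleted since the last vacuum exceeds base + scale * size.

def vacuum_threshold(base, scale, reltuples):
    return base + scale * reltuples

def needs_vacuum(n_dead, base, scale, reltuples):
    return n_dead > vacuum_threshold(base, scale, reltuples)
```

With -V0 -vX the scale factor is zero, so the table is vacuumed once more than X tuples have been obsoleted, regardless of table size, which is the behavior Matthew suggests for Alex's fixed-interval workload.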
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor [EMAIL PROTECTED] writes: Well this certainly sounds like something that would be easy to do, which appeals to me at least as a first cut. Question: Does this mean that I lose many of the advantages of being in the backend? That is, would pg_autovacuum still be able to use facilities such as GUC, error reporting, access to the FSM, direct pipe connection to the stats collector etc?

If you aren't a backend then you couldn't safely access shared memory, including FSM in particular. I see no reason you couldn't use GUC though. There is no direct pipe connection to the stats collector, except in the output direction, which is not what you want, so I'm not seeing your point there.

I am not sure that lack of FSM access is a showstopper, though. We could easily imagine inventing backend commands to read out whatever info you want from FSM, so you could request the info from your connected backend.

The more I think about this the more I like it --- it keeps the autovac control code still at arm's length from the backend, which will surely ease development and experimentation. I suppose there is some overhead in pushing data back and forth over the FE/BE protocol, but surely that would be negligible next to the real work of vacuuming.

regards, tom lane
Re: [HACKERS] pg_autovacuum next steps
Tom Lane wrote: If you aren't a backend then you couldn't safely access shared memory, including FSM in particular. I see no reason you couldn't use GUC though. There is no direct pipe connection to the stats collector, except in the output direction which is not what you want, so I'm not seeing your point there.

I probably said that wrong, but how do backends get their stats data? Meaning, when I do a select * from pg_stat_all_tables, how is the backend getting that data? The reason I'm interested is that if I don't have to fire up a backend just to check the stats, that would reduce the load associated with the autovacuum daemon. Someone earlier in the thread seemed to imply there was a way to do this.

I am not sure that lack of FSM access is a showstopper, though. We could easily imagine inventing backend commands to read out whatever info you want from FSM, so you could request the info from your connected backend.

Yeah, I agree, and for phase 1 we can just continue working only on stats data.

The more I think about this the more I like it --- it keeps the autovac control code still at arm's length from the backend which will surely ease development and experimentation. I suppose there is some overhead in pushing data back and forth over the FE/BE protocol, but surely that would be negligible next to the real work of vacuuming.

Right, I think the overhead would be negligible. Since you seem to think this is (or at least might be) a good idea, I will go ahead and try to get the postmaster to fire up the autovacuum daemon, so that the 1st cut will basically be pg_autovacuum exactly as it stands now, just launched by the postmaster. Also, you didn't mention if I will be able to use the backend logging functions; I would guess that I can, but I'm not totally sure.
Thanks again, Matthew
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor [EMAIL PROTECTED] writes: Tom Lane wrote: There is no direct pipe connection to the stats collector, I probably said that wrong, but how do backends get their stats data? They read it out of a flat file that the stats collector rewrites every so often. Meaning, when I do a select * from pg_stat_all_tables, how is the backend getting that data? The reason I'm interested is that if I don't have to fire up a backend just to check the stats, that would reduce the load associated with the autovacuum daemon. Is that really worth the loss of independence? I'm not sure one way or the other myself. I suppose the autovac daemon could still be run by hand for testing purposes, but it would have to run as user postgres and on the same machine as the postmaster. Right, I think the overhead would be negligible. Since you seem to think this is (or at least might be) a good idea, I will go ahead and try to get the postmaster to fire up the autovacuum daemon. So the first cut will basically be pg_autovacuum exactly as it stands now, just launched by the postmaster. Also, you didn't mention whether I will be able to use the backend logging functions; I would guess that I can, but I'm not totally sure. Yes, if you are launched by the postmaster then you'd have access to its stderr etc, so you could just log via elog/ereport. Again, though, this puts you in a situation where the daemon *must* be launched by the postmaster or it won't work. How much arm's-length separation are you willing to give up? regards, tom lane
Re: [HACKERS] pg_autovacuum next steps
On Mon, 22 Mar 2004, Tom Lane wrote: Matthew T. O'Connor [EMAIL PROTECTED] writes: Well this certainly sounds like something that would be easy to do, which appeals to me at least as a first cut. Question: Does this mean that I lose many of the advantages of being in the backend? That is, would pg_autovacuum still be able to use facilities such as GUC, error reporting, access to the FSM, direct pipe connection to the stats collector etc? [snip] The more I think about this the more I like it --- it keeps the autovac control code still at arm's length from the backend, which will surely ease development and experimentation. I suppose there is some overhead in pushing data back and forth over the FE/BE protocol, but surely that would be negligible next to the real work of vacuuming. I was initially against the idea of using libpq but it's growing on me too. I think it would be good if the core functions of pg_autovacuum (threshold algorithms, connection handling, issuing commands) could be (re)designed so that not only the backend but also a stripped-down pg_autovacuum binary, usable manually, can link against them. That way users have a choice and a workaround if there are any issues with the backend model. Also useful for debugging. Thanks, Gavin ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor wrote: * Inability to customize thresholds on a per table basis I ran headlong into this one. IMHO fixing this is critical. * Inability to set default thresholds on a per database basis * Inability to exclude specific databases / tables from pg_autovacuum monitoring These would be nice to have, but less critical than #1, I think. * Inability to schedule vacuums during off-peak times This would be *really* nice to have. In my recent case, if pg_autovacuum could work for say 3 minutes, and then back off for 2 minutes or so while the batch transactions hit, it would be ideal. I'm not sure how to address all of these concerns, or that they all should be addressed right now. One of my big questions is backend integration. I am leaning towards leaving pg_autovacuum as a client application in contrib for one more release. During this time, I can continue to tweak and improve pg_autovacuum so that we will have a very good idea what the final product should be before we make it a standard backend process. I really think pg_autovacuum ought to get folded into the backend now, for 7.5. I haven't had time yet to read the entire thread, but I saw others making the same comment. It would make some of the listed problems go away, or at least become far easier to deal with. For PostgreSQL 7.5, I plan to implement these new features: 1. Per database defaults and per table thresholds (including total exclusion) Great! 2. Persistent data 3. Single-pass mode (external scheduling from cron etc...) 4. Off-peak scheduling Great again! 1. Per Database Defaults and Per Table Thresholds: 1. Store config data inside a special pg_autovacuum table inside existing databases that want custom settings. A natural if folded into the backend. 3. Single-Pass Mode (External Scheduling): I have received requests to be able to run pg_autovacuum only on request (not as a daemon), making only one pass over all the tables (not looping indefinitely). 
The advantage being that it will operate more like the current vacuum command, except that it will only vacuum tables that need to be vacuumed. This feature could be useful as long as pg_autovacuum exists outside the backend. If pg_autovacuum gets integrated into the backend and gets automatically started as a daemon during startup, then this option will no longer make sense. It still might make sense. You could have a mode where the daemon essentially sleeps forever, until explicitly woken up by a signal. When woken, it makes one pass, and goes back to infinite sleep. Then provide a simple way to signal the autovacuum process -- maybe an extension of the current VACUUM syntax. 4. Off-Peak Scheduling: A fundamental advantage of our vacuum system is that the work required to reclaim table space is taken out of the critical path and can be moved to an off-peak time when cycles are less precious. One of the drawbacks of the current pg_autovacuum is that it doesn't have any way to factor this in. In its simplest form (which I will implement first) I would add the ability to add a second set of thresholds that will be active only during an off-peak time that can be specified in the pg_autovacuum database, perhaps in a general_settings table. I don't know how this would work, but it is for sure important. In the recent testing I found that pg_autovacuum (well, lazy vacuum in general, but I was using pg_autovacuum to control it) made a huge difference in performance of batch transactions. They ranged from 4-5 seconds without vacuum running, to as high as 15 minutes with vacuum running. With the vacuum delay patch, delay = 1, pagecount = 8, I still saw times go as high as 10 minutes. Backing vacuum off any more than that caused it to fall behind the transaction rate unrecoverably. But as I said above, if the transactions could complete in 4-5 seconds without vacuum running, and vacuuming resumed for the 3-to-4 minutes between batches, all would be well. 
Joe ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
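Joe's wake-on-signal idea above can be sketched in a few lines. This is purely an illustration, not pg_autovacuum or backend code; the class name, the use of SIGUSR1, and the polling loop are all assumptions (a real daemon would block in a sigsuspend()-style wait rather than poll a flag):

```python
import signal

class AutovacDaemon:
    """Hypothetical sketch: a daemon that idles until signaled,
    then makes exactly one vacuum pass and goes back to sleep."""

    def __init__(self):
        self.passes = 0
        self._woken = False
        # Assumption: SIGUSR1 is the wake-up signal.
        signal.signal(signal.SIGUSR1, self._wake)

    def _wake(self, signum, frame):
        # Signal handler: just record that a pass was requested.
        self._woken = True

    def run_once_if_woken(self):
        # Stand-in for the daemon's main wait loop: act only if signaled.
        if self._woken:
            self._woken = False
            self.single_pass()

    def single_pass(self):
        # One pass over all tables, vacuuming only those over threshold.
        self.passes += 1
```

A "VACUUM AUTO"-style SQL command, as suggested later in the thread, would then simply deliver this signal to the daemon process.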
Re: [HACKERS] pg_autovacuum next steps
Joe Conway wrote: Matthew T. O'Connor wrote: * Inability to schedule vacuums during off-peak times This would be *really* nice to have. In my recent case, if pg_autovacuum could work for say 3 minutes, and then back off for 2 minutes or so while the batch transactions hit, it would be ideal. I'm not sure what you are suggesting here. As it stands right now, pg_autovacuum just issues a standard vacuum command, so there isn't anything pg_autovacuum can do until that command completes. There has been a lot of work going on trying to reduce the performance impact associated with a vacuum (vacuum delay, ARC etc); hopefully that will make a big difference. I really think pg_autovacuum ought to get folded into the backend now, for 7.5. I haven't had time yet to read the entire thread, but I saw others making the same comment. It would make some of the listed problems go away, or at least become far easier to deal with. Yeah, that seems to be the consensus; I am going to work on that next. 3. Single-Pass Mode (External Scheduling): It still might make sense. You could have a mode where the daemon essentially sleeps forever, until explicitly woken up by a signal. When woken, it makes one pass, and goes back to infinite sleep. Then provide a simple way to signal the autovacuum process -- maybe an extension of the current VACUUM syntax. Well one thing we talked about was having the autovacuum process not be a daemon that is running all the time but rather a process that is spawned periodically by the postmaster, so you wouldn't have to worry about the signal / wake-up issues. I could see this working much like checkpoints: the postmaster fires them off on a schedule, but there is nothing stopping you from issuing a checkpoint command to force one immediately. So perhaps there could be a new SQL command like VACUUM AUTO or something like that. 
4. Off-Peak Scheduling: In its simplest form (which I will implement first) I would add the ability to add a second set of thresholds that will be active only during an off-peak time that can be specified in the pg_autovacuum database, perhaps in a general_settings table. I don't know how this would work, but it is for sure important. In the recent testing I found that pg_autovacuum (well, lazy vacuum in general, but I was using pg_autovacuum to control it) made a huge difference in performance of batch transactions. They ranged from 4-5 seconds without vacuum running, to as high as 15 minutes with vacuum running. With the vacuum delay patch, delay = 1, pagecount = 8, I still saw times go as high as 10 minutes. Backing vacuum off any more than that caused it to fall behind the transaction rate unrecoverably. But as I said above, if the transactions could complete in 4-5 seconds without vacuum running, and vacuuming resumed for the 3-to-4 minutes between batches, all would be well. Again, once the vacuum command is issued, it's out of pg_autovacuum's control. There has been some talk of pg_autovacuum looking at the system load to see if it should wait, or passing some different delay settings to vacuum based on system activity, so maybe some of that will help. What I was talking about with off-peak scheduling is really just setting different thresholds for different times of the day (or perhaps for different system loads), so that if you know your app is typically busy from 8AM - 8PM, then you can set more conservative thresholds during that time window, and more aggressive thresholds during the off-peak time. Thanks for the feedback, Matthew
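Matthew's off-peak idea above amounts to selecting one of two threshold sets based on the clock. A minimal sketch, with the peak window (8AM-8PM, as in his example) and all numbers invented for illustration:

```python
from datetime import time

# Two hypothetical threshold sets: conservative during the busy window,
# aggressive overnight. The values are made up, not pg_autovacuum defaults.
PEAK = {"vacuum_scaling_factor": 2.0, "vacuum_base_value": 10_000}
OFF_PEAK = {"vacuum_scaling_factor": 0.5, "vacuum_base_value": 1_000}

PEAK_START, PEAK_END = time(8, 0), time(20, 0)  # 8AM - 8PM

def active_thresholds(now):
    """Return the threshold set in force at wall-clock time `now`."""
    return PEAK if PEAK_START <= now < PEAK_END else OFF_PEAK
```

In the scheme described in the thread, the window itself would come from a general_settings table in the pg_autovacuum database rather than being hard-coded.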
Re: [HACKERS] pg_autovacuum next steps
Gavin Sherry wrote: I was initially against the idea of using libpq but it's growing on me too. I think it would be good if the core functions of pg_autovacuum (threshold algorithms, connection handling, issuing commands) could be (re)designed so that not only the backend but also a stripped-down pg_autovacuum binary, usable manually, can link against them. That way users have a choice and a workaround if there are any issues with the backend model. Also useful for debugging. I agree. Initially, it appears that this won't be a problem since the pg_autovacuum executable will be untouched (or as untouched as possible anyway); it will just be launched by the backend. Going forward, if I use any of the functionality provided by the backend (error reporting etc...) I will then have to design it to work in both standalone and postmaster sub-process modes, which I think is doable. Matthew ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor wrote: Could you please explain this better, I don't really understand what the problem is. If you want pg_autovacuum to perform a vacuum on a table that has had exactly X updates no matter what, you can just run it with -V0 -vX (where X is the vacuum threshold); the same thing can be done for analyze. Are you saying that you did this and it did not work as expected? I think the problem with that (I initially tried it myself) is that without a per-table setting, it is impractical. For instance, I want the one large table in my app vacuumed every 500,000 dead tuples, but now my smaller tables never reach the threshold. Joe ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
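The threshold logic the thread keeps circling around is: vacuum a table once its dead-tuple count exceeds a base value plus a scaling factor times the table's size (the -v and -V flags being discussed). A sketch of it with per-table overrides, which is what Joe is asking for; the default numbers and dictionary shape are illustrative only, not the contrib module's actual implementation:

```python
# Hypothetical daemon-wide defaults (values invented for illustration).
DEFAULTS = {"vacuum_base_value": 1000, "vacuum_scaling_factor": 2.0}

def needs_vacuum(dead_tuples, reltuples, overrides=None):
    """True once dead tuples exceed base_value + scaling_factor * reltuples.

    Per-table `overrides` shadow the daemon-wide defaults, which is the
    per-table-threshold feature being requested in this thread.
    """
    cfg = {**DEFAULTS, **(overrides or {})}
    threshold = cfg["vacuum_base_value"] + cfg["vacuum_scaling_factor"] * reltuples
    return dead_tuples > threshold

# Joe's large table: vacuum every 500,000 dead tuples regardless of size,
# i.e. scaling factor 0 with a fixed base (like running with -V0 -v500000),
# but applied to just this one table instead of the whole cluster.
big_table = {"vacuum_scaling_factor": 0, "vacuum_base_value": 500_000}
```

With only the global flags, setting -V0 -v500000 for the big table means small tables (far fewer than 500,000 dead tuples) are never vacuumed, which is exactly the impracticality Joe describes.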
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor wrote: Joe Conway wrote: This would be *really* nice to have. In my recent case, if pg_autovacuum could work for say 3 minutes, and then back off for 2 minutes or so while the batch transactions hit, it would be ideal. I'm not sure what you are suggesting here. As it stands right now, pg_autovacuum just issues a standard vacuum command, so there isn't anything pg_autovacuum can do until that command completes. There has been a lot of work going on trying to reduce the performance impact associated with a vacuum (vacuum delay, ARC etc); hopefully that will make a big difference. Hopefully it will. I've not had a chance yet to test against cvs tip. I guess what I was suggesting would have to be built in to the lazy vacuum process itself, similar to the vacuum delay stuff. I have no idea if it would be practical, but ideally every X number of pages there would be a way to detect current load on the postgres cluster or current load on the server, and then either sleep, or go full bore, depending on the result and some threshold setting. Kind of like the vacuum delay patch, but with a "don't bother to sleep if the system is idle" behavior. Joe ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
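Joe's load-aware variant of the vacuum delay patch can be sketched as follows. This is a speculative illustration, not backend code: the function, the busy-load cutoff, and the page loop are all invented, and the 1-minute load average (via os.getloadavg, Unix-only) stands in for whatever load metric the backend would actually consult:

```python
import os
import time

def vacuum_with_adaptive_delay(total_pages, pagecount=8, delay_s=0.001,
                               busy_load=1.0, loadavg=os.getloadavg):
    """Every `pagecount` pages, sleep only if the system looks busy.

    Mirrors the delay/pagecount knobs mentioned in the thread (delay = 1,
    pagecount = 8), plus Joe's "don't sleep if idle" check. `loadavg` is
    injectable so the behavior can be exercised without a loaded machine.
    Returns the number of naps taken, for illustration.
    """
    naps = 0
    for page in range(total_pages):
        # ... process one heap page here ...
        if page and page % pagecount == 0:
            if loadavg()[0] >= busy_load:
                time.sleep(delay_s)   # system busy: back off
                naps += 1
            # system idle: go full bore, no sleep
    return naps
```

On an idle machine this degenerates to a plain full-speed vacuum; under load it behaves like the existing delay patch.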
Re: [HACKERS] pg_autovacuum next steps
On Mon, Mar 22, 2004 at 04:50:57PM -0500, Matthew T. O'Connor wrote: Could you please explain this better, I don't really understand what the problem is. If you want pg_autovacuum to perform a vacuum on a table that has had exactly X updates no matter what, you can just run it with -V0 -vX (where X is the vacuum threshold); the same thing can be done for analyze. Are you saying that you did this and it did not work as expected? My experience was it did not vacuum at all. Everyone agrees it should be in the backend eventually; the questions were just whether it was ready, and how or if it should change when it goes in. It is very simple code; I'm not sure what is not ready about it. If people don't want to use it, they don't have to. Many databases can go days to weeks without vacuuming. alex -- [EMAIL PROTECTED] Alex J. Avriette, Unix Systems Gladiator You cannot invade the mainland United States. There would be a rifle behind each blade of grass. - Admiral Isoroku Yamamoto
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor [EMAIL PROTECTED] writes: Lately I have been thinking about the next steps for the pg_autovacuum daemon. I have written up a document that describes what I'm planning to do next. Please read the attached and respond, as I would really like some feedback. [ rtf document ] Please repost in some less proprietary format. Plain text is generally considered the thing to use on this list. regards, tom lane
Re: [HACKERS] pg_autovacuum next steps
On 22.03.2004 at 00:12, Tom Lane wrote: Please repost in some less proprietary format. Plain text is generally considered the thing to use on this list. -- Markus Bertheau [EMAIL PROTECTED]
Re: [HACKERS] pg_autovacuum next steps
Matthew T. O'Connor wrote: Lately I have been thinking about the next steps for the pg_autovacuum daemon. I have written up a document that describes what I'm planning to do next. Please read the attached and respond, as I would really like some feedback. I think these configuration issues will become a lot easier if you make the autovacuum daemon a subprocess of the postmaster (like, say, the checkpoint process). Then you have access to a host of methods for storing state, handling configuration, etc. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] pg_autovacuum next steps
I think these configuration issues will become a lot easier if you make the autovacuum daemon a subprocess of the postmaster (like, say, the checkpoint process). Then you have access to a host of methods for storing state, handling configuration, etc. Yeah - why delay making it a backend process? :) Chris
Re: [HACKERS] pg_autovacuum next steps
On Sun, 2004-03-21 at 20:31, Christopher Kings-Lynne wrote: I think these configuration issues will become a lot easier if you make the autovacuum daemon a subprocess of the postmaster (like, say, the checkpoint process). Then you have access to a host of methods for storing state, handling configuration, etc. Yeah - why delay making it a backend process? :) Ok, well this was part of the reason to have this conversation. My reasons: A) I wasn't sure if people really thought this was ready to be integrated. Tom had said a while ago that it was good to keep it as a contrib module while it's still actively being developed. B) Perhaps people like the idea of it being a client app (I don't think so.) C) Most importantly, I'm not a backend hacker. If someone wants to do the initial work of getting it running as a backend process, I can take it from there. A while ago, Bruce offered to help me with any backend issues I might have, so perhaps with a little help I can take a run at it. So the first big question is: Do we want to make it a backend subprocess now? Secondly, are there any other features that people are interested in that were not mentioned in my document? Matthew O'Connor
Re: [HACKERS] pg_autovacuum next steps
On Sun, 2004-03-21 at 18:12, Tom Lane wrote: Matthew T. O'Connor [EMAIL PROTECTED] writes: [ rtf document ] Please repost in some less proprietary format. Plain text is generally considered the thing to use on this list. I don't think RTF is proprietary, but I should have just posted inline anyway, so here is a copy: pg_autovacuum Version 2 Design Document: Exec Summary: pg_autovacuum was initially released as a contrib module in PostgreSQL v7.4. The version in 7.4 is by design very simple. No configuration is required, and very little configuration is possible. Despite these limitations it was voted the most popular new feature of PostgreSQL v7.4 according to the survey held on postgresql.org (http://www.postgresql.org/survey.php?View=1&SurveyID=23). Despite its popularity there is much room for improvement. This document sets out to define the most important improvements that would help pg_autovacuum become a truly powerful asset in the suite of tools that come with PostgreSQL. Current Problems & Limitations: Based on user feedback from people using pg_autovacuum in the field, and my own observations, there are a number of problems and limitations with pg_autovacuum. They are: * Inability to customize thresholds on a per table basis * Inability to set default thresholds on a per database basis * Inability to exclude specific databases / tables from pg_autovacuum monitoring * Inability to schedule vacuums during off-peak times * Lack of integration related to startup and shutdown * Ignorance of VACUUM and ANALYZE operations performed outside pg_autovacuum (requires backend integration? or can listen / notify be used?) * Lack of logging options / syslog integration / log rotation options * Create table fails because template1 is busy I'm not sure how to address all of these concerns, or that they all should be addressed right now. One of my big questions is backend integration. 
I am leaning towards leaving pg_autovacuum as a client application in contrib for one more release. During this time, I can continue to tweak and improve pg_autovacuum so that we will have a very good idea what the final product should be before we make it a standard backend process. For PostgreSQL 7.5, I plan to implement these new features: 1. Per database defaults and per table thresholds (including total exclusion) 2. Persistent data 3. Single-pass mode (external scheduling from cron etc...) 4. Off-peak scheduling 1. Per Database Defaults and Per Table Thresholds: There are differing opinions as to the best way to provide this feature. The primary debate is where to save the configuration data. I see three options: 1. Store config data inside a special pg_autovacuum table inside existing databases that want custom settings. 2. Use a config file. This would require some additional coding to add the required parsing, but is possible. 3. Create a pg_autovacuum database inside any cluster that wants to customize its settings. Since many people do not like tools that clutter their databases by adding tables, I think option 1 (adding a pg_autovacuum table to existing databases) is right out. Using a config file would be OK, but would require additional parsing code. My preference is option 3. Since pg_autovacuum will (hopefully) eventually become an integrated part of the backend, it will eventually be able to add required data to the system catalogs. Given these two premises, as long as pg_autovacuum remains a contrib module it could use its own database to mimic having system tables. If this database exists, it will be used; if it does not exist, then pg_autovacuum will work just as it did in the 7.4 release with very limited options available to it. The user will be able to specify a non-default database. 
Table structure for database-specific defaults and table-specific thresholds:

databases_defaults: (will reference the pg_database system table)
    id serial primary key
    exclude_database boolean
    default_vacuum_scaling_factor float
    default_vacuum_base_value int
    default_analyze_scaling_factor float
    default_analyze_base_value int
    dboid oid references pg_database.oid

table_thresholds: (will reference the pg_class system table)
    id serial primary key
    exclude_table boolean (exclude this table)
    vacuum_scaling_factor float (equivalent to -V)
    vacuum_base_value int (equivalent to -v)
    vacuum_threshold float (if 0, use this threshold)
    analyze_scaling_factor float (equivalent to -A)
    analyze_base_value int (equivalent to -a)
    analyze_threshold float (if 0, use this threshold)
    relid oid references pg_class.relid

2. Persistent pg_autovacuum Data: Right now pg_autovacuum has no memory of what
Re: [HACKERS] pg_autovacuum next steps
On Sun, 21 Mar 2004, Matthew T. O'Connor wrote: On Sun, 2004-03-21 at 20:31, Christopher Kings-Lynne wrote: I think these configuration issues will become a lot easier if you make the autovacuum daemon a subprocess of the postmaster (like, say, the checkpoint process). Then you have access to a host of methods for storing state, handling configuration, etc. Yeah - why delay making it a backend process? :) Ok, well this was part of the reason to have this conversation. My reasons: A) I wasn't sure if people really thought this was ready to be integrated. Tom had said a while ago that it was good to keep it as a contrib module while it's still actively being developed. I was talking to Jan about some other work on VACUUM related to more intelligent vacuuming. Namely, maintaining a map (outside of shared memory) of blocks which have been pushed out of the free space map for VACUUM to visit (which requires a backend process), being aware of load restrictions (ie, allowing the user to vacuum only when the load average is less than X, for example), and some other leveling stuff to ensure that availability is consistent. Whilst this doesn't relate to pg_autovacuum specifically, it'd be great if they could be released at the same time, I think. B) Perhaps people like the idea of it being a client app (I don't think so.) I'd like to see it as part of the backend. C) Most importantly, I'm not a backend hacker. If someone wants to do the initial work of getting it running as a backend process, I can take it from there. A while ago, Bruce offered to help me with any backend issues I might have, so perhaps with a little help I can take a run at it. I'd be happy to help you out. Gavin
Re: [HACKERS] pg_autovacuum next steps
B) Perhaps people like the idea of it being a client app (I don't think so.) I'd like to see it as part of the backend. C) Most importantly, I'm not backend hacker. If someone wants to do the initial work of getting it running as a backend process, I can take it from there. A while ago, Bruce offered to help me with any backend issues I might have, so perhaps with a little help I can take a run at it. I'd be happy to help you out. Agreed. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match